Installation
First, clone the WarpSTR repository.
WarpSTR can then be installed using the conda environment frozen in conda_req.yaml. Create the conda environment from this file.
After installation, activate the conda environment before running WarpSTR.
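The installation steps above can be sketched as follows. This is a minimal sketch that only prints the commands for review instead of running them. It assumes that conda_req.yaml carries a top-level name: field (as conda environment files usually do), and REPO_URL is a placeholder rather than the real repository address; the helper names conda_env_name and install_commands are hypothetical and not part of WarpSTR.

```shell
# Placeholder only: substitute the real repository address before use.
REPO_URL="${REPO_URL:-<repository-url>}"

# Read the environment name from the top-level "name:" field of a conda
# environment file (assumption: the file defines such a field).
conda_env_name() {
    sed -n 's/^name:[[:space:]]*//p' "$1" | head -n 1
}

# Print the installation commands for the given environment file
# without executing any of them.
install_commands() {
    req_file="$1"
    echo "git clone $REPO_URL"
    echo "conda env create -f $req_file"
    echo "conda activate $(conda_env_name "$req_file")"
}
```

Calling install_commands conda_req.yaml from the WarpSTR directory prints the three commands, which can then be checked and run by hand.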
WarpSTR was tested on Ubuntu 20.04.
If you have any problems with installation, please raise an issue on GitHub and we will look into it as soon as possible.
Test case
The test/test_input directory contains a small test dataset with 10 reads for one locus. The inputs for this dataset are defined in the template config test/config_template.yaml.
In the test data, there is only one run, called test_run1, with all sequencing files stored there (i.e. BAM files and .fast5 files):
test_input/
└── test_run1
    ├── fast5s
    │   └── batch_0.fast5
    └── mapping
        ├── mapping.bam
        └── mapping.bam.bai
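Before running WarpSTR on your own data, it may help to check that a run directory follows the same layout. The sketch below is a hypothetical helper, not part of WarpSTR. It assumes that each BAM file is expected to have its index stored next to it as <name>.bam.bai, as in the tree above.

```shell
# Check that a run directory contains .fast5 files and indexed BAM files.
check_run_dir() {
    run_dir="$1"
    # At least one .fast5 file must exist somewhere below the run directory.
    fast5=$(find "$run_dir" -type f -name '*.fast5' | head -n 1)
    [ -n "$fast5" ] || { echo "no .fast5 files under $run_dir" >&2; return 1; }
    # At least one BAM file must exist as well.
    bam=$(find "$run_dir" -type f -name '*.bam' | head -n 1)
    [ -n "$bam" ] || { echo "no .bam files under $run_dir" >&2; return 1; }
    # Collect BAM files that have no "<name>.bam.bai" index next to them.
    missing=$(find "$run_dir" -type f -name '*.bam' | while IFS= read -r f; do
        [ -f "$f.bai" ] || echo "$f"
    done)
    [ -z "$missing" ] || { echo "BAM without index: $missing" >&2; return 1; }
    echo "ok: $run_dir"
}
```

For the test data, the call would be check_run_dir test/test_input/test_run1, run from the WarpSTR directory.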
Note
WarpSTR supports multiple sequencing runs. They are listed in the runs element as comma-separated values, e.g. test_run1,test_run2 and so on. WarpSTR joins the path value with each comma-separated value of runs to obtain a list of paths, which are then searched recursively for input files (BAM and .fast5 files).
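The path expansion described in this note can be illustrated with a short shell sketch. It only mirrors the described behaviour and is not WarpSTR's actual implementation; the function names expand_runs and list_input_files are hypothetical, and whitespace around the commas is not trimmed.

```shell
# Join a base path with each comma-separated run name, one directory per line.
# The body runs in a subshell so the IFS change does not leak to the caller.
expand_runs() (
    base_path="${1%/}"   # drop one trailing slash, if present
    IFS=','
    for run in $2; do
        echo "$base_path/$run"
    done
)

# List the .bam and .fast5 files found recursively under each run directory.
list_input_files() {
    expand_runs "$1" "$2" | while IFS= read -r run_dir; do
        find "$run_dir" -type f \( -name '*.bam' -o -name '*.fast5' \)
    done
}
```

For example, expand_runs test/test_input test_run1,test_run2 prints test/test_input/test_run1 and test/test_input/test_run2 on separate lines.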
To run the test data, you need to define reference_path and guppy_config.path in the config. Alternatively, you can use the provided wrapper bash script run_test_case.sh, run from the WarpSTR directory with the conda environment activated.
The wrapper script prompts you for the required paths and then runs WarpSTR for you. Output files are stored in test/test_output/, as given in the config file.
Warning
Test data were produced using the human genome reference GRCh38, so make sure that you provide the path to a reference of this version. You can download the reference, for example, from https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.26/