SeqAn3 3.4.0-rc.1
The Modern C++ library for sequence analysis.
|
Learning Objective:
In this tutorial, you will learn how to combine the components of previous tutorials to create your very first SeqAn application: a read mapper!
Difficulty | High |
---|---|
Duration | 90 Minutes |
Prerequisite tutorials | All |
Recommended reading |
Read mapping is a common task in bioinformatics and is often the first step of an in-depth analysis of Next Generation Sequencing data. Its aim is to identify positions where a query sequence (read) matches with up to e
errors to a reference sequence.
In this example, we will implement a read mapper step by step and make use of what we have learned in the previous tutorials. As it is common practice with read mappers, we will first create an indexer that creates an index from the reference and stores it to disk. After this, we will implement the actual read mapper that will use the stored index and map the reads.
We provide an example reference and an example query file.
As a first step, we want to parse command line arguments for our indexer. If you get into trouble, you can take a peek at the Argument Parser Tutorial or the API documentation of the seqan3::argument_parser for help.
Follow the best practice and create:
run_program
that prints the parsed argumentscmd_arguments
that stores the argumentsinitialise_argument_parser
to add meta data and options to the parsermain
function that parses the arguments and calls run_program
Use validators where applicable!
Your main
may look like this:
As a next step, we want to use the parsed file name to read in our reference data. This will be done using seqan3::sequence_file_input class. As a guide, you can take a look at the Sequence I/O Tutorial.
To do this, you should create:
reference_storage_t
that stores the sequence information for both reference and query information within member variablesread_reference
that fills a reference_storage_t
object with information from the files and prints the reference IDsYou should also perform the following changes in run_program
:
storage
of type reference_storage_t
read_reference
This is the signature of read_reference
:
This is the reference_storage_t
:
Here is the complete program:
Now that we have the necessary sequence information, we can create an index and store it. Read up on the Index Tutorial if you have any questions.
create_index
:index_path
and storage
as parametersWe also need to change:
run_program
to now also call create_index
run_program
and read_reference
to not print the debug output anymoreThis is the signature of create_index
:
Here is the complete program:
Again, we want to parse command line arguments for our read mapper as a first step. If you get into trouble, you can take a peek at the Argumet Parser Tutorial or the API documentation of the seqan3::argument_parser for help.
Follow the best practice and create:
run_program
that prints the parsed argumentscmd_arguments
that stores the argumentsinitialise_argument_parser
to add meta data and options to the parsermain
function that parses the arguments and calls run_program
Use validators where applicable!
Your main
may look like this:
We also want to read the reference in the read mapper. This is done the same way as for the indexer. We can now load the index and conduct a search. Read up on the Search Tutorial if you have any questions.
To do this, you should:
read_reference
function and the reference_storage_t
struct from the indexermap_reads
that loads the index and prints the results of the search (allowing all error types) for the first 20 queriesYou should also perform the following changes in run_program
:
storage
of type reference_storage_t
read_reference
and map_reads
This is the signature of map_reads
:
Here is the complete program:
We can now use the obtained positions to align each query against the reference. Refer to the Alignment Tutorial if you have any questions.
map_reads
to:This is the alignment config:
Here is the complete program:
Finally, we can write our results into a SAM file.
map_reads
to write the alignment results into a SAM file. Additionally, there should be no more debug output.
Try to write all available information into the SAM file. We can introduce a naive mapping quality by using mapping quality = 60 + alignment score
.
This is the sam_file_output construction:
Here is the complete program: