SeqAn3
The Modern C++ library for sequence analysis.
First steps with SeqAn3

Learning Objective:

This tutorial walks you through small SeqAn3 programs. It is intended to give you a short overview of what to expect in the other tutorials and how to use this documentation.

DifficultyEasy
Duration30 min
Prerequisite tutorialsQuick Setup (using CMake)
Recommended readingRanges, Concepts

Every page in the tutorials begins with this section. It is recommended that you do the "prerequisite tutorials" before the current one. You should also have a look at the links provided in "recommended reading" and maybe keep them open in separate tabs/windows as reference.

These tutorials try to briefly introduce C++ features not well known, however they do not teach programming in C++! If you know how to program in another language, but are not familiar with C++ and/or the significant changes in the language in recent years, we recommend the following resources:

  • Bjarne Stroustrup: "A Tour of C++", Second Edition, 2018.

Hello World!

Most good tutorials start with an easy Hello World! program. So have a look:

#include <seqan3/core/debug_stream.hpp> // for debug_stream
using namespace seqan3;
int main()
{
debug_stream << "Hello World!\n";
}
Note
This is a code snippet. You will see many code snippets in our documentation. Most of them are compilable as-is, but some are only valid in their context, e.g. they depend on other code snippets given before/after the current one or other statements implied by the text. You can copy'n'paste freely from these examples, this implies no copyright-obligations (however distributing SeqAn3 or an application using it does, see Copyright and Citing.

You may ask, why we do not use std::cout or std::cerr for console output. Actually, for the given text it does not make a difference since seqan3::debug_stream prints to std::cerr as well. However, the debug stream provides convenient output for SeqAn's types as well as widely used data structures (e.g. std::vector), which is especially helpful when you debug or develop your program (that's where the name originates).

Exercise: Debug stream

Write a program that creates a std::vector of type int and initialise the vector with a few values. Then print the vector with seqan3::debug_stream. Does your program also work with std::cerr?
Solution

#include <iostream> // for std::cerr, std::endl
#include <vector> // for std::vector
#include <seqan3/core/debug_stream.hpp> // for debug_stream
using namespace seqan3;
int main()
{
std::vector<int> vec{-1,0,1};
debug_stream << vec << std::endl; // => [-1,0,1]
// std::cerr << vec << std::endl; // compiler error: no operator<< for std::vector<int>
}

Note
This is an exercise with solution. You will find exercises in the tutorials to practise the discussed contents. We believe that programming them will help you to memorise better and makes the tutorials more interesting and interactive. The solutions provide the intended use; but often there are multiple ways to solve an exercise, so don't worry too much if your solution is different from ours.

Parse command line arguments

After we have seen the Hello World! program, we want to go a bit further and parse arguments from the command line. The following snippet shows you how this is done in SeqAn3. Here the program expects a string argument in the program call and prints it to your terminal.

#include <string> // for std::string
#include <seqan3/argument_parser/all.hpp> // for argument_parser
#include <seqan3/core/debug_stream.hpp> // for debug_stream
using namespace seqan3;
int main(int argc, char * argv[])
{
// Create a buffer for the input.
std::string input{};
// Initialise the Argument Parser and add an option.
seqan3::argument_parser parser("My first program", argc, argv);
parser.add_positional_option(input, "Give me text.");
// Parse the given arguments and catch possible errors.
try
{
parser.parse();
}
catch (seqan3::parser_invalid_argument const & ext)
{
debug_stream << "[PARSER ERROR] " << ext.what() << '\n';
return 0;
}
debug_stream << "The text was: " << input << "\n";
}

Implementing a program with seqan3::argument_parser requires three steps:

  1. Initialise the seqan3::argument_parser with your program's name and pass the argc and argv variables.
  2. Register (positional) options in the parser object. In this way it knows which options to expect and it can generate the help page for your program. You will learn more about the option types in the Argument Parser Tutorial.
  3. Run the parser. As it throws exceptions on wrong user behaviour, it should be surrounded with a try-catch block.

You will see that the entered text is now in the buffer variable input. The argument parser provides way more functionality than we can show at this point, e.g. validation of arguments and different option types. We refer you to the respective tutorial if you want to know more.

Note
You may have spotted that the blue coloured keywords link you directly to the respective API documentation. This is helpful if you need further information on a function, concept or class. We recommend you to open them in separate browser tabs such that you can easily switch back to the tutorial.

Modules in SeqAn3

You have just been introduced to one of the Modules of SeqAn3, the Argument Parser. Modules structure the SeqAn3 library into logical units, as there are for instance alignment, alphabet, argument_parser, io, search and some more. See the API Reference (Modules) section in the navigation column for a complete overview.

Some modules consist of submodules and the module structure is represented by the file hierarchy in the include directory. Whenever you use functions of a module, make sure to include the correct header file. Each directory in the SeqAn sources contains an all.hpp file which includes all the functionality of the respective (sub-) module. For small examples and quick prototyping, you can just include these all.hpp-headers. However, for larger projects we recommend you include only the necessary headers, because this will reduce the compile time measurably.

Note
If you remember the name of a function or class, but don't know which (sub-)module it belongs to, you can enter it in the search bar (top-right).

Read sequence files

Let's look at some functions of the IO module: SeqAn3 provides fast and easy access to biological file formats. The following code example demonstrates the interface of seqan3::sequence_file_input.

#include <seqan3/io/sequence_file/input.hpp> // for sequence_file_input
#include <seqan3/core/debug_stream.hpp> // for debug_stream
// Initialise a file input object with a FastA file.
sequence_file_input file_in{filename}; // filename: "seq.fasta"
// Retrieve the sequences and ids.
for (auto &[seq, id, qual] : file_in)
{
debug_stream << "ID: " << id << '\n';
debug_stream << "SEQ: " << seq << '\n';
debug_stream << "EMPTY QUAL." << qual << '\n'; // qual is empty for FastA files
}

Can you imagine anything easier? After you have initialised the instance with a filename, you can simply step through the file in a for loop and retrieve the fields via structured bindings. The returned fields are SEQ, ID and QUAL to retrieve sequences, ids and qualities, respectively. The latter is empty unless you read FastQ files. The appropriate file format is detected by SeqAn from your filename's suffix.

Here is the content of seq.fasta, so you can try it out!

>seq1
ACGTGATG
>seq2
AGTGATACT

Exercise: Read a FastA file

Combine the code from above to read a FastA file and store its sequences in a std::vector of type seqan3::dna5_vector (which is a common DNA sequence type in SeqAn3). Use the argument parser for obtaining the filename as command line argument to your program (e.g. call ./myprogram seq.fasta).
Solution

#include <string> // for std::string
#include <vector> // for std::vector
#include <seqan3/argument_parser/all.hpp> // for argument_parser
#include <seqan3/io/sequence_file/input.hpp> // for sequence_file_input
#include <seqan3/core/debug_stream.hpp> // for debug_stream
using namespace seqan3;
int main(int argc, char * argv[])
{
// Receive the filename as program argument.
std::string filename{};
seqan3::argument_parser parser("My first program", argc, argv);
parser.add_positional_option(filename, "The filename of the file to read.");
try
{
parser.parse();
}
catch (seqan3::parser_invalid_argument const & ext)
{
debug_stream << "[PARSER ERROR] " << ext.what() << '\n';
return 0;
}
debug_stream << "Reading file " << filename << "\n";
// Create the vector to store sequences of type seqan3::dna5_vector.
// Iterate through the file and store the sequences.
sequence_file_input file_in{filename};
for (auto & [ seq, id, qual ] : file_in)
{
sequences.push_back(seq);
}
}

Note that the same code can also read FastQ files and the qual variable will not be empty then. If you like, try it!

Note
SeqAn3 uses snake_case for almost everything, also class names. Only C++ concepts are named using CamelCase.

Align two sequences

We have two sequences from the file above now – so let us align them. The pairwise sequence alignment is one of the core algorithms in SeqAn and used by several library components and apps. It is strongly optimised for speed and parallel execution while providing exact results and a generic interface.

#include <tuple> // for std::make_pair
#include <seqan3/alignment/aligned_sequence/aligned_sequence_concept.hpp> // for alignment stream operator
#include <seqan3/alignment/pairwise/align_pairwise.hpp> // for align_pairwise
// Call a pairwise alignment with edit distance and traceback.
for (auto && res : align_pairwise(std::tie(sequences[0], sequences[1]),
align_cfg::edit | align_cfg::result{with_alignment}))
{
// Print the resulting score and the alignment.
debug_stream << res.score() << '\n'; // => -4
debug_stream << res.alignment() << '\n'; // => 0 . :
// ACGTGATG--
// | |||||
// A-GTGATACT
}

The algorithm returns a range of result objects – which is the reason for the loop here (in this case the range has length 1). Instead of passing a single pair of sequences, we could give a vector of sequence pairs to the algorithm which then executes all alignments in parallel and stores the results in various seqan3::alignment_result objects. The second argument to seqan3::align_pairwise is the configuration which allows you to specify a lot of parameters for the alignment computation, for instance score functions, banded alignment and whether you wish to compute a traceback or not. The configurations have their own namespace seqan3::align_cfg and can be combined via the logical OR operator (|) for building combinations. Check out the alignment tutorial if you want to learn more.

Note
We use a lot of Modern C++ in SeqAn3 so some things might look alien at first, e.g. type templates are used like ordinary types in many situations (no <>). We also always use {} to initialise objects and not () which is only used for function calls. In general the style should be much easier for newcomers.

Now that you reached the end of this first tutorial, you know how SeqAn3 code looks like and you are able to write some first code fragments. Let's go more into detail with the module-based tutorials!