SeqAn3 3.1.0-rc.2
The Modern C++ library for sequence analysis.
The SeqAn Cookbook

This document provides example recipes on how to carry out particular tasks using the SeqAn functionalities in C++. Please note that these recipes are not ordered. You can use the links in the table of contents or the search function of your browser to navigate them.

It will take some time, but we hope to expand this document into containing numerous great examples. If you have suggestions for how to improve the Cookbook and/or examples you would like included, please feel free to contact us.

Read sequence files

#include <string>
#include <seqan3/core/debug_stream.hpp> // for debug_stream
#include <seqan3/io/sequence_file/input.hpp> // for sequence_file_input
int main ()
{
std::filesystem::path tmp_dir = std::filesystem::temp_directory_path(); // get the tmp directory
// Initialise a file input object with a FastA file.
seqan3::sequence_file_input file_in{tmp_dir/"seq.fasta"};
// Retrieve the sequences and ids.
for (auto & [seq, id, qual] : file_in)
{
seqan3::debug_stream << "ID: " << id << '\n';
seqan3::debug_stream << "SEQ: " << seq << '\n';
seqan3::debug_stream << "Empty Qual." << qual << '\n'; // qual is empty for FastAfiles
}
return 0;
}
A class for reading sequence files, e.g. FASTA, FASTQ ...
Definition: input.hpp:308
Provides seqan3::debug_stream and related types.
debug_stream_type debug_stream
A global instance of seqan3::debug_stream_type.
Definition: debug_stream.hpp:37
@ seq
The "sequence", usually a range of nucleotides or amino acids.
@ qual
The qualities, usually in Phred score notation.
Provides seqan3::sequence_file_input and corresponding traits classes.
T temp_directory_path(T... args)

Write a custom validator

This recipe implements a validator that checks if a numeric argument is an integral square (i.e. 0, 1, 4, 9...). Invalid values throw a seqan3::validation_error.

#include <cmath>
struct custom_validator
{
using option_value_type = double; // used for all arithmetic types
void operator() (option_value_type const & val) const
{
if ((std::round(val) != val) || // not an integer
(std::pow(std::round(std::sqrt(val)), 2) != val)) // not a square
{
throw seqan3::validation_error{"The provided number is not an arithmetic square."};
}
}
std::string get_help_page_message () const
{
return "Value must be the square of an integral number.";
}
};
Argument parser exception thrown when an argument could not be casted to the according type.
Definition: exceptions.hpp:115
T pow(T... args)
T round(T... args)
T sqrt(T... args)

Construction and assignment of alphabet symbols

#include <seqan3/alphabet/all.hpp> // for working with alphabets directly
int main ()
{
using namespace seqan3::literals;
// Two objects of seqan3::dna4 alphabet constructed with a char literal.
seqan3::dna4 ade = 'A'_dna4;
seqan3::dna4 gua = 'G'_dna4;
// Two additional objects assigned explicitly from char or rank.
seqan3::dna4 cyt, thy;
cyt.assign_char('C');
thy.assign_rank(3);
// Further code here...
Meta-header for the alphabet module.
constexpr derived_type & assign_char(char_type const chr) noexcept
Assign from a character, implicitly converts invalid characters.
Definition: alphabet_base.hpp:165
constexpr derived_type & assign_rank(rank_type const c) noexcept
Assign from a numeric value.
Definition: alphabet_base.hpp:191
The four letter DNA alphabet of A,C,G,T..
Definition: dna4.hpp:53
The SeqAn namespace for literals.
return 0;
}
// Get the rank type of the alphabet (here uint8_t).
// Retrieve the numerical representation (rank) of the objects.
rank_type rank_a = ade.to_rank(); // => 0
rank_type rank_g = gua.to_rank(); // => 2
constexpr rank_type to_rank() const noexcept
Return the letter's numeric value (rank in the alphabet).
Definition: alphabet_base.hpp:139
decltype(seqan3::to_rank(std::declval< semi_alphabet_type >())) alphabet_rank_t
The rank_type of the semi-alphabet; defined as the return type of seqan3::to_rank.
Definition: concept.hpp:164

Reverse complement and the six-frame translation of a string using views

This recipe creates a small program that

  1. reads a string from the command line (first argument to the program)
  2. "converts" the string to a range of seqan3::dna5 (Bonus: throws an exception if loss of information occurs)
  3. prints the string and its reverse complement
  4. prints the six-frame translation of the string
#include <seqan3/std/ranges> // include all of the standard library's views
#include <seqan3/alphabet/views/all.hpp> // include all of SeqAn's views
#include <seqan3/argument_parser/all.hpp> // optional: include the argument_parser
int main(int argc, char** argv)
{
// We use the seqan3::argument_parser which was introduced in the second chapter
// of the tutorial: "Parsing command line arguments with SeqAn".
seqan3::argument_parser myparser{"Assignment-3", argc, argv}; // initialize
myparser.add_positional_option(s, "Please specify the DNA string.");
try
{
myparser.parse();
}
catch (seqan3::argument_parser_error const & ext) // the user did something wrong
{
std::cerr << "[PARSER ERROR]" << ext.what() << '\n'; // you can customize your error message
return 0;
}
auto s_as_dna = s | seqan3::views::char_to<seqan3::dna5>;
// Bonus:
//auto s_as_dna = s | std::views::transform([] (char const c)
//{
// return seqan3::assign_char_strictly_to(c, seqan3::dna5{});
//});
seqan3::debug_stream << "Original: " << s_as_dna << '\n';
seqan3::debug_stream << "RevComp: " << (s_as_dna | std::views::reverse | seqan3::views::complement) << '\n';
seqan3::debug_stream << "Frames: " << (s_as_dna | seqan3::views::translate) << '\n';
}
Meta-header for the Alphabet / Views submodule .
Meta-header for the Argument Parser module .
Argument parser exception that is thrown whenever there is an error while parsing the command line ar...
Definition: exceptions.hpp:38
The SeqAn command line parser.
Definition: argument_parser.hpp:154
constexpr auto translate
A view that translates nucleotide into aminoacid alphabet with 1, 2, 3 or 6 frames.
Definition: translate.hpp:813
auto const complement
A view that converts a range of nucleotides to their complement.
Definition: complement.hpp:67
Adaptations of concepts from the Ranges TS.
T what(T... args)

Reading records

After construction, you can now read the sequence records. Our file object behaves like a range, you can use a range-based for loop to conveniently iterate over the file:

#include <sstream>
auto input = R"(> TEST1
ACGT
> Test2
AGGCTGA
> Test3
GGAGTATAATATATATATATATAT)";
int main()
{
for (auto & record : fin)
{
seqan3::debug_stream << "ID: " << record.id() << '\n';
seqan3::debug_stream << "SEQ: " << record.sequence() << '\n';
// a quality field also exists, but is not printed, because we know it's empty for FastA files.
}
}
The FastA format.
Definition: format_fasta.hpp:79
Attention
An input file is a single input range, which means you can only iterate over it once!
Note
It is important to write auto & and not just auto, otherwise you will copy the record on every iteration.

You can also use structured binding, i.e. for (auto & [seq, id, qual] : fin) But beware: with structured bindings you do need to get the order of elements correct!

You can also read a file in chunks:

Reading records in chunks

int main()
{
// `&&` is important because seqan3::views::chunk returns temporaries!
for (auto && records : fin | seqan3::views::chunk(10))
{
// `records` contains 10 elements (or less at the end)
seqan3::debug_stream << "Taking the next 10 sequences:\n";
seqan3::debug_stream << "ID: " << (*records.begin()).id() << '\n'; // prints first ID in batch
}
Provides seqan3::views::chunk.
T current_path(T... args)
constexpr auto chunk
A chunk view.
Definition: chunk.hpp:29
Meta-header for the IO / Sequence File submodule .

The example above will iterate over the file by reading 10 records at a time. If no 10 records are available anymore, it will just print the remaining records.

Applying a filter to a file

On some occasions you are only interested in sequence records that fulfill a certain criterion, e.g. having a minimum sequence length or a minimum average quality.

This recipe can be used to filter the sequences in your file by a custom criterion.

#include <numeric> // std::accumulate
int main()
{
// std::views::filter takes a function object (a lambda in this case) as input that returns a boolean
auto minimum_quality_filter = std::views::filter([] (auto const & rec)
{
auto qualities = rec.base_qualities()
| std::views::transform([] (auto quality) { return seqan3::to_phred(quality); });
auto sum = std::accumulate(qualities.begin(), qualities.end(), 0);
return sum / std::ranges::size(qualities) >= 40; // minimum average quality >= 40
});
for (auto & rec : fin | minimum_quality_filter)
{
seqan3::debug_stream << "ID: " << rec.id() << '\n';
}
}
T accumulate(T... args)
constexpr auto to_phred
The public getter function for the Phred representation of a quality score.
Definition: concept.hpp:100
decltype(detail::transform< trait_t >(list_t{})) transform
Apply a transformation trait to every type in the list and return a seqan3::type_list of the results.
Definition: traits.hpp:471
constexpr size_t size
The size of a type pack.
Definition: traits.hpp:151

Reading paired-end reads

In modern Next Generation Sequencing experiments you often have paired-end read data which is split into two files. The read pairs are identified by their identical name/id and position in the two files.

This recipe can be used to handle one pair of reads at a time.

int main()
{
// for simplicity we take the same file
for (auto && [rec1, rec2] : seqan3::views::zip(fin1, fin2)) // && is important!
{ // because seqan3::views::zip returns temporaries
if (rec1.id() != rec2.id())
throw std::runtime_error("Your pairs don't match.");
}
}
constexpr auto zip
A zip view.
Definition: zip.hpp:29
Provides seqan3::views::zip.

Storing records in a std::vector

This recipe creates a small program that reads in a FASTA file and stores all the records in a std::vector.

#include <seqan3/std/ranges> // std::ranges::copy
int main()
{
using record_type = decltype(fin)::record_type;
// You can use a for loop:
for (auto & record : fin)
{
records.push_back(std::move(record));
}
// But you can also do this:
std::ranges::copy(fin, std::cpp20::back_inserter(records));
seqan3::debug_stream << records << '\n';
}
This header includes C++17 filesystem support and imports it into namespace std::filesystem (independ...
T push_back(T... args)

Note that you can move the record out of the file if you want to store it somewhere without copying.

int main()
{
using record_type = typename decltype(fin)::record_type;
record_type rec = std::move(*fin.begin()); // avoid copying
}

Writing records

The easiest way to write to a sequence file is to use the seqan3::sequence_file_output::push_back() or seqan3::sequence_file_output::emplace_back() member functions. These work similarly to how they work on a std::vector.

#include <string>
int main()
{
using namespace seqan3::literals;
using sequence_record_type = seqan3::sequence_record<types, fields>;
for (int i = 0; i < 5; ++i) // ...
{
std::string id{"test_id"};
seqan3::dna5_vector sequence{"ACGT"_dna5};
sequence_record_type record{std::move(sequence), std::move(id)};
fout.push_back(record);
}
}
A class for writing sequence files, e.g. FASTA, FASTQ ...
Definition: output.hpp:167
The record type of seqan3::sequence_file_input.
Definition: record.hpp:27
Provides seqan3::dna5, container aliases and string literals.
The generic concept for a (biological) sequence.
Provides seqan3::sequence_file_output and corresponding traits classes.
Provides seqan3::sequence_record.
A class template that holds a choice of seqan3::field.
Definition: record.hpp:128
Type that contains multiple types.
Definition: type_list.hpp:29

The class seqan3::sequence_file_output takes an extra parameter allowing to custom select the fields and their order.

#include <sstream>
#include <string>
#include <tuple>
#include <vector>
int main()
{
using namespace seqan3::literals;
for (int i = 0; i < 5; i++)
{
std::string id{"test_id"};
// vector of combined data structure:
{'A'_dna5, '1'_phred42},
{'C'_dna5, '3'_phred42}};
auto view_on_seq = seqan3::views::elements<0>(seq_qual);
auto view_on_qual = seqan3::views::elements<1>(seq_qual);
// ...
// Note that the order of the arguments is different from the default `seq, id, qual`,
// because you specified that ID should be first in the fields template argument.
fout.emplace_back(id, view_on_seq, view_on_qual);
// or:
fout.push_back(std::tie(id, view_on_seq, view_on_qual));
}
}
The FastQ format.
Definition: format_fastq.hpp:78
Provides seqan3::views::elements.
Provides seqan3::phred42 quality scores.
Provides quality alphabet composites.
T tie(T... args)

File conversion

Define a custom scoring scheme

Provides seqan3::aminoacid_scoring_scheme.
Provides seqan3::nucleotide_scoring_scheme.
using namespace seqan3::literals;
// Define a simple scoring scheme with match and mismatch cost and get the score.
auto sc_nc = nc_scheme.score('A'_dna4, 'C'_dna4); // sc_nc == -5.
// Define a amino acid similarity matrix and get the score.
auto sc_aa = aa_scheme.score('M'_aa27, 'K'_aa27); // sc_aa == 2.
A data structure for managing and computing the score of two amino acids.
Definition: aminoacid_scoring_scheme.hpp:75
constexpr void set_similarity_matrix(aminoacid_similarity_matrix const matrix_id)
Set the similarity matrix scheme (e.g. blosum62).
Definition: aminoacid_scoring_scheme.hpp:122
A data structure for managing and computing the score of two nucleotides.
Definition: nucleotide_scoring_scheme.hpp:38
@ blosum30
The blosum30 matrix for very distantly related proteins.
A strong type of underlying type score_type that represents the score of two matching characters.
Definition: scoring_scheme_base.hpp:41
A strong type of underlying type score_type that represents the score two different characters.
Definition: scoring_scheme_base.hpp:66
Attention
SeqAn's alignment algorithm computes the maximal similarity score, thus the match score must be set to a positive value and the scores for mismatch and gap must be negative in order to maximize over the matching letters.

Calculate edit distance for a set of sequences

This recipe can be used to calculate the edit distance for all six pairwise combinations. Here we only allow at most 7 errors and filter all alignments with 6 or fewer errors.

#include <utility>
#include <vector>
int main()
{
using namespace seqan3::literals;
std::vector vec{"ACGTGACTGACT"_dna4,
"ACGAAGACCGAT"_dna4,
"ACGTGACTGACT"_dna4,
"AGGTACGAGCGACACT"_dna4};
// Configure the alignment kernel.
auto alignment_results = seqan3::align_pairwise(seqan3::views::pairwise_combine(vec), config);
auto filter_v = std::views::filter([](auto && res) { return res.score() >= -6;});
for (auto const & result : alignment_results | filter_v)
{
seqan3::debug_stream << "Score: " << result.score() << '\n';
}
}
Provides pairwise alignment function.
Sets the global alignment method.
Definition: align_config_method.hpp:107
Sets the minimal score (maximal errors) allowed during an distance computation e.g....
Definition: align_config_min_score.hpp:39
Configures the alignment result to output the score.
Definition: align_config_output.hpp:43
Provides seqan3::dna4, container aliases and string literals.
constexpr configuration edit_scheme
Shortcut for edit distance configuration.
Definition: align_config_edit.hpp:51
constexpr auto align_pairwise(sequence_t &&seq, alignment_config_t const &config)
Computes the pairwise alignment for a pair of sequences or a range over sequence pairs.
Definition: align_pairwise.hpp:136
constexpr auto pairwise_combine
A view adaptor that generates all pairwise combinations of the elements of the underlying range.
Definition: pairwise_combine.hpp:714
Provides seqan3::views::pairwise_combine.

Searching for matches

This recipe can be used to search for all occurrences of a substring and print the number of hits and the positions in an ascending ordering.

using namespace seqan3::literals;
void run_text_single()
{
seqan3::dna4_vector
text{"CGCTGTCTGAAGGATGAGTGTCAGCCAGTGTAACCCGATGAGCTACCCAGTAGTCGAACTGGGCCAGACAACCCGGCGCTAATGCACTCA"_dna4};
seqan3::fm_index index{text};
seqan3::debug_stream << "===== Running on a single text =====\n"
<< "The following hits were found:\n";
for (auto && result : search("GCT"_dna4, index))
seqan3::debug_stream << result << '\n';
}
void run_text_collection()
{
std::vector<seqan3::dna4_vector> text{"CGCTGTCTGAAGGATGAGTGTCAGCCAGTGTA"_dna4,
"ACCCGATGAGCTACCCAGTAGTCGAACTG"_dna4,
"GGCCAGACAACCCGGCGCTAATGCACTCA"_dna4};
seqan3::fm_index index{text};
seqan3::debug_stream << "===== Running on a text collection =====\n"
<< "The following hits were found:\n";
for (auto && result : search("GCT"_dna4, index))
seqan3::debug_stream << result << '\n';
}
int main()
{
run_text_single();
run_text_collection();
}
The SeqAn FM Index.
Definition: fm_index.hpp:191
Provides the unidirectional seqan3::fm_index.
auto search(queries_t &&queries, index_t const &index, configuration_t const &cfg=search_cfg::default_configuration)
Search a query or a range of queries in an index.
Definition: search.hpp:104
Provides the public interface for search algorithms.

If you want to allow errors in your query, you need to configure the approximate search with the following search configuration objects:

To search for either 1 insertion or 1 deletion you can use the seqan3::search_cfg::error_count:

std::string text{"Garfield the fat cat without a hat."};
seqan3::fm_index index{text};
seqan3::debug_stream << search("cat"s, index, cfg) << '\n';
// prints: [<query_id:0, reference_id:0, reference_pos:14>,
// <query_id:0, reference_id:0, reference_pos:17>,
// <query_id:0, reference_id:0, reference_pos:18>,
// <query_id:0, reference_id:0, reference_pos:32>]
Collection of elements to configure an algorithm.
Definition: configuration.hpp:45
Configuration element that represents the number or rate of deletion errors.
Definition: max_error.hpp:169
Configuration element that represents the number or rate of insertion errors.
Definition: max_error.hpp:124
Configuration element that represents the number or rate of substitution errors.
Definition: max_error.hpp:80
Configuration element that represents the number or rate of total errors.
Definition: max_error.hpp:36
A strong type of underlying type uint8_t that represents the number of errors.
Definition: max_error_common.hpp:31

Reading the CIGAR information into an actual alignment

In SeqAn, the conversion from a CIGAR string to an alignment (two seqan3::aligned_sequences) is done automatically for you. You can access the resulting alignment via seqan3::sam_record::alignment:

int main()
{
auto filename = std::filesystem::current_path() / "example.sam";
seqan3::sam_file_input fin{filename};
for (auto & record : fin)
seqan3::debug_stream << record.id() << ": " << std::get<1>(record.alignment()) << '\n';
}
A class for reading alignment files, e.g. SAM, BAM, BLAST ...
Definition: input.hpp:350
Meta-header for the IO / SAM File submodule .

Combining sequence and alignment files

This recipe can be used to:

  1. Read in a FASTA file with the reference and a SAM file with the alignment
  2. Filter the alignment records and only take those with a mapping quality >= 30.
  3. For the resulting alignments, print which read was mapped against with reference id and the number of seqan3::gap's involved in the alignment (either in aligned reference or in read sequence).
#include <seqan3/std/algorithm> // std::ranges::count
#include <string>
#include <vector>
int main()
{
// read in reference information
seqan3::sequence_file_input reference_file{current_path / "reference.fasta"};
std::vector<std::string> reference_ids{};
std::vector<seqan3::dna5_vector> reference_sequences{};
for (auto && record : reference_file)
{
reference_ids.push_back(std::move(record.id()));
reference_sequences.push_back(std::move(record.sequence()));
}
// filter out alignments
seqan3::sam_file_input mapping_file{current_path / "mapping.sam", reference_ids, reference_sequences};
auto mapq_filter = std::views::filter([] (auto & record) { return record.mapping_quality() >= 30; });
for (auto & record : mapping_file | mapq_filter)
{
// as loop
size_t sum_reference{};
for (auto const & char_reference : std::get<0>(record.alignment()))
if (char_reference == seqan3::gap{})
++sum_reference;
// or via std::ranges::count
size_t sum_read = std::ranges::count(std::get<1>(record.alignment()), seqan3::gap{});
// The reference_id is ZERO based and an optional. -1 is represented by std::nullopt (= reference not known).
std::optional reference_id = record.reference_id();
seqan3::debug_stream << record.id() << " mapped against "
<< (reference_id ? std::to_string(reference_id.value()) : "unknown reference")
<< " with "
<< sum_read << " gaps in the read sequence and "
<< sum_reference << " gaps in the reference sequence.\n";
}
}
Adaptations of algorithms from the Ranges TS.
The alphabet of a gap character '-'.
Definition: gap.hpp:39
Provides seqan3::gap.
constexpr ptrdiff_t count
Count the occurrences of a type in a pack.
Definition: traits.hpp:169
Provides the seqan3::record template and the seqan3::field enum.
Provides seqan3::sam_file_input and corresponding traits classes.
T to_string(T... args)
T value(T... args)

Map reads and write output to SAM file

For a full recipe on creating your own readmapper, see the very end of the tutorial Implementing your own read mapper with SeqAn.

void map_reads(std::filesystem::path const & query_path,
std::filesystem::path const & index_path,
std::filesystem::path const & sam_path,
reference_storage_t & storage,
uint8_t const errors)
{
// we need the alphabet and text layout before loading
{
std::ifstream is{index_path, std::ios::binary};
cereal::BinaryInputArchive iarchive{is};
iarchive(index);
}
seqan3::sequence_file_input query_file_in{query_path};
for (auto && record : query_file_in)
{
auto & query = record.sequence();
for (auto && result : search(query, index, search_config))
{
size_t start = result.reference_begin_position() ? result.reference_begin_position() - 1 : 0;
std::span text_view{std::data(storage.seqs[result.reference_id()]) + start, query.size() + 1};
for (auto && alignment : seqan3::align_pairwise(std::tie(text_view, query), align_config))
{
auto aligned_seq = alignment.alignment();
size_t ref_offset = alignment.sequence1_begin_position() + 2 + start;
size_t map_qual = 60u + alignment.score();
sam_out.emplace_back(query,
record.id(),
storage.ids[result.reference_id()],
ref_offset,
aligned_seq,
record.base_qualities(),
map_qual);
}
}
}
}
Configures the alignment result to output the alignment.
Definition: align_config_output.hpp:171
Configures the alignment result to output the begin positions.
Definition: align_config_output.hpp:131
The SeqAn Bidirectional FM Index.
Definition: bi_fm_index.hpp:61
A class for writing alignment files, e.g. SAM, BAL, BLAST, ...
Definition: output.hpp:161
Configuration element to receive all hits with the lowest number of errors within the error bounds.
Definition: hit.hpp:57
T data(T... args)
@ ref_offset
Sequence (seqan3::field::ref_seq) relative start position (0-based), unsigned value.
@ alignment
The (pairwise) alignment stored in an object that models seqan3::detail::pairwise_alignment.
@ mapq
The mapping quality of the seqan3::field::seq alignment, usually a Phred-scaled score.
@ ref_id
The identifier of the (reference) sequence that seqan3::field::seq was aligned to.
@ id
The identifier, usually a string.
A strong type representing free_end_gaps_sequence1_leading of the seqan3::align_cfg::method_global.
Definition: align_config_method.hpp:66
A strong type representing free_end_gaps_sequence1_trailing of the seqan3::align_cfg::method_global.
Definition: align_config_method.hpp:86
A strong type representing free_end_gaps_sequence2_leading of the seqan3::align_cfg::method_global.
Definition: align_config_method.hpp:76
A strong type representing free_end_gaps_sequence2_trailing of the seqan3::align_cfg::method_global.
Definition: align_config_method.hpp:96

Constructing a basic argument parser

void run_program(std::filesystem::path const & reference_path,
std::filesystem::path const & index_path)
{
seqan3::debug_stream << "reference_file_path: " << reference_path << '\n';
seqan3::debug_stream << "index_path " << index_path << '\n';
}
struct cmd_arguments
{
std::filesystem::path reference_path{};
std::filesystem::path index_path{"out.index"};
};
void initialise_argument_parser(seqan3::argument_parser & parser, cmd_arguments & args)
{
parser.info.author = "E. coli";
parser.info.short_description = "Creates an index over a reference.";
parser.info.version = "1.0.0";
parser.add_option(args.reference_path, 'r', "reference", "The path to the reference.",
seqan3::input_file_validator{{"fa","fasta"}});
parser.add_option(args.index_path, 'o', "output", "The output index file path.",
seqan3::output_file_validator{seqan3::output_file_open_options::create_new, {"index"}});
}
int main(int argc, char const ** argv)
{
seqan3::argument_parser parser("Indexer", argc, argv);
cmd_arguments args{};
initialise_argument_parser(parser, args);
try
{
parser.parse();
}
catch (seqan3::argument_parser_error const & ext)
{
std::cerr << "[PARSER ERROR] " << ext.what() << '\n';
return -1;
}
run_program(args.reference_path, args.index_path);
return 0;
}
argument_parser_meta_data info
Aggregates all parser related meta data (see seqan3::argument_parser_meta_data struct).
Definition: argument_parser.hpp:607
void add_option(option_type &value, char const short_id, std::string const &long_id, std::string const &desc, option_spec const spec=option_spec::standard, validator_type option_validator=validator_type{})
Adds an option to the seqan3::argument_parser.
Definition: argument_parser.hpp:248
A validator that checks if a given path is a valid input file.
Definition: validators.hpp:504
A validator that checks if a given path is a valid output file.
Definition: validators.hpp:636
@ standard
The default were no checking or special displaying is happening.
Definition: auxiliary.hpp:239
@ required
Definition: auxiliary.hpp:240
std::string author
Your name ;-)
Definition: auxiliary.hpp:286
std::string version
The version information MAJOR.MINOR.PATH (e.g. 3.1.3)
Definition: auxiliary.hpp:282
std::string short_description
A short description of the application (e.g. "A tool for mapping reads to the genome").
Definition: auxiliary.hpp:284

Constructing a subcommand argument parser

// =====================================================================================================================
// pull
// =====================================================================================================================
struct pull_arguments
{
std::string repository{};
std::string branch{};
bool progress{false};
};
int run_git_pull(seqan3::argument_parser & parser)
{
pull_arguments args{};
parser.add_positional_option(args.repository, "The repository name to pull from.");
parser.add_positional_option(args.branch, "The branch name to pull from.");
try
{
parser.parse();
}
catch (seqan3::argument_parser_error const & ext)
{
seqan3::debug_stream << "[Error git pull] " << ext.what() << "\n";
return -1;
}
seqan3::debug_stream << "Git pull with repository " << args.repository << " and branch " << args.branch << '\n';
return 0;
}
// =====================================================================================================================
// push
// =====================================================================================================================
struct push_arguments
{
std::string repository{};
bool push_all{false};
};
int run_git_push(seqan3::argument_parser & parser)
{
push_arguments args{};
parser.add_positional_option(args.repository, "The repository name to push to.");
parser.add_positional_option(args.branches, "The branch names to push (if none are given, push current).");
try
{
parser.parse();
}
catch (seqan3::argument_parser_error const & ext)
{
seqan3::debug_stream << "[Error git push] " << ext.what() << "\n";
return -1;
}
seqan3::debug_stream << "Git push with repository " << args.repository << " and branches " << args.branches << '\n';
return 0;
}
// =====================================================================================================================
// main
// =====================================================================================================================
int main(int argc, char const ** argv)
{
seqan3::argument_parser top_level_parser{"mygit",
argc,
argv,
{"push", "pull"}};
// Add information and flags, but no (positional) options to your top-level parser.
// Because of ambiguity, we do not allow any (positional) options for the top-level parser.
top_level_parser.info.description.push_back("You can push or pull from a remote repository.");
bool flag{false};
top_level_parser.add_flag(flag, 'f', "flag", "some flag");
try
{
top_level_parser.parse(); // trigger command line parsing
}
catch (seqan3::argument_parser_error const & ext) // catch user errors
{
seqan3::debug_stream << "[Error] " << ext.what() << "\n"; // customise your error message
return -1;
}
seqan3::argument_parser & sub_parser = top_level_parser.get_sub_parser(); // hold a reference to the sub_parser
std::cout << "Proceed to sub parser.\n";
if (sub_parser.info.app_name == std::string_view{"mygit-pull"})
return run_git_pull(sub_parser);
else if (sub_parser.info.app_name == std::string_view{"mygit-push"})
return run_git_push(sub_parser);
else
std::cout << "Unhandled subparser named " << sub_parser.info.app_name << '\n';
// Note: Arriving in this else branch means you did not handle all sub_parsers in the if branches above.
return 0;
}
void add_positional_option(option_type &value, std::string const &desc, validator_type option_validator=validator_type{})
Adds a positional option to the seqan3::argument_parser.
Definition: argument_parser.hpp:311
void parse()
Initiates the actual command line parsing.
Definition: argument_parser.hpp:398
argument_parser & get_sub_parser()
Returns a reference to the sub-parser instance if subcommand parsing was enabled.
Definition: argument_parser.hpp:426
@ flag
The alignment flag (bit information), uint16_t value.
@ on
Automatic update notifications should be enabled.
std::string app_name
The application name that will be displayed on the help page.
Definition: auxiliary.hpp:280

Serialise data structures with cereal

#include <fstream>
#include <vector>
#include <cereal/archives/binary.hpp> // includes the cereal::BinaryInputArchive and cereal::BinaryOutputArchive
#include <cereal/types/vector.hpp> // includes cerealisation support for std::vector
#include <seqan3/test/tmp_filename.hpp>
// Written for std::vector, other types also work.
void load(std::vector<int16_t> & data, seqan3::test::tmp_filename & tmp_file)
{
std::ifstream is(tmp_file.get_path(), std::ios::binary); // Where input can be found.
cereal::BinaryInputArchive archive(is); // Create an input archive from the input stream.
archive(data); // Load data.
}
// Written for std::vector, other types also work.
void store(std::vector<int16_t> const & data, seqan3::test::tmp_filename & tmp_file)
{
std::ofstream os(tmp_file.get_path(), std::ios::binary); // Where output should be stored.
cereal::BinaryOutputArchive archive(os); // Create an output archive from the output stream.
archive(data); // Store data.
}
int main()
{
// The following example is for an std::vector but any seqan3 data structure that is documented as serialisable
// could be used, e.g. seqan3::fm_index.
seqan3::test::tmp_filename tmp_file{"data.out"}; // This is a temporary file, use any other filename.
std::vector<int16_t> vec{1,2,3,4};
store(vec, tmp_file); // Calls store on a std::vector.
// This vector is needed to load the information into it.
load(vec2, tmp_file); // Calls load on a std::vector.
seqan3::debug_stream << vec2 << '\n'; // Prints [1,2,3,4].
return 0;
}
constexpr void store(void *mem_addr, simd_t const &simd_vec)
Store simd_t size bits of integral data into memory.
Definition: algorithm.hpp:358
constexpr simd_t load(void const *mem_addr)
Load simd_t size bits of integral data from memory.
Definition: algorithm.hpp:316

Converting a range of an alphabet

using seqan3::operator""_dna4;
using seqan3::operator""_dna5;
using seqan3::operator""_phred42;
int main()
{
// A vector of combined sequence and quality information.
std::vector<seqan3::dna4q> sequence1{{'A'_dna4, '!'_phred42},
{'C'_dna4, 'A'_phred42},
{'G'_dna4, '6'_phred42},
{'T'_dna4, '&'_phred42}};
// A vector of dna5.
std::vector<seqan3::dna5> sequence2{"AGNCGTNNCAN"_dna5};
// Convert dna4q to dna4.
// Since `sequence1` is an lvalue, we capture `in` via const &. When unsure, use the general case below.
auto view1 = sequence1 | std::views::transform([] (auto const & in) { return static_cast<seqan3::dna4>(in); });
seqan3::debug_stream << view1 << '\n'; // ACGT
// Convert dna5 to dna4.
// General case: Perfect forward.
auto view2 = sequence2
| std::views::transform([] (auto && in)
{
return static_cast<seqan3::dna4>(std::forward<decltype(in)>(in));
});
seqan3::debug_stream << view2 << '\n'; // AGACGTAA
return 0;
}
Provides aliases for qualified.
typename decltype(detail::split_after< i >(list_t{}))::first_type take
Return a seqan3::type_list of the first n types in the input type list.
Definition: traits.hpp:368

A custom dna4 alphabet that converts all unknown characters to A

When assigning from char or converting from a larger nucleotide alphabet to a smaller one, loss of information can occur since obviously some bases are not available. When converting to seqan3::dna5 or seqan3::rna5, non-canonical bases (letters other than A, C, G, T, U) are converted to 'N' to preserve ambiguity at that position. For seqan3::dna4 and seqan3::rna4 there is no letter 'N' to represent ambiguity, so the conversion from char for IUPAC characters tries to choose the best fitting alternative (see seqan3::dna4 for more details).

If you would like to always convert unknown characters to A instead, you can create your own alphabet with a respective char conversion table very easily like this:

// We inherit from seqan3::nucleotide_base s.t. we do not need to implement the full nucleotide interface
// but it is sufficient to define `rank_to_char`, `char_to_rank`, and `complement_table`.
class my_dna4 : public seqan3::nucleotide_base<my_dna4, 4/*alphabet size is 4*/>
{
public:
using nucleotide_base<my_dna4, 4>::nucleotide_base; // Use constructors of the base class.
private:
// Returns the character representation of rank. This is where rank conversion for to_char() is handled!
static constexpr char_type rank_to_char(rank_type const rank)
{
return rank_to_char_table[rank];
}
// Returns the rank representation of character. This is where char conversion for assign_char() is handled!
static constexpr rank_type char_to_rank(char_type const chr)
{
return char_to_rank_table[static_cast<index_t>(chr)];
}
// Returns the complement by rank. This is where complement is handled and with this, my_dna4 models
// seqan3::nucleotide_alphabet.
static constexpr rank_type rank_complement(rank_type const rank)
{
return rank_complement_table[rank];
}
private:
// === lookup-table implementation detail ===
// Value to char conversion table.
static constexpr char_type rank_to_char_table[alphabet_size] {'A', 'C', 'G', 'T'}; // rank 0,1,2,3
// Char-to-value conversion table.
static constexpr std::array<rank_type, 256> char_to_rank_table
{
[] () constexpr
{
// By default, everything has rank 0 which equals `A`.
std::array<rank_type, 256> conversion_table{};
conversion_table['C'] = conversion_table['c'] = 1;
conversion_table['G'] = conversion_table['g'] = 2;
conversion_table['T'] = conversion_table['t'] = 3;
conversion_table['U'] = conversion_table['T']; // set U equal to T
conversion_table['u'] = conversion_table['t']; // set u equal to t
return conversion_table;
}()
};
// The rank complement table.
static constexpr rank_type rank_complement_table[alphabet_size]
{
3, // T is complement of 'A'_dna4
2, // G is complement of 'C'_dna4
1, // C is complement of 'G'_dna4
0 // A is complement of 'T'_dna4
};
friend nucleotide_base<my_dna4, 4>; // Grant seqan3::nucleotide_base access to private/protected members.
friend nucleotide_base<my_dna4, 4>::base_t; // Grant seqan3::alphabet_base access to private/protected members.
};
// Defines the `_my_dna4` *char literal* so you can write `'C'_my_dna4` instead of `my_dna4{}.assign_char('C')`.
constexpr my_dna4 operator""_my_dna4(char const c) noexcept
{
return my_dna4{}.assign_char(c);
}
int main()
{
my_dna4 my_letter{'C'_my_dna4};
my_letter.assign_char('S'); // Characters other than A,C,G,T are implicitly converted to `A`.
seqan3::debug_stream << my_letter << "\n"; // "A";
seqan3::debug_stream << seqan3::complement(my_letter) << "\n"; // "T";
}
static constexpr detail::min_viable_uint_t< size > alphabet_size
The size of the alphabet, i.e. the number of different values it can take.
Definition: alphabet_base.hpp:203
rank_type rank
The value of the alphabet letter is stored as the rank.
Definition: alphabet_base.hpp:265
A CRTP-base that refines seqan3::alphabet_base and is used by the nucleotides.
Definition: nucleotide_base.hpp:43
constexpr auto complement
Return the complement of a nucleotide object.
Definition: concept.hpp:104
Provides seqan3::nucleotide_base.

If you are interested in custom alphabets, also take a look at our tutorial How to write your own alphabet.

Controlling threads of (de-)compression streams

When reading or writing compressed files, parallelisation is automatically applied when using BGZF-compressed files, e.g., BAM files. This will use all available threads by default and can be adjusted by setting seqan3::contrib::bgzf_thread_count to the desired value:

// This does not work, the value must be overwritten within a function.
// seqan3::contrib::bgzf_thread_count = 1u;
int main()
{
// Use one thread for (de-)compression.
seqan3::contrib::bgzf_thread_count = 1u;
// Read/Write compressed files.
// ...
return 0;
}
Meta-header for the IO module .