SeqAn3 3.4.0-rc.1
The Modern C++ library for sequence analysis.
Sequence Alignment

The alignment module contains concepts, algorithms and classes that are related to the computation of pairwise and multiple sequence alignments.


 Aligned Sequence
 Provides seqan3::aligned_sequence, as well as various ranges that model it.
 CIGAR Conversion
 The CIGAR Conversion submodule contains utility functions to convert a CIGAR to an alignment or vice versa.
 Provides configuration elements for the pairwise alignment configuration.
 The decorator submodule contains special SeqAn decorators.
 Provides data structures for representing alignment coordinates and alignments as a matrix.
 Pairwise Alignments
 Provides the algorithmic components for the computation of pairwise alignments.
 Provides the data structures used for scoring alphabets and sequences.

Detailed Description

The alignment module contains concepts, algorithms and classes that are related to the computation of pairwise and multiple sequence alignments.

Sequence Alignment

In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. Aligned sequences of nucleotide or amino acid residues are typically represented as rows within a matrix. Gaps are inserted between the residues so that identical or similar characters are aligned in successive columns. Sequence alignments are also used for non-biological sequences, such as calculating the distance cost between strings in a natural language or in financial data. [1]

Pairwise Sequence Alignment

SeqAn offers a generic multi-purpose alignment library comprising all widely known alignment algorithms as well as many special algorithms. These algorithms are all accessible through an easy-to-use alignment interface, which is described in Pairwise Alignments.

The following code snippet demonstrates a simple use of the pairwise alignment interface.

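#include <utility>

#include <seqan3/alignment/pairwise/align_pairwise.hpp>
#include <seqan3/alignment/scoring/nucleotide_scoring_scheme.hpp>
#include <seqan3/alphabet/nucleotide/dna4.hpp>
#include <seqan3/core/debug_stream.hpp>

int main()
{
    using namespace seqan3::literals;

    seqan3::dna4_vector s1 = "ACGTGAACTGACT"_dna4;
    seqan3::dna4_vector s2 = "ACGAAGACCGAT"_dna4;

    // Configure the alignment kernel: global alignment with a nucleotide scoring scheme.
    auto config =
        seqan3::align_cfg::method_global{} | seqan3::align_cfg::scoring_scheme{seqan3::nucleotide_scoring_scheme{}};

    // Invoke the pairwise alignment which returns a lazy range over alignment results.
    auto results = seqan3::align_pairwise(std::tie(s1, s2), config);
    auto & res = *results.begin();
    seqan3::debug_stream << "Score: " << res.score() << '\n';
}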
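
To make the idea behind a global alignment score more concrete, the following sketch computes a Needleman-Wunsch score with linear gap costs using only the C++ standard library. It is not SeqAn's implementation and does not use the SeqAn API. The names simple_scores and global_alignment_score, as well as the score values (match +1, mismatch -1, gap -2), are assumptions chosen for illustration and are not SeqAn defaults.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Illustrative scoring parameters (hypothetical; not SeqAn defaults).
struct simple_scores
{
    int match{1};
    int mismatch{-1};
    int gap{-2}; // linear cost per gap character
};

// Needleman-Wunsch global alignment score with linear gap costs.
// Only two rows of the dynamic programming matrix are kept in memory.
int global_alignment_score(std::string const & s1, std::string const & s2, simple_scores sc = {})
{
    std::vector<int> prev(s2.size() + 1), curr(s2.size() + 1);

    // First row: the empty prefix of s1 against j characters of s2 costs j gaps.
    for (std::size_t j = 0; j <= s2.size(); ++j)
        prev[j] = static_cast<int>(j) * sc.gap;

    for (std::size_t i = 1; i <= s1.size(); ++i)
    {
        // First column: i characters of s1 against the empty prefix of s2.
        curr[0] = static_cast<int>(i) * sc.gap;

        for (std::size_t j = 1; j <= s2.size(); ++j)
        {
            int const diagonal = prev[j - 1] + (s1[i - 1] == s2[j - 1] ? sc.match : sc.mismatch);
            int const up = prev[j] + sc.gap;       // gap in s2
            int const left = curr[j - 1] + sc.gap; // gap in s1
            curr[j] = std::max({diagonal, up, left});
        }
        std::swap(prev, curr);
    }
    return prev[s2.size()];
}
```

Under these assumed scores, global_alignment_score("ACGT", "AGT") evaluates to 1: three matching columns and one gap column.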

Multiple Sequence Alignment

The current version of SeqAn does not offer multiple sequence alignments (MSA). Please reach out to us with a specific use case we should consider in future versions.

Alignments Represented as CIGAR Strings in SAM/BAM Files

A common file format to store (semi-)alignments is the SAM/BAM format. In a SAM/BAM file, the alignment is represented as a CIGAR string. To allow conversion between a CIGAR string and the alignment representation in SeqAn, we provide the functions seqan3::alignment_from_cigar and seqan3::cigar_from_alignment.

For reading and writing SAM/BAM files, we provide seqan3::sam_file_input and seqan3::sam_file_output.
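
To illustrate how a CIGAR string relates to a gapped alignment, the following sketch derives a CIGAR string from two aligned rows using only the C++ standard library. It does not use or reproduce the SeqAn conversion functions. The function name cigar_from_rows is hypothetical, and the sketch assumes a simplified convention with only three operations: M (both rows have a residue), I (gap in the reference row) and D (gap in the query row). Clipping and the other SAM operations are left out.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Builds a CIGAR string from two equally long gapped rows ('-' marks a gap).
// Assumed convention: M = residue in both rows, I = gap in reference, D = gap in query.
std::string cigar_from_rows(std::string const & reference, std::string const & query)
{
    if (reference.size() != query.size())
        throw std::invalid_argument{"aligned rows must have equal length"};

    std::string cigar;
    char current_op = '\0';
    std::size_t run_length = 0;

    // Appends the current run (e.g. "3M") to the output, if there is one.
    auto flush = [&]()
    {
        if (run_length > 0)
            cigar += std::to_string(run_length) + current_op;
    };

    for (std::size_t i = 0; i < reference.size(); ++i)
    {
        bool const ref_gap = reference[i] == '-';
        bool const query_gap = query[i] == '-';
        if (ref_gap && query_gap)
            continue; // a column consisting only of gaps is skipped in this sketch

        char const op = ref_gap ? 'I' : (query_gap ? 'D' : 'M');
        if (op != current_op)
        {
            flush();
            current_op = op;
            run_length = 0;
        }
        ++run_length;
    }
    flush();
    return cigar;
}
```

Under this convention, the rows ACGT-ACT (reference) and AC-TGACT (query) yield 2M1D1M1I3M.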