fn() globalAlignment
Computes the best global pairwise alignment.

Defined in <seqan/align.h>, <seqan/align_parallel.h>
Signature TScoreCollection globalAlignment([exec,] alignCollection, scoringScheme, [alignConfig,] [lowerDiag, upperDiag]); TScoreCollection globalAlignment([exec,] gapsHCollection, gapsVCollection, scoringScheme, [alignConfig,] [lowerDiag, upperDiag]); TScoreVal globalAlignment(align, scoringScheme, [alignConfig,] [lowerDiag, upperDiag,] [algorithmTag]); TScoreVal globalAlignment(gapsH, gapsV, scoringScheme, [alignConfig,] [lowerDiag, upperDiag,] [algorithmTag]); TScoreVal globalAlignment(frags, strings, scoringScheme, [alignConfig,] [lowerDiag, upperDiag,] [algorithmTag]); TScoreVal globalAlignment(alignGraph, scoringScheme, [alignConfig,] [lowerDiag, upperDiag,] [algorithmTag]);

Parameters

exec The ExecutionPolicy used for the alignment algorithm.
alignCollection A collection of Align objects containing the sequences to compute the alignment for.
gapsHCollection A collection of Gaps objects containing the first sequences to compute the alignment for.
gapsVCollection A collection of Gaps objects containing the second sequences to compute the alignment for.
align The Align object to use for storing the pairwise alignment.
gapsH The Gaps object for the first row (horizontal in the DP matrix).
gapsV The Gaps object for the second row (vertical in the DP matrix).
frags String of Fragment objects to store alignment in.
strings StringSet of length two with the strings to align.
alignGraph Alignment Graph for the resulting alignment. Must be initialized with two strings.
scoringScheme The scoring scheme to use for the alignment. Note that the user is responsible for ensuring that the scoring scheme is compatible with algorithmTag.
alignConfig AlignConfig instance to use for the alignment configuration.
lowerDiag Optional lower diagonal (int).
upperDiag Optional upper diagonal (int).
algorithmTag Tag to select the alignment algorithm (see AlignmentAlgorithmTags).

Return Values

TScoreVal Score value of the resulting alignment (Metafunction: Value of the type of scoringScheme).
TScoreCollection A collection of computed scores for every aligned sequence pair. The value type of this score is String over the score type of the passed scoring scheme. (Metafunction: Value of the type of scoringScheme).

Detailed Description

Warning:

There are currently some limitations in the use of the execution policy. The WavefrontExecutionPolicy cannot yet compute the traceback and is only allowed for the globalAlignmentScore and localAlignmentScore interface. The banded version only works for collections if all sequences within one collection have the same size.

Note:

In order to get the performance advantage of vectorised execution, one has to compile the code with the respective CPU flags. For most Intel based CPUs the compiler flag -msse4 can be used for gcc and clang to enable vectorisation for 128 bit wide registers. For CPUs that support wider registers please read the respective documentation on how to select the correct compilation flags.

There exist multiple overloads for this function with four configuration dimensions.

First, you can select whether begin and end gaps are free in either sequence using alignConfig.

Second, you can select the type of the target storing the alignment. This can be either an Align object, two Gaps objects, a AlignmentGraph, or a string of Fragment objects. Align objects provide an interface to tabular alignments with the restriction of all rows having the same type. Using two Gaps objects has the advantage that you an align sequences with different types, for example DnaString and Dna5String. Alignment Graphs provide a graph-based representation of segment-based colinear alignments. Using Fragment strings is useful for collecting many pairwise alignments, for example in the construction of Alignment Graphs for multiple-sequence alignments (MSA).

Third, you can optionally give a band for the alignment using lowerDiag and upperDiag. The center diagonal has index 0, the ith diagonal below has index -i, the ith above has index i.

Fourth, you can select the algorithm to use with algorithmTag. This can be one of NeedlemanWunsch and Gotoh. The Needleman-Wunsch algorithm supports scoring schemes with linear gap costs only while Gotoh's algorithm also allows affine gap costs.

The available alignment algorithms all have some restrictions. Gotoh's algorithm can handle arbitrary substitution and affine gap scores. Needleman-Wunsch is limited to linear gap scores. The implementation of Hirschberg's algorithm is further limited that it does not support alignConfig objects or banding. The implementation of the Myers-Hirschberg algorithm further limits this to only support edit distance (as scores, matches are scored with 0, mismatches are scored with -1).

The examples below show some common use cases.

Examples

Global alignment of two sequences using an Align object and the Needleman-Wunsch algorithm.

#include <iostream>

#include <seqan/basic.h>
#include <seqan/align.h>
#include <seqan/stream.h>  // for printint strings

using namespace seqan2;

int main()
{
    Dna5String seqH = "CGATT";
    Dna5String seqV = "CGAAATT";

    Align<Dna5String> align;
    resize(rows(align), 2);
    assignSource(row(align, 0), seqH);
    assignSource(row(align, 1), seqV);

    Score<int, Simple> scoringScheme(2, -1, -2, -1);
    AlignConfig<> alignConfig;

    int result = globalAlignment(align, scoringScheme, alignConfig);

    std::cout << "Score: " << result << "\n";
    std::cout << "The resulting alignment is\n"
              << align << "\n";

    return 0;
}

The output is as follows:

Score: 8
The resulting alignment is
      0     .   
        CG--ATT
        ||  |||
        CGAAATT



Global banded alignment of two sequences using two Gaps objects and the Gotoh algorithm.

#include <iostream>

#include <seqan/basic.h>
#include <seqan/align.h>
#include <seqan/sequence.h>
#include <seqan/stream.h>  // for printint strings

using namespace seqan2;

int main()
{
    Dna5String seqH = "CGATT";
    Dna5String seqV = "CGAAATT";

    Align<Dna5String> align;
    resize(rows(align), 2);
    assignSource(row(align, 0), seqH);
    assignSource(row(align, 1), seqV);

    Score<int, Simple> scoringScheme(2, -1, -2);
    AlignConfig<> alignConfig;

    int lDiag = -2;
    int uDiag = 2;

    int result = globalAlignment(align, scoringScheme, alignConfig, lDiag, uDiag);

    std::cout << "Score: " << result << "\n";
    std::cout << "The resulting alignment is\n"
              << align << "\n";

    return 0;
}

The output is as follows:

Score: 6
The resulting alignment is
      0     .   
        CG--ATT
        ||  |||
        CGAAATT



https://seqan.readthedocs.io/en/main/Tutorial/Algorithms/Alignment/PairwiseSequenceAlignment.html

Execution policies

SeqAn supports parallel and vectorised execution of pairwise alignments. This additional interface takes not just a single pair of sequences but a collection of sequence pairs either in form of a collection over Align objects or as two collections of Gaps where both collections must have the same size. The collection based interface allows to additionally specify an ExecutionPolicy. This execution policy can be used to select the multi-threaded or vectorised implementation or the combination of thereof for the alignment algorithm. SeqAn implements an inter-sequence vectorisation scheme which means that x alignments are computed in parallel in one SIMD vector where x is the number of elements a vector can compute in parallel. This depends on the architecture's supported SIMD vector width (128 bit, 256 bit or 512 bit) and the selected score type, e.g. int16_t. For example on a CPU architecture that supports SSE4 and a score type of int16_t, 128/16 = 8 alignments can be computed in parallel on a single core. In addition, the execution policy can be configured for multi-threaded execution, such that chunks of sequence pairs from the initial collection are spawned and executed on different threads. The number of threads can be set via the execution policy.

The following example shows an example for a multi-threaded and vectorised execution of global alignments for two collections of sequences:

#include <iostream>

#include <seqan/align_parallel.h>
#include <seqan/stream.h>  // for printing strings

int main()
{
    using TSequence = seqan2::String<seqan2::Dna>;
    using TAlignedSequence = seqan2::Gaps<TSequence>;
    using TThreadModel = seqan2::Parallel;
    using TVectorSpec = seqan2::Vectorial;
    using TExecPolicy = seqan2::ExecutionPolicy<TThreadModel, TVectorSpec>;

    // dummy sequences
    TSequence seqH = "CGATT";
    TSequence seqV = "CGAAATT";

    seqan2::StringSet<TAlignedSequence> seqs1;
    seqan2::StringSet<TAlignedSequence> seqs2;

    for (size_t i = 0; i < 100; ++i) // Create a data set of 100 dummy sequences
    {
        appendValue(seqs1, TAlignedSequence(seqH));
        appendValue(seqs2, TAlignedSequence(seqV));
    }

    TExecPolicy execPolicy;
    setNumThreads(execPolicy, 4);

    seqan2::Score<int16_t, seqan2::Simple> scoreAffine(2, -2, -1, -4);

    seqan2::String<int16_t> scores = seqan2::globalAlignment(execPolicy, seqs1, seqs2, scoreAffine);

    for (int16_t score : scores)
        std::cout << "Score: " << score << "\n";

    for (size_t pos = 0; pos < seqan2::length(seqs1); ++pos) // print out alignments
    {
        std::cout << seqs1[pos] << "\n";
        std::cout << seqs2[pos] << "\n\n";
    }

    return EXIT_SUCCESS;
}

References

  • Needleman SB, Wunsch CD: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 1970, 48(3): 443-53.
  • Gotoh O: An improved algorithm for matching biological sequences. J Mol Biol 1982, 162(3):705-8

Data Races

If not stated otherwise, concurrent invocation is not guaranteed to be thread-safe.

See Also