fn() localAlignment
Computes the best pairwise local alignment using the Smith-Waterman algorithm.

Defined in <seqan/align.h>, <seqan/align_parallel.h>
Signature TScoreCollection localAlignment([exec,] alignCollection, scoringScheme, [lowerDiag, upperDiag]); TScoreCollection localAlignment([exec,] gapsHCollection, gapsVCollection, scoringScheme, [lowerDiag, upperDiag]); TScoreVal localAlignment(align, scoringScheme, [lowerDiag, upperDiag]); TScoreVal localAlignment(gapsH, gapsV, scoringScheme, [lowerDiag, upperDiag]); TScoreVal localAlignment(fragmentString, scoringScheme, [lowerDiag, upperDiag]);

Parameters

exec The ExecutionPolicy used for the alignment algorithm.
alignCollection A collection of Align objects containing the sequences to compute the alignment for.
gapsHCollection A collection of Gaps objects containing the first sequences to compute the alignment for.
gapsVCollection A collection of Gaps objects containing the second sequences to compute the alignment for.
gapsH Horizontal gapped sequence in alignment matrix. Types: Gaps
gapsV Vertical gapped sequence in alignment matrix. Types: Gaps
align An Align object that stores the alignment. The number of rows must be 2 and the sequences must have already been set. align[0] is the horizontal one in the alignment matrix alignment, align[1] is the vertical one.
fragmentString String of Fragment objects. The sequence with id 0 is the horizontal one, the sequence with id 1 is the vertical one.
scoringScheme The scoring scheme to use for the alignment.
lowerDiag Optional lower diagonal (int).
upperDiag Optional upper diagonal (int).

Return Values

TScoreVal Score value of the resulting alignment (Metafunction Value of the type of scoringScheme).
TScoreCollection A collection of computed scores for every aligned sequence pair. The value type of this score is String over the score type of the passed scoring scheme. (Metafunction: Value of the type of scoringScheme).

Detailed Description

Warning:

There are currently some limitations in the use of the execution policy. The WavefrontExecutionPolicy cannot yet compute the traceback and is only allowed for the globalAlignmentScore and localAlignmentScore interface. The banded version only works for collections if all sequences within one collection have the same size.

Note:

In order to get the performance advantage of vectorised execution one has to compile the code with the respective CPU flags. For most Intel based CPUs the compiler flag -msse4 can be used for gcc and clang to enable vectorisation for 128 bit wide registers. For CPUs that support wider register please read the respective documentation on how to select the correct compilation flags.

The Waterman-Eggert algorithm (local alignment with declumping) is available through the LocalAlignmentEnumerator class.

When using Gaps and Align objects, only parts (i.e. one infix) of each sequence will be aligned. This will be presented to the user by setting the clipping begin and end position of the gaps (the rows in the case of Align objects). When using Fragment strings, these parts of the sequences will not appear in any fragment.

There exist multiple overloads for this function with two configuration dimensions.

First, you can select the type of the target storing the alignment. This can be either an Align object, two Gaps objects, or a string of Fragment objects. Align objects provide an interface to tabular alignments with the restriction of all rows having the same type. Using two Gaps objects has the advantage that you an align sequences with different types, for example DnaString and Dna5String. Using Fragment strings is useful for collecting many pairwise alignments, for example in the construction of Alignment Graphs for multiple- sequence alignments (MSA).

Second, you can optionally give a band for the alignment using lowerDiag and upperDiag. The center diagonal has index 0, the ith diagonal below has index -i, the ith above has index i.

The examples below show some common use cases.

Examples

Local alignment of two sequences using an Align object.

Dna5String seqH = "CGATT";
Dna5String seqV = "CGAAATT";

Align<Dna5String> align;
resize(rows(align), 2);
assignSource(row(align, 0), seqH);
assignSource(row(align, 0), seqV);
Score<int, Simple> scoringScheme(2, -1, -2);

int result = localAlignment(align, scoringScheme);

Local banded alignment of two sequences using two Gaps objects.

Dna5String seqH = "CGATT";
Gaps<Dna5String, ArrayGaps> gapsH(seqH);
DnaString seqV = "CGAAATT";
Gaps<Dna5String, AnchorGaps<> > gapsV(seqV);

Score<int, Simple> scoringScheme(5, -3, -1, -5);

int result = localAlignment(gapsH, gapsV, scoringScheme, -2, 2);

Execution policies

SeqAn supports parallel and vectorised execution of local pairwise alignments. This additional interface takes not just a single pair of sequences but a collection of sequence pairs either in form of a collection over Align objects or as two collections of Gaps where both collections must have the same size. The collection based interface allows to additionally specify an ExecutionPolicy. This execution policy can be used to select the multi-threaded or vectorised implementation or the combination there of of the alignment algorithm. SeqAn implements an inter-sequence vectorisation scheme which means that x alignments are computed in parallel in one SIMD vector where x is the number of elements a vector can compute in parallel. This depends on the architecture's supported SIMD vector width (128 bit, 256 bit or 512 bit) and the selected score type, e.g. int16_t. For example on a CPU architecture that supports SSE4 and a score type of int16_t, 128/16 = 8 alignments can be computed in parallel on a single core. In addition, the execution policy can be configured for multi-threaded execution, such that chunks of sequence pairs from the initial collection are spawned and executed on different threads. The number of threads can be set via the execution policy.

The following example shows an example for a multi-threaded and vectorised execution of global alignments for two collections of sequences:

#include <iostream>

#include <seqan/align_parallel.h>
#include <seqan/stream.h>  // for printing strings

int main()
{
    using TSequence = seqan2::String<seqan2::Dna>;
    using TAlignedSequence = seqan2::Gaps<TSequence>;
    using TThreadModel = seqan2::Parallel;
    using TVectorSpec = seqan2::Vectorial;
    using TExecPolicy = seqan2::ExecutionPolicy<TThreadModel, TVectorSpec>;

    // dummy sequences
    TSequence seqH = "CGATT";
    TSequence seqV = "CGAAATT";

    seqan2::StringSet<TAlignedSequence> seqs1;
    seqan2::StringSet<TAlignedSequence> seqs2;

    for (size_t i = 0; i < 100; ++i)
    {
        appendValue(seqs1, TAlignedSequence(seqH));
        appendValue(seqs2, TAlignedSequence(seqV));
    }

    TExecPolicy execPolicy;
    setNumThreads(execPolicy, 4);

    seqan2::Score<int16_t, seqan2::Simple> scoreAffine(2, -2, -1, -4);

    seqan2::String<int16_t> scores = seqan2::localAlignment(execPolicy, seqs1, seqs2, scoreAffine);

    for (int16_t score : scores)
        std::cout << "Score: " << score << "\n";

    for (size_t pos = 0; pos < seqan2::length(seqs1); ++pos) // print out alignments
    {
        std::cout << seqs1[pos] << "\n";
        std::cout << seqs2[pos] << "\n\n";
    }

    return EXIT_SUCCESS;
}

https://seqan.readthedocs.io/en/main/Tutorial/Algorithms/Alignment/PairwiseSequenceAlignment.html

References

  • Smith TF, Waterman, MS: Identification of Common Molecular Subsequences. J Mol Biol 1981, 147(1):195-7.

Data Races

If not stated otherwise, concurrent invocation is not guaranteed to be thread-safe.

See Also