fn() splitAlignment
Compute split alignments.

Defined in <seqan/align_split.h>
Signature TScoreValue splitAlignment(alignL, alignR, scoringScheme[, lowerDiag, upperDiag]); TScoreValue splitAlignment(gapsHL, gapsVL, gapsHR, gapsVR, scoringScheme[, lowerDiag, upperDiag]);

Parameters

alignL Align object with two rows for the left alignment.
alignR Align object with two rows for the right alignment.
gapsHL Gaps object with the horizontal/contig row for the left alignment.
gapsVL Gaps object with the vertical/read row for the left alignment.
gapsHR Gaps object with the horizontal/contig row for the right alignment.
gapsVR Gaps object with the vertical/read row for the right alignment.
scoringScheme The scoring scheme to use for the alignment.
lowerDiag The lower diagonal.You have to specify the upper and lower diagonals for the left alignment. For the right alignment, the corresponding diagonals are chosen for the lower right part of the DP matrix, int.
upperDiag The lower diagonal. Also see remark for lowerDiag, int.

Return Values

TScoreValue The sum of the alignment scores of both alignments (Metafunction: Value of the type of scoringScheme).

Detailed Description

There are two variants of the split alignment problem. In the first variant, we wan to align two sequences where the first (say the reference) one is shorter than the second (say a read) and the read contains an insertion with respect to the reference. We now want to align the read agains the reference such that the left part of the read aligns well against the left part of the reference and the right part of the read aligns well against the right part of the reference. The center gap in the reference is free.

For example:

reference  AGCATGTTAGATAAGATAGC-----------TGTGCTAGTAGGCAGTCAGCGCCAT
           ||||||||||||||||||||           |||||||||||||||||||||||||
read       AGCATGTTAGATAAGATAGCCCCCCCCCCCCTGTGCTAGTAGGCAGTCAGCGCCAT

The second variant is to align two sequences A and B against a reference such that the left part of A aligns well to the left part of the reference and the right part of B aligns well to the right part of the reference. Together, both reads span the whole reference and overlap with an insertion in the reference.

reference  AGCATGTTAGATAAGATAGCTGTGCTAGTAGGCAGTCAGCGCCAT
           |||||||||||||||||| | ||
           AGCATGTTAGATAAGATATCCGTCC
           read 1
                             ||| |||||||||||||||||||||||
                           CCGCTATGCTAGTAGGCAGTCAGCGCCAT
                                                  read 2

The resulting alignment of the left/right parts is depicted below. The square brackets indicate clipping positions.

reference  AGCATGTTAGATAAGATA    [GCTGTGCTAGTAGGCAGTCAGCGCCAT
           ||||||||||||||||||    [ | ||
           AGCATGTTAGATAAGATA    [TCCGTCC
           read 1
reference  AGCATGTTAGATAAGATA]    GTGCTAGTAGGCAGTCAGCGCCAT
                             ]     |||||||||||||||||||||||
                        CCGCT]    ATGCTAGTAGGCAGTCAGCGCCAT
                                                    read 2

In the first case, we want to find the one breakpoint in the reference and the two breakpoints in the reads and the alignment of the left and right well-aligning read parts. In the second case, we want to find the one breakpoint in the reference and the breakpoint/clipping position in each read.

The splitAlignment() function takes as the input two alignments. The sequence in each alignment's first row is the reference and the sequence of the second row is the read. The sequence has to be the same sequence whereas the reads might differ. If the reads are the same then this is the same as the first case and if the reads differ then this is the second case.

The result is two alignments of the left and right contig path clipped appropriately. The resulting score is the sum of the scores of both alignments.

Remarks

The DP algorithm is chosen automatically depending on whether the gap open and extension costs are equal.

Example

The following example demonstrates the usage of splitAlignment in the first case. The second case works accordingly.

#include <seqan/align.h>
#include <seqan/align_split.h>
#include <seqan/file.h>  // output of String
#include <seqan/sequence.h>
#include <seqan/score.h>

using namespace seqan;

int main()
{
    std::cout << "Situation\n"
              << "\n"
              << "REF  AGCATGTTAGATAAGATAG-----------CTGTGCTAGTAGGCAGTCAGCGCCAT\n"
              << "READ AGCATGTTAGATAAGATAGCCCCCCCCCCCCTGTGCTAGTAGGCAGTCAGCGCCAT\n"
              << "\n";

    // Demo for split alignment where the read contains an insertion with
    // respect to the reference.  The input of the function is the infix of
    // reference earlier identified.
    Dna5String ref =  "AGCATGTTAGATAAGATAG"         "CTGTGCTAGTAGGCAGTCAGCGCCAT";
    Dna5String read = "AGCATGTTAGATAAGATAGCCCCCCCCCCCCTGTGCTAGTAGGCAGTCAGCGCCAT";

    // Prepare Gaps objects.  We need one for the left part and one for the
    // right part of the alignment.
    Align<Dna5String> alignL;
    resize(rows(alignL), 2);
    assignSource(row(alignL, 0), ref);
    assignSource(row(alignL, 1), read);
    Align<Dna5String> alignR;
    resize(rows(alignR), 2);
    assignSource(row(alignR, 0), ref);
    assignSource(row(alignR, 1), read);

    // Define scoring scheme.
    Score<int, Simple> scoringScheme(1, -1, -1);

    // Call split alignment function.
    splitAlignment(alignL, alignR, scoringScheme);

    // Print resulting alignment to stdout.
    std::cout << "Resulting alignments\n"
              << "\n"
              << "Left\n"
              << alignL
              << "Right\n"
              << alignR
              << "\n";

    // Get relevant clipping positions.
    int refSplitPosition = toSourcePosition(row(alignL, 0), clippedEndPosition(row(alignL, 0)));
    SEQAN_ASSERT_EQ(refSplitPosition, toSourcePosition(row(alignR, 0), 0));
    int readSplitLPosition = toSourcePosition(row(alignL, 1), clippedEndPosition(row(alignL, 1)));
    int readSplitRPosition = toSourcePosition(row(alignR, 1), 0);

    std::cout << "refSplitPosition   == " << refSplitPosition << "\n"
              << "readSplitLPosition == " << readSplitLPosition << "\n"
              << "readSplitRPosition == " << readSplitRPosition << "\n\n";

    // Print sequence parts to stdout.
    std::cout << "Reference Left  " << prefix(ref, refSplitPosition) << "\n"
              << "Reference Right " << suffix(ref, refSplitPosition) << "\n"
              << "\n"
              << "Read Left       " << prefix(read, readSplitLPosition) << "\n"
              << "Read Center     " << infix(read, readSplitLPosition, readSplitRPosition) << "\n"
              << "Read Right      " << suffix(read, readSplitRPosition) << "\n";

    return 0;
}

The output is as follows.

Situation

REF  AGCATGTTAGATAAGATAG-----------CTGTGCTAGTAGGCAGTCAGCGCCAT
READ AGCATGTTAGATAAGATAGCCCCCCCCCCCCTGTGCTAGTAGGCAGTCAGCGCCAT

Resulting alignments

Left
      0     .    :    .     
        AGCATGTTAGATAAGATAG
        |||||||||||||||||||
        AGCATGTTAGATAAGATAG


Right
      0     .    :    .    :    .  
        CTGTGCTAGTAGGCAGTCAGCGCCAT
        ||||||||||||||||||||||||||
        CTGTGCTAGTAGGCAGTCAGCGCCAT



refSplitPosition   == 19
readSplitLPosition == 19
readSplitRPosition == 30

Reference Left  AGCATGTTAGATAAGATAG
Reference Right CTGTGCTAGTAGGCAGTCAGCGCCAT

Read Left       AGCATGTTAGATAAGATAG
Read Center     CCCCCCCCCCC
Read Right      CTGTGCTAGTAGGCAGTCAGCGCCAT

Data Races

Thread safety unknown!