fn() splitAlignmentCompute split alignments.
Defined in | <seqan/align_split.h> |
---|---|
Signature |
TScoreValue splitAlignment(alignL, alignR, scoringScheme[, config][, lowerDiag, upperDiag]);
TScoreValue splitAlignment(gapsHL, gapsVL, gapsHR, gapsVR, scoringScheme[, config][, lowerDiag, upperDiag]);
|
Parameters
alignL
|
Align object with two rows for the left alignment. |
---|---|
alignR
|
Align object with two rows for the right alignment. |
gapsHL
|
Gaps object with the horizontal/contig row for the left alignment. |
gapsVL
|
Gaps object with the vertical/read row for the left alignment. |
gapsHR
|
Gaps object with the horizontal/contig row for the right alignment. |
gapsVR
|
Gaps object with the vertical/read row for the right alignment. |
scoringScheme
|
The scoring scheme to use for the alignment. |
config
|
A configuration object of type AlignConfig, to specify free-end-gaps. |
lowerDiag
|
The lower diagonal.You have to specify the upper and lower diagonals for the left alignment. For the right alignment, the corresponding diagonals are chosen for the lower right part of the DP matrix, int. |
upperDiag
|
The lower diagonal. Also see remark for lowerDiag, int. |
Return Values
TScoreValue |
The sum of the alignment scores of both alignments (Metafunction: Value of the type of scoringScheme). |
---|
Detailed Description
There are two variants of the split alignment problem. In the first variant, we want to align two sequences where the first (say the reference) one is shorter than the second (say a read) and the read contains an insertion with respect to the reference. We now want to align the read agains the reference such that the left part of the read aligns well against the left part of the reference and the right part of the read aligns well against the right part of the reference. The center gap in the reference is free.
For example:
reference AGCATGTTAGATAAGATAGC-----------TGTGCTAGTAGGCAGTCAGCGCCAT |||||||||||||||||||| ||||||||||||||||||||||||| read AGCATGTTAGATAAGATAGCCCCCCCCCCCCTGTGCTAGTAGGCAGTCAGCGCCAT
The second variant is to align two sequences A and B against a reference such that the left part of A aligns well to the left part of the reference and the right part of B aligns well to the right part of the reference. Together, both reads span the whole reference and overlap with an insertion in the reference.
reference AGCATGTTAGATAAGATAGCTGTGCTAGTAGGCAGTCAGCGCCAT |||||||||||||||||| | || AGCATGTTAGATAAGATATCCGTCC read 1 ||| ||||||||||||||||||||||| CCGCTATGCTAGTAGGCAGTCAGCGCCAT read 2
The resulting alignment of the left/right parts is depicted below. The square brackets indicate clipping positions.
reference AGCATGTTAGATAAGATA [GCTGTGCTAGTAGGCAGTCAGCGCCAT |||||||||||||||||| [ | || AGCATGTTAGATAAGATA [TCCGTCC read 1 reference AGCATGTTAGATAAGATA] GTGCTAGTAGGCAGTCAGCGCCAT ] ||||||||||||||||||||||| CCGCT] ATGCTAGTAGGCAGTCAGCGCCAT read 2
In the first case, we want to find the one breakpoint in the reference and the two breakpoints in the reads and the alignment of the left and right well-aligning read parts. In the second case, we want to find the one breakpoint in the reference and the breakpoint/clipping position in each read.
The splitAlignment() function takes as the input two alignments. The sequence in each alignment's first row is the reference and the sequence of the second row is the read. The sequence has to be the same sequence whereas the reads might differ. If the reads are the same then this is the same as the first case and if the reads differ then this is the second case.
The result is two alignments of the left and right contig path clipped appropriately. The resulting score is the sum of the scores of both alignments.
Remarks
The DP algorithm is chosen automatically depending on whether the gap open and extension costs are equal.
Example
The following example demonstrates the usage of splitAlignment in the first case. The second case works accordingly.
#include <seqan/align.h>
#include <seqan/align_split.h>
#include <seqan/file.h> // output of String
#include <seqan/sequence.h>
#include <seqan/score.h>
using namespace seqan;
int main()
{
std::cout << "Situation\n"
<< "\n"
<< "REF AGCATGTTAGATAAGATAG-----------CTGTGCTAGTAGGCAGTCAGCGCCAT\n"
<< "READ AGCATGTTAGATAAGATAGCCCCCCCCCCCCTGTGCTAGTAGGCAGTCAGCGCCAT\n"
<< "\n";
// Demo for split alignment where the read contains an insertion with
// respect to the reference. The input of the function is the infix of
// reference earlier identified.
Dna5String ref = "AGCATGTTAGATAAGATAG" "CTGTGCTAGTAGGCAGTCAGCGCCAT";
Dna5String read = "AGCATGTTAGATAAGATAGCCCCCCCCCCCCTGTGCTAGTAGGCAGTCAGCGCCAT";
// Prepare Gaps objects. We need one for the left part and one for the
// right part of the alignment.
Align<Dna5String> alignL;
resize(rows(alignL), 2);
assignSource(row(alignL, 0), ref);
assignSource(row(alignL, 1), read);
Align<Dna5String> alignR;
resize(rows(alignR), 2);
assignSource(row(alignR, 0), ref);
assignSource(row(alignR, 1), read);
// Define scoring scheme.
Score<int, Simple> scoringScheme(1, -1, -1);
// Call split alignment function.
splitAlignment(alignL, alignR, scoringScheme);
// Print resulting alignment to stdout.
std::cout << "Resulting alignments\n"
<< "\n"
<< "Left\n"
<< alignL
<< "Right\n"
<< alignR
<< "\n";
// Get relevant clipping positions.
int refSplitPosition = toSourcePosition(row(alignL, 0), clippedEndPosition(row(alignL, 0)));
SEQAN_ASSERT_EQ(refSplitPosition, toSourcePosition(row(alignR, 0), 0));
int readSplitLPosition = toSourcePosition(row(alignL, 1), clippedEndPosition(row(alignL, 1)));
int readSplitRPosition = toSourcePosition(row(alignR, 1), 0);
std::cout << "refSplitPosition == " << refSplitPosition << "\n"
<< "readSplitLPosition == " << readSplitLPosition << "\n"
<< "readSplitRPosition == " << readSplitRPosition << "\n\n";
// Print sequence parts to stdout.
std::cout << "Reference Left " << prefix(ref, refSplitPosition) << "\n"
<< "Reference Right " << suffix(ref, refSplitPosition) << "\n"
<< "\n"
<< "Read Left " << prefix(read, readSplitLPosition) << "\n"
<< "Read Center " << infix(read, readSplitLPosition, readSplitRPosition) << "\n"
<< "Read Right " << suffix(read, readSplitRPosition) << "\n";
return 0;
}
The output is as follows.
Situation REF AGCATGTTAGATAAGATAG-----------CTGTGCTAGTAGGCAGTCAGCGCCAT READ AGCATGTTAGATAAGATAGCCCCCCCCCCCCTGTGCTAGTAGGCAGTCAGCGCCAT Resulting alignments Left 0 . : . AGCATGTTAGATAAGATAG ||||||||||||||||||| AGCATGTTAGATAAGATAG Right 0 . : . : . CTGTGCTAGTAGGCAGTCAGCGCCAT |||||||||||||||||||||||||| CTGTGCTAGTAGGCAGTCAGCGCCAT refSplitPosition == 19 readSplitLPosition == 19 readSplitRPosition == 30 Reference Left AGCATGTTAGATAAGATAG Reference Right CTGTGCTAGTAGGCAGTCAGCGCCAT Read Left AGCATGTTAGATAAGATAG Read Center CCCCCCCCCCC Read Right CTGTGCTAGTAGGCAGTCAGCGCCAT