splitAlignment

Function

Compute split alignments.

There are two variants of the split alignment problem. In the first variant, we want to align two sequences where the first (say, the reference) is shorter than the second (say, a read) and the read contains an insertion with respect to the reference. We want to align the read against the reference such that the left part of the read aligns well against the left part of the reference and the right part of the read aligns well against the right part of the reference. The center gap in the reference is free. For example:

reference  AGCATGTTAGATAAGATAGC-----------TGTGCTAGTAGGCAGTCAGCGCCAT
           ||||||||||||||||||||           |||||||||||||||||||||||||
read       AGCATGTTAGATAAGATAGCCCCCCCCCCCCTGTGCTAGTAGGCAGTCAGCGCCAT

In the second variant, we align two reads A and B (read 1 and read 2 below) against a reference such that the left part of A aligns well to the left part of the reference and the right part of B aligns well to the right part of the reference. Together, both reads span the whole reference and overlap with an insertion in the reference. For example:

reference  AGCATGTTAGATAAGATAGCTGTGCTAGTAGGCAGTCAGCGCCAT
           |||||||||||||||||| | ||
           AGCATGTTAGATAAGATATCCGTCC
           read 1
                             ||| |||||||||||||||||||||||
                           CCGCTATGCTAGTAGGCAGTCAGCGCCAT
                                                  read 2

The resulting alignment of the left/right parts is depicted below. The square brackets indicate clipping positions.

reference  AGCATGTTAGATAAGATA    [GCTGTGCTAGTAGGCAGTCAGCGCCAT
           ||||||||||||||||||    [ | ||
           AGCATGTTAGATAAGATA    [TCCGTCC
           read 1
reference  AGCATGTTAGATAAGATA]    GTGCTAGTAGGCAGTCAGCGCCAT
                             ]     |||||||||||||||||||||||
                        CCGCT]    ATGCTAGTAGGCAGTCAGCGCCAT
                                                    read 2

In the first case, we want to find the one breakpoint in the reference, the two breakpoints in the read, and the alignments of the left and right well-aligning read parts. In the second case, we want to find the one breakpoint in the reference and the breakpoint/clipping position in each read.

The splitAlignment() function takes two alignments as input. In each alignment, the sequence in the first row is the reference and the sequence in the second row is the read. The reference has to be the same sequence in both alignments, whereas the reads might differ. If the reads are the same, this is the first case; if the reads differ, this is the second case.

The result is two alignments of the left and right contig path clipped appropriately. The resulting score is the sum of the scores of both alignments.
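
The following is a minimal, self-contained sketch of the problem described above, not SeqAn's implementation. It assumes linear gap costs and unbanded DP, and it returns only the summed score and the breakpoints rather than clipped alignments. The names SplitResult, prefixScores and splitScore are hypothetical and exist only for this illustration. For every reference position k it takes the best score of the reference prefix against a prefix of the left read, adds the best score of the reference suffix against a suffix of the right read, and keeps the k with the largest sum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical result type for this sketch; not a SeqAn type.
struct SplitResult {
    int score;           // sum of the left and right alignment scores
    std::size_t refPos;  // breakpoint in the reference
    std::size_t clipL;   // left read is aligned up to this position (exclusive)
    std::size_t clipR;   // right read is aligned from this position on
};

using Matrix = std::vector<std::vector<int>>;

// P[k][i] = global alignment score of a[0,k) against b[0,i), linear gap costs.
Matrix prefixScores(const std::string& a, const std::string& b,
                    int match, int mismatch, int gap) {
    Matrix P(a.size() + 1, std::vector<int>(b.size() + 1, 0));
    for (std::size_t k = 1; k <= a.size(); ++k) P[k][0] = P[k - 1][0] + gap;
    for (std::size_t i = 1; i <= b.size(); ++i) P[0][i] = P[0][i - 1] + gap;
    for (std::size_t k = 1; k <= a.size(); ++k) {
        for (std::size_t i = 1; i <= b.size(); ++i) {
            int diag = P[k - 1][i - 1] + (a[k - 1] == b[i - 1] ? match : mismatch);
            P[k][i] = std::max({diag, P[k - 1][i] + gap, P[k][i - 1] + gap});
        }
    }
    return P;
}

// Pass the same read twice for the first variant, two different reads for the second.
SplitResult splitScore(const std::string& ref,
                       const std::string& readL, const std::string& readR,
                       int match = 1, int mismatch = -1, int gap = -1) {
    assert(match > 0 && mismatch <= 0 && gap <= 0);
    const std::size_t n = ref.size();
    const Matrix L = prefixScores(ref, readL, match, mismatch, gap);

    // Suffix scores are prefix scores of the reversed sequences:
    // R[n-k][t] = score of ref[k,n) against the last t characters of readR.
    const std::string revRef(ref.rbegin(), ref.rend());
    const std::string revRead(readR.rbegin(), readR.rend());
    const Matrix R = prefixScores(revRef, revRead, match, mismatch, gap);

    SplitResult best{std::numeric_limits<int>::min(), 0, 0, 0};
    for (std::size_t k = 0; k <= n; ++k) {
        // Best clipping position of each read for this reference breakpoint.
        const auto itL = std::max_element(L[k].begin(), L[k].end());
        const auto itR = std::max_element(R[n - k].begin(), R[n - k].end());
        const int total = *itL + *itR;
        if (total > best.score) {
            const std::size_t clipL = static_cast<std::size_t>(itL - L[k].begin());
            const std::size_t used = static_cast<std::size_t>(itR - R[n - k].begin());
            best = {total, k, clipL, readR.size() - used};
        }
    }
    return best;
}
```

Calling splitScore(ref, read, read) with the reference and read of the first example and the default unit scores gives a score of 45, because all 45 reference characters are matched and the 11 inserted read characters between clipL and clipR are skipped without penalty. Ties between equally good breakpoints are resolved in favor of the smallest reference position, which is a choice of this sketch only.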

TScoreValue splitAlignment(alignL, alignR, scoringScheme[, lowerDiag, upperDiag])

TScoreValue splitAlignment(gapsHL, gapsVL, gapsHR, gapsVR, scoringScheme[, lowerDiag, upperDiag])

Include Headers

seqan/align_split.h

Parameters

alignL	Align object with two rows for the left alignment. Types: Align
alignR	Align object with two rows for the right alignment. Types: Align
gapsHL	Gaps object with the horizontal/contig row for the left alignment. Types: Gaps
gapsVL	Gaps object with the vertical/read row for the left alignment. Types: Gaps
gapsHR	Gaps object with the horizontal/contig row for the right alignment. Types: Gaps
gapsVR	Gaps object with the vertical/read row for the right alignment. Types: Gaps
scoringScheme	The scoring scheme to use for the alignment. Types: Score
lowerDiag	The lower diagonal. Types: int Remarks: You have to specify the upper and lower diagonals for the left alignment. For the right alignment, the corresponding diagonals are chosen for the lower right part of the DP matrix.
upperDiag	The upper diagonal. Also see remark for lowerDiag. Types: int
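
To illustrate what a diagonal band does, here is a generic banded global alignment score, written as a sketch rather than as SeqAn's code. It assumes that a cell in row i (vertical/read position) and column j (horizontal/contig position) lies on diagonal j - i, so that diagonal 0 is the main diagonal, and that only cells with lowerDiag <= j - i <= upperDiag are computed. This numbering is an assumption of the example. The function name bandedGlobalScore is hypothetical, and the sketch does not cover how the band is mapped onto the right alignment.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <string>
#include <vector>

// Global alignment score restricted to the band lowerDiag <= j - i <= upperDiag,
// where j indexes the horizontal sequence h and i the vertical sequence v.
int bandedGlobalScore(const std::string& h, const std::string& v,
                      int lowerDiag, int upperDiag,
                      int match = 1, int mismatch = -1, int gap = -1) {
    const int n = static_cast<int>(h.size());
    const int m = static_cast<int>(v.size());
    // The band must contain the start cell (0,0) and the end cell (m,n).
    assert(lowerDiag <= 0 && 0 <= upperDiag);
    assert(lowerDiag <= n - m && n - m <= upperDiag);

    const int NEG = std::numeric_limits<int>::min() / 2;  // marks cells outside the band
    std::vector<std::vector<int>> D(m + 1, std::vector<int>(n + 1, NEG));
    D[0][0] = 0;
    for (int i = 0; i <= m; ++i) {
        const int jBegin = std::max(0, i + lowerDiag);
        const int jEnd = std::min(n, i + upperDiag);
        for (int j = jBegin; j <= jEnd; ++j) {
            if (i == 0 && j == 0) continue;
            int best = NEG;
            if (i > 0 && j > 0)
                best = std::max(best, D[i - 1][j - 1] + (h[j - 1] == v[i - 1] ? match : mismatch));
            if (i > 0) best = std::max(best, D[i - 1][j] + gap);  // gap in h
            if (j > 0) best = std::max(best, D[i][j - 1] + gap);  // gap in v
            D[i][j] = best;
        }
    }
    return D[m][n];
}
```

A narrow band saves time and memory but can miss the optimal alignment: with lowerDiag = upperDiag = 0 only the main diagonal is scored, so no gaps are possible, while a wider band admits alignments that shift between diagonals.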

Remarks

The DP algorithm is chosen automatically depending on whether the gap open and extension costs are equal.
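
The selection rule can be sketched as follows. The sketch assumes that equal gap open and gap extension costs select a linear gap model and unequal costs an affine one, and that the first character of a gap is scored with the gap open cost and every further character with the gap extension cost. Both points are assumptions of this example, and GapModel, chooseGapModel and gapScore are hypothetical names, not SeqAn identifiers.

```cpp
#include <cassert>

enum class GapModel { Linear, Affine };

// Equal open and extension costs make every gap character cost the same,
// so the simpler linear recurrence is sufficient.
GapModel chooseGapModel(int gapOpen, int gapExtend) {
    return gapOpen == gapExtend ? GapModel::Linear : GapModel::Affine;
}

// Score of a gap of the given length under the assumption stated above:
// gapOpen for the first gap character, gapExtend for each further one.
int gapScore(int length, int gapOpen, int gapExtend) {
    assert(length >= 1);
    return gapOpen + (length - 1) * gapExtend;
}
```

Under this assumption a gap of length 3 scores -3 with open and extension costs of -1 each, and -7 with an open cost of -5 and an extension cost of -1.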

Return Values

The sum of the alignment scores of both alignments. TScoreValue is the value type of scoringScheme.

SeqAn - Sequence Analysis Library - www.seqan.de