fn() consensusAlignment
Compute consensus alignment.

Defined in <seqan/consensus.h>
Signature void consensusAlignment(store, options);

Parameters

store FragmentStore to use for consensus alignment computation.
options ConsensusAlignmentOptions with configuration.

Thrown Exceptions

s ConsensusAlignerIllegalArgumentException in case of invalid arguments (e.g. two alignments for the same read).

Detailed Description

This function computes a consensus alignment for a set of nucleic sequences that are stored in a FragmentStore. Often, consensus sequences are reads, but they could also be other sequences, such as RNA transcripts. However, in the following description we call them reads.

This function uses the contigStore, alignedReadStore, and readSeqStore members of store.

Each read must have at most one entry in store.alignedReadStore.

General Algorithm

In the most common case, both contig ID information and position information is available. The algorithm considers all aligned reads on each contig. For each read, all overlapping reads (with begin/end position extended options.posDelta to the left/to the right) are considered and overlap alignments are computed. This pairwise alignment information is then used to compute a multiple sequence alignment (MSA).

The resulting MSA is then refined by reAlignment (see runRealignment).

Using position information

When position information is to be used then this will be used to generate fewer overlap alignments by only considering possible overlaps in windows around each read alignment as described above. Note that there can only be at most one alignment for each read in the store.alignedReadStore and the end position must be greater than or equal to the begin position, i.e., the alignment must be on the forward strand.

Using position information also requires contig ID information.

Using contigID information

When contig ID information is to be used and position information is not to be used then the consensusAlignment() will compute pairwise alignments between all pairs of reads on the same contig.

When contigID information is to be used then only the reads with an entry in store.alignedReadStore are considered.

When contigID information is not used then an all-to-all pairwise alignment of all reads will be performed.

Note that if reads having the same contig ID cannot be properly aligned and the MSA falls apart then the reads for one "connected alignment component" are kept on the original contig while the rest are added to new contigs that are appended to store.contigStore.

Example

The following example program takes a reference sequence and creates overlapping reads from it. These are then added to store with approximate positions (adding and subtracting a few positions). The function consensusAlignment() is then used to compute a MSA and the consensus sequence is stored in store.contigStore[0].seq.

#include <iostream>

#include <seqan/consensus.h>
#include <seqan/sequence.h>
#include <seqan/store.h>

using namespace seqan;

int main()
{
    // Reference to simulate reads from.
    Dna5String ref =
        "AATGGATGGCAAAATAGTTGTTCCATGAATACATCTCTAAAGAGCTTTGATGCTAATTTAGTCAAATTTT"
        "CAATACTGTACAATCTTCTCTAGAGCAGAGCAAAAGAATAAAAGCACTTCTAGCTAATATTATGTGGCAT";

    // Read length and step width for reads.
    int const READ_LENGTH = 50;
    int const STEP = 5;

    // Compute reads and append to FragmentStore.
    FragmentStore<> store;
    for (unsigned pos = 0, i = 0; pos + READ_LENGTH < length(ref); pos += STEP, ++i)
    {
        // Append a new read sequence.
        unsigned readID = appendRead(store, infix(ref, pos, pos + READ_LENGTH));
        // Create small perturbation of the position but not left of position 0.
        int pos2 = std::max(0, (int)pos + ((int)i % 6 - 3));
        // Append a new read alignment for the just added read.
        appendAlignedRead(store, readID, 0, pos2, pos2 + READ_LENGTH);
    }

    // Add contig and contig name for printing.
    resize(store.contigStore, 1);
    store.contigStore[0].seq = ref;
    resize(store.contigNameStore, 1);

    // Print initial perturbated alignment.
    std::cout << "Initial Alignment\n\n";
    AlignedReadLayout layout;
    layoutAlignment(layout, store);
    printAlignment(std::cout, layout, store, /*contigID=*/ 0,
                   /*beginPos=*/ 0, /*endPos=*/ (int)length(ref), 0, 30);

    // Compute consensus alignment.
    ConsensusAlignmentOptions options;
    consensusAlignment(store, options);

    // Print final consensus alignment.
    std::cout << "\n\nFinal Consensus Alignment\n\n";
    layoutAlignment(layout, store);
    printAlignment(std::cout, layout, store, /*contigID=*/ 0,
                   /*beginPos=*/ 0, /*endPos=*/ (int)length(ref), 0, 30);

    return 0;
}

The output is as follows:

Initial Alignment

AATGGATGGCAAAATAGTTGTTCCATGAATACATCTCTAAAGAGCTTTGATGCTAATTTAGTCAAATTTTCAATACTGTACAATCTTCTCTAGAGCAGAGCAAAAGAATAAAAGCACTTCTAGCTAATATTATGTGGCAT
.................................................. TGCT.A..TAGTC..A...TC.ATACTGTAC.ATC.TCTCTAGAGCAGAG
   ATGGCAA..T.GTTG.TCCATGAAT.C.TC...AA.G..CTT.GATGCTA    A.TTAGTC.AA..TTCA.TAC.GT.CAA.CT...CT..AGC..AGC..A.
         A...TAGT.GT.C.ATGA.TACATCTCTA..GAGCT..GATGCTA.T..A    .TT..CAAT.CTG.AC.ATC.TC...AG..CAG..CAA..G.AT.A..GC
               ..................................................    CA.TACTGTACA.TCT.CTCTAGAGCAGAGCA...GA.TA...GCACT.C
                     .T.CATG.ATACATCTCT..AGAGC..TGATGCT.A..TAGTC..A...T    ..................................................
                           TGA.T.CA...CT.A..AGC.TTGATGCTAA.TTAGTC.AA..TTCA.TA    C.ATC.TCTCTAGAGCAGAGC...AG.AT...AGCAC.TCTAGCT.ATA.
                           .CAT.TCTAAAG.GCTTTGA..C.AA.TTAG.CAAATTT.CAAT.C.GTA          T...CT..AGC..AGC..A.GA.T..AAG.ACT.CTAGCTA..AT.A..T
                                 ...AA.G..CTT.GATGCTAATT.AGTCAA.TT..CAAT.CTG.AC.ATC
                                       .GAGCT..GATGCTA.T..AGTCA..T...CA.TACTGTACA.TCT.CTC
                                             ..................................................
                                                         G.CAAATTT.CAAT.C.GTACA.TCT...CTAGAGC..AGC.AA.G..T.


Final Consensus Alignment

AATGGATGGCAAAATAGTTGTTCCATGAATACATCTCTAAAGAGCTTTGATGCTAATTTAGTCAAATTTTCAATACTGTACAATCTTCTCTAGAGCAGAGCAAAAGAATAAAAGCACTTCTAGCTAATATTATGT-----
..................................................     ..................................................
     ..................................................     ..................................................
          ..................................................     ..................................................
               ..................................................     ..................................................
                    ..................................................     ..................................................
                         ..................................................     ..................................................
                              ..................................................     ..................................................
                                   ..................................................
                                        ..................................................
                                             ..................................................
                                                  ..................................................