Class
FragmentStore
Multi container to store contigs, reads, multiple read alignments and genome annotations.
The FragmentStore is a data structure specifically designed for read mapping, genome assembly or gene annotation. These tasks typically require many interrelated data structures, such as reads, mate pairs, a reference genome, pairwise alignments and genome annotations.
The FragmentStore subsumes all these data structures in an easy-to-use interface. It represents a multiple alignment of millions of reads or mate pairs against a reference genome consisting of multiple contigs. Additionally, regions of the reference genome can be annotated with features like 'gene', 'mRNA', 'exon', 'intron' or custom features. The FragmentStore supports I/O functions to read/write a read alignment in SAM or AMOS format and to read/write annotations in GFF/GTF format.
The FragmentStore can be compared with a database where each table (called "store") is implemented as a String member of the FragmentStore class. The rows of each table (implemented as structs) are referred to by their ids, which are their positions in the string and are not stored explicitly. The only exception is the alignedReadStore, whose elements of type AlignedReadStoreElement contain an id member because they may be rearranged in arbitrary order, e.g. by increasing genomic position or by readId. Many stores have an associated name store that holds the element names. Each name store is a StringSet that stores the element name at the position of its id. All stores are present in the FragmentStore and are empty if unused. The concrete types, e.g. the position types or the read/contig alphabet, can be changed by defining a custom config struct, which is a template parameter of the FragmentStore class.
FragmentStore<>
FragmentStore<TSpec[, TConfig]>
Include Headers
seqan/store.h
Parameters
TSpec
The specializing type.
Default: void
TConfig
The configuration struct.
Default: FragmentStoreConfig<TSpec>
Data Members
alignedReadStore
String that stores <alignId, readId, contigId, pairMatchId, beginPos, endPos, gaps>.
alignedReadTagStore
StringSet that maps from alignId to alignTag.
alignQualityStore
String that maps from alignId to <pairScore, score, errors>.
annotationKeyStore
StringSet that maps from keyId to the name of a key. The keyId is used to address values of an annotation.
annotationNameStore
StringSet that maps from annoId to annoName.
annotationStore
String that maps from annoId to <contigId, typeId, beginPos, endPos, parentId, lastChildId, nextSiblingId, values>.
annotationTypeStore
StringSet that maps from typeId to the type name of an annotation, e.g. "gene" or "exon". typeId is a member of the AnnotationStoreElement.
contigFileStore
String that maps from contigFileId to <fileName, firstContigId>.
contigNameStore
StringSet that maps from contigId to contigName.
contigStore
String that maps from contigId to <contigSeq, contigGaps, contigFileId>.
libraryNameStore
StringSet that maps from libId to libName.
libraryStore
String that maps from libId to <mean, std>.
matePairNameStore
StringSet that maps from matePairId to matePairName.
matePairStore
String that maps from matePairId to <readId[2], libId>.
readNameStore
StringSet that maps from readId to readName.
readSeqStore
StringSet that maps from readId to readSeq.
readStore
String that maps from readId to <matePairId>.
Functions
appendAlignedRead
Appends an aligned read entry to a fragment store.
appendMatePair
Appends two paired-end reads to a fragment store.
appendRead
Appends a read to a fragment store.
begin
The begin of a container.
calculateInsertSizes
Calculates a string with insert sizes for each pair match.
calculateMateIndices
Calculates a string that maps the readId of a read to the readId of its mate.
clearContigs
Removes all contigs from a fragment store.
clearReads
Removes all reads from a fragment store.
compactAlignedReads
Removes invalid aligned reads and renames alignId sequentially beginning with 0.
compactPairMatchIds
Renames pairMatchId sequentially beginning with 0.
convertMatchesToGlobalAlignment
Converts all matches to a multiple global alignment in gap-space.
convertPairWiseToGlobalAlignment
Converts pair-wise alignments to a multiple global alignment.
end
The end of a container.
getClrRange
Gets the "clear" range of a read alignment.
getMateNo
Returns the mate number of a read for a given readId.
getRead
Returns the read with the given readId.
layoutAlignment
Calculates a visible layout of aligned reads.
loadContig
Manually loads a contig sequence.
loadContigs
Loads contigs into a fragment store.
loadReads
Loads reads into a fragment store.
lockContig
Locks a contig sequence from being removed.
lockContigs
Locks all contig sequences from being removed.
printAlignment
Prints a window of the visible layout of reads into an output stream.
read
Loads records from a file.
unlockAndFreeContig
Removes a previous contig lock and clears the sequence if no further lock exists.
unlockAndFreeContigs
Removes a previous lock for all contigs and clears sequences without a lock.
unlockContig
Removes a previous contig lock.
unlockContigs
Removes a previous lock for all contigs.
write
Saves records to a file.
writeContigs
Writes contigs from a fragment store into a file.
Examples
Load read alignments and a reference genome and display the multiple alignment in a genomic range:
#include <fstream>
#include <iostream>
#include <string>
#include <seqan/store.h>

using namespace seqan;

int main()
{
    // instantiate an empty FragmentStore and set file paths
    FragmentStore<> store;
    std::string pathGenome = std::string(SEQAN_PATH_TO_ROOT()) + "/core/demos/tutorial/store/ex1.fa";
    std::string pathSAM    = std::string(SEQAN_PATH_TO_ROOT()) + "/core/demos/tutorial/store/ex1.sam";

    // load example genome and example reads and alignments
    loadContigs(store, pathGenome.c_str());
    std::ifstream file(pathSAM.c_str());
    read(file, store, Sam());

    // compute staircase read layout and print from position 30..129
    AlignedReadLayout layout;
    layoutAlignment(layout, store);
    printAlignment(std::cout, Raw(), layout, store, 1, 30, 130, 0, 36);

    return 0;
}
The staircase alignment looks as follows:
ATTTAAGAAATTACAAAATATAGTTGAAAGCTCTAACAATAGACTAAACCAAGCAGAAGAAAGAGGTTCAGAACTTGAAGACAAGTCTCTTATGAATTAA
ATTTAA  AATTACAAAATATAGTTGAAAGCTCTAACAATAGA   AACCAAGCAGAAGAAAGAGGTTCAGAACTTGAAGA  AGTCTCTTATGAATTAA
ATTTA GAAATTACAAAATATAGTTGAAAGCTCTAACAATA ACTAAACCAAGCAGAAGAAAGAGGTTCAGAACTTG AGACAAGTCTCTTATGAATTAA
attta GAAATTACAAAATATAGTTGAAAGCTCTAACAATAG    AACCAAGCAGAAGAAAGAGGCTCAGAACTTGAAGA  AGTCTCTTATGAATTAA
ATTTAA   ATTACAAAATATAGTTGAAAGATCTAACAATAGAC    CCAAGCAGAAGAAAGAGGTTCAGAACTTGAAGACAA     TTATGAATTAA
ATTTAAGAA TTACAAAATATAGTTGAAAGCTCTAACAATAGACT     AAGCAGAAGAAAGAGGTTCAGAACTTGAAGACAAG     TATGAATTAA
ATTTAAGAAA  ACAAAATATAGTTGAAAGCTCTAACAATAGACTAA     GCAGAAGAAAGAGGTTCAGAACTTGAAGACAAGTC    ATGAATTAA
ATTTAAGAAA  ACAAAATATAGTTGAAAGCTCTAACAATAGACTAA      CAGAAGAAAGAGGTTCAGAACTTGAAGACAAGTCT    TGAATTAA
ATTTAAGAAA  ACAAAATATAGTTGAAAGCTCTAACAATAGACTAA      CAGAAGAAAGAGGTTCANANNNTGANGACAAGTCT    TGAATTAA
ATTTAAGAAATT CAAAATATAGTTGAAAGCTCTAACAATAGACTAAA       GAAGAAAGAGGTTCAGAACTTGAAGACAAGTCTCT   GAATTAA
ATTTAAGAAAT   AAAATATAGTTGAAAGCTCTAACAATAGACTAAAC       AAGAAAGAGGTTCAGAACTTGAAGACAAGTCTCGT  GAATTAA
ATTTAAGAAAT   AAAATATAGTTGAAAGCTCTAACAATAGACTAAAC       AAGAAAGAGGTTCAGAACTTGAAGACAAGTCTCTT   AATTAA
ATTTAAGAAAT    AAATATAGTTGAAAGCTCTAACAATAGACTAAACC        GAAAGAGGTTCAGAACTTGAAGACAAGTCTCTTATG
ATTTAAGAAATT   AAATATAGTTGAAAGCTCTAACAATAGACTAAACC          AAGAGGTTCAGAACTTGAAGACAAGTCTCTTATGA
ATTTAAGAAATT    AATATAGTTGAAAGCTCTAACAATAGACTAAACCAA        AAGAGGTTCAGAACTTGAAGACAAGTCTCTTATGA
ATTTAAGAAATTACA  ATATAGTTGAAAGCTCTAACAATAGACTAAACCAA          GAGGTTCAGAACTTGAAGACAAGTCTCTTATGAAT
ATTTAAGAAATTACAA   ATAGTTGAAAGCTCTAACAATAGACTAAACCAAGC        GAGGTTCAGAACTTGAAGACAAGTCTCTTATGAAT
ATTTAAGAAATTACAAAATA AGTTGAAAGCTCTAACAATAGACTAAACCAAGCAG       AGGTTCAGAACTTGAAGACAAGTCTCTTATGAATT
ATTTAAGAAATTACAAAATAT  TTGAAAGCTCTAACAATAGACTAAACCAAGCAGAA      GGTTCAGAACTTGAAGACAAGTCTCTTATGAATTA
ATTTAAGAAATTACAAAATATA   GAAAGCTCTAACAATAGACTAAACCAAGCAGAAGAAAGAG TTCAGAACTTGAAGACAAGTCTCTTATGAATTAA
ATTTAAGAAATTACAAAATATAGTTGAA    CTAACAATAGACTAAACCAAGCAGAAGAAAGAGTT      CTTGAAGACAAGTCTCTTATGAATTAA
ATTTAAGAAATTACAAAATATAGTTGAAA   CTAACAATAGACTAAACCAAGCAGAAGAAAGAGGTT      TTGAAGACAAGTCTCTTATGAATTAA
ATTTAAGAAATTACAAAATATAGTTGAAAG   TAACAATAGACTAAACCAAGCAGAAGAAAGAGGTT       TGAAGACAAGTCTCTTATGAATTAA
ATTTAAGAAATTACAAAATATAGTTGAAAGCTCT ACAATAGACTAAACCAAGCAGAAGAAAGAGGTTCA     TGAAGACAAGTCTCTTATGAATTAA
  TTAAGAAATTACAAAATATAGTTGAAAGCTCTAAC    GACTAAACCAAGCAGAAGAAAGAGGTTCAGAACTT AAGACAAGTCTCTTATGAATTAA
   TAAGAAATTACAAAATATAGTTGAAAGCTCTAACAATAGA                     GGTTCAGAACTTGAAGACAAGTCTCTTATGAATTA
          TTACAAAATATAGTTGAAAGCTCTAACAATAGACT                   GGTTCAGAACTTGAAGACAAGTCTCTTATGAATTA
                   ATAGTTGAAAGCTCTAACAATAGACTAAACCAAGC           GTTCAGAACTTGAAGACAAGTCTCTTATGAATTAA
                          AAAGCTCTAACAATAGACTAAACCAAGCAGAAGAA      TCAGAACTTGAAGACAAGTCTCTTATGAATTAA
                          AAAGCTCTAACAATAGACTAAACCAAGCAGAAGAA               NAAGACAAGTCTCTTATGAATTAA
                           AAGCTCTAACAATAGACTAAACCAAGCAGAAGAAA              GAAGACAAGTCTCTTATGAATTAA
                                 TAACAATAGACTAAACCAAGCAGAAGAAAGAGGTT               AGTCTCTTATGAATTAA
                                 TAACAATAGACTAAACCAAGCAGAAGAAAGAGGTT                GTCTCTTATGAATTAA
                                  AACAATAGACTAAACCAAGCAGAAGAAAGAGGTTC
                                  AACAATAGACTAAACCAAGCAGAAGAAAGAGGTTC
                                     AATAGACTAAACCAAGCAGAAGAAAGAGGTTCAGA
                                     AATAGACTAAACCAAGCAGAAGAAAGAGGTTCAGA
The following figures visualize the relations between the different stores:
Stores that are involved in the representation of a multiple read alignment.
Stores that are involved in the representation of a genome alignment.
SeqAn - Sequence Analysis Library - www.seqan.de
 

Page built @2013/07/11 09:12:34