Class FragmentStore
Multi-container to store contigs, reads, multiple read alignments and genome annotations.

Defined in <seqan/store.h>
Signature template <[typename TSpec[, typename TConfig]]> class FragmentStore;

Template Parameters

TSpec The specializing type. Default: void.
TConfig The configuration struct. Default: FragmentStoreConfig<TSpec>.

Detailed Description

The FragmentStore is a data structure specifically designed for read mapping, genome assembly or gene annotation. These tasks typically require many interrelated data structures, such as reads, mate pairs and a reference genome; pairwise alignments; and genome annotations.

The FragmentStore subsumes all these data structures in an easy-to-use interface. It represents a multiple alignment of millions of reads or mate pairs against a reference genome consisting of multiple contigs. Additionally, regions of the reference genome can be annotated with features like 'gene', 'mRNA', 'exon', 'intron' or custom features. The FragmentStore supports I/O functions to read/write a read alignment in Sam or Amos format and to read/write annotations in Gff/Gtf format.

The FragmentStore can be compared with a database where each table (called "store") is implemented as a String member of the FragmentStore class. The rows of each table (implemented as structs) are referred to by their ids, which are their positions in the string and are not stored explicitly. The only exception is the alignedReadStore, whose elements of type AlignedReadStoreElement contain an id member because they may be rearranged in arbitrary order, e.g. by increasing genomic position or by readId. Many stores have an associated name store that holds the element names. Each name store is a StringSet that stores the name of an element at the position of its id. All stores are present in the FragmentStore and are empty if unused. The concrete types, e.g. the position types or the read/contig alphabet, can easily be changed by defining a custom config struct, which is a template parameter of the FragmentStore class.
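
The table analogy can be pictured with a few lines of standalone C++. This is only a toy model and not SeqAn code: ToyStore, ToyRead, appendToyRead and TOY_INVALID_ID are hypothetical names, and std::vector and std::string stand in for String and StringSet. The sketch shows just two points from the paragraph above: the id of an element is its position in the store, and a name store is indexed by the same id.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration, NOT the SeqAn types.
constexpr std::size_t TOY_INVALID_ID = static_cast<std::size_t>(-1);

struct ToyRead
{
    std::size_t matePairId;  // row id in a mate pair table, or TOY_INVALID_ID
};

struct ToyStore
{
    std::vector<ToyRead>     readStore;      // "table" of reads
    std::vector<std::string> readSeqStore;   // sequence stored at position readId
    std::vector<std::string> readNameStore;  // name stored at position readId
};

// Appends one row to each read-related table and returns its id.
// The id is never stored; it is the position of the new row.
inline std::size_t appendToyRead(ToyStore & store,
                                 std::string const & seq,
                                 std::string const & name)
{
    std::size_t readId = store.readStore.size();
    store.readStore.push_back(ToyRead{TOY_INVALID_ID});
    store.readSeqStore.push_back(seq);
    store.readNameStore.push_back(name);
    // All tables that are addressed by readId must stay in step.
    assert(store.readSeqStore.size() == store.readStore.size());
    assert(store.readNameStore.size() == store.readStore.size());
    return readId;
}
```

In this model, the name of the read returned by appendToyRead is found at store.readNameStore[readId], which mirrors how a name store is addressed by id.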

Examples

Load read alignments and a reference genome and display the multiple alignment in a genomic range:

#include <fstream>
#include <iostream>
#include <string>
#include <seqan/store.h>

using namespace seqan;

int main ()
{
    // instantiate empty FragmentStore and set file paths
    FragmentStore<> store;
    std::string pathGenome = std::string(SEQAN_PATH_TO_ROOT()) + "/core/demos/tutorial/store/ex1.fa";
    std::string pathSAM    = std::string(SEQAN_PATH_TO_ROOT()) + "/core/demos/tutorial/store/ex1.sam";

    // load example genome and example reads and alignments
    loadContigs(store, pathGenome.c_str());
    std::ifstream file(pathSAM.c_str());
    read(file, store, Sam());

    // compute staircase read layout and print from position 30..129
    AlignedReadLayout layout;
    layoutAlignment(layout, store);
    printAlignment(std::cout, Raw(), layout, store, 1, 30, 130, 0, 36);
    
    return 0;
}

Output:

ATTTAAGAAATTACAAAATATAGTTGAAAGCTCTAACAATAGACTAAACCAAGCAGAAGAAAGAGGTTCAGAACTTGAAGACAAGTCTCTTATGAATTAA
ATTTAA  AATTACAAAATATAGTTGAAAGCTCTAACAATAGA   AACCAAGCAGAAGAAAGAGGTTCAGAACTTGAAGA  AGTCTCTTATGAATTAA
ATTTA GAAATTACAAAATATAGTTGAAAGCTCTAACAATA ACTAAACCAAGCAGAAGAAAGAGGTTCAGAACTTG AGACAAGTCTCTTATGAATTAA
attta GAAATTACAAAATATAGTTGAAAGCTCTAACAATAG    AACCAAGCAGAAGAAAGAGGCTCAGAACTTGAAGA  AGTCTCTTATGAATTAA
ATTTAA   ATTACAAAATATAGTTGAAAGATCTAACAATAGAC    CCAAGCAGAAGAAAGAGGTTCAGAACTTGAAGACAA     TTATGAATTAA
ATTTAAGAA TTACAAAATATAGTTGAAAGCTCTAACAATAGACT     AAGCAGAAGAAAGAGGTTCAGAACTTGAAGACAAG     TATGAATTAA
ATTTAAGAAA  ACAAAATATAGTTGAAAGCTCTAACAATAGACTAA     GCAGAAGAAAGAGGTTCAGAACTTGAAGACAAGTC    ATGAATTAA
ATTTAAGAAA  ACAAAATATAGTTGAAAGCTCTAACAATAGACTAA      CAGAAGAAAGAGGTTCAGAACTTGAAGACAAGTCT    TGAATTAA
ATTTAAGAAA  ACAAAATATAGTTGAAAGCTCTAACAATAGACTAA      CAGAAGAAAGAGGTTCANANNNTGANGACAAGTCT    TGAATTAA
ATTTAAGAAATT CAAAATATAGTTGAAAGCTCTAACAATAGACTAAA       GAAGAAAGAGGTTCAGAACTTGAAGACAAGTCTCT   GAATTAA
ATTTAAGAAAT   AAAATATAGTTGAAAGCTCTAACAATAGACTAAAC       AAGAAAGAGGTTCAGAACTTGAAGACAAGTCTCGT  GAATTAA
ATTTAAGAAAT   AAAATATAGTTGAAAGCTCTAACAATAGACTAAAC       AAGAAAGAGGTTCAGAACTTGAAGACAAGTCTCTT   AATTAA
ATTTAAGAAAT    AAATATAGTTGAAAGCTCTAACAATAGACTAAACC        GAAAGAGGTTCAGAACTTGAAGACAAGTCTCTTATG
ATTTAAGAAATT   AAATATAGTTGAAAGCTCTAACAATAGACTAAACC          AAGAGGTTCAGAACTTGAAGACAAGTCTCTTATGA
ATTTAAGAAATT    AATATAGTTGAAAGCTCTAACAATAGACTAAACCAA        AAGAGGTTCAGAACTTGAAGACAAGTCTCTTATGA
ATTTAAGAAATTACA  ATATAGTTGAAAGCTCTAACAATAGACTAAACCAA          GAGGTTCAGAACTTGAAGACAAGTCTCTTATGAAT
ATTTAAGAAATTACAA   ATAGTTGAAAGCTCTAACAATAGACTAAACCAAGC        GAGGTTCAGAACTTGAAGACAAGTCTCTTATGAAT
ATTTAAGAAATTACAAAATA AGTTGAAAGCTCTAACAATAGACTAAACCAAGCAG       AGGTTCAGAACTTGAAGACAAGTCTCTTATGAATT
ATTTAAGAAATTACAAAATAT  TTGAAAGCTCTAACAATAGACTAAACCAAGCAGAA      GGTTCAGAACTTGAAGACAAGTCTCTTATGAATTA
ATTTAAGAAATTACAAAATATA   GAAAGCTCTAACAATAGACTAAACCAAGCAGAAGAAAGAG TTCAGAACTTGAAGACAAGTCTCTTATGAATTAA
ATTTAAGAAATTACAAAATATAGTTGAA    CTAACAATAGACTAAACCAAGCAGAAGAAAGAGTT      CTTGAAGACAAGTCTCTTATGAATTAA
ATTTAAGAAATTACAAAATATAGTTGAAA   CTAACAATAGACTAAACCAAGCAGAAGAAAGAGGTT      TTGAAGACAAGTCTCTTATGAATTAA
ATTTAAGAAATTACAAAATATAGTTGAAAG   TAACAATAGACTAAACCAAGCAGAAGAAAGAGGTT       TGAAGACAAGTCTCTTATGAATTAA
ATTTAAGAAATTACAAAATATAGTTGAAAGCTCT ACAATAGACTAAACCAAGCAGAAGAAAGAGGTTCA     TGAAGACAAGTCTCTTATGAATTAA
  TTAAGAAATTACAAAATATAGTTGAAAGCTCTAAC    GACTAAACCAAGCAGAAGAAAGAGGTTCAGAACTT AAGACAAGTCTCTTATGAATTAA
   TAAGAAATTACAAAATATAGTTGAAAGCTCTAACAATAGA                     GGTTCAGAACTTGAAGACAAGTCTCTTATGAATTA
          TTACAAAATATAGTTGAAAGCTCTAACAATAGACT                   GGTTCAGAACTTGAAGACAAGTCTCTTATGAATTA
                   ATAGTTGAAAGCTCTAACAATAGACTAAACCAAGC           GTTCAGAACTTGAAGACAAGTCTCTTATGAATTAA
                          AAAGCTCTAACAATAGACTAAACCAAGCAGAAGAA      TCAGAACTTGAAGACAAGTCTCTTATGAATTAA
                          AAAGCTCTAACAATAGACTAAACCAAGCAGAAGAA               NAAGACAAGTCTCTTATGAATTAA
                           AAGCTCTAACAATAGACTAAACCAAGCAGAAGAAA              GAAGACAAGTCTCTTATGAATTAA
                                 TAACAATAGACTAAACCAAGCAGAAGAAAGAGGTT               AGTCTCTTATGAATTAA
                                 TAACAATAGACTAAACCAAGCAGAAGAAAGAGGTT                GTCTCTTATGAATTAA
                                  AACAATAGACTAAACCAAGCAGAAGAAAGAGGTTC
                                  AACAATAGACTAAACCAAGCAGAAGAAAGAGGTTC
                                     AATAGACTAAACCAAGCAGAAGAAAGAGGTTCAGA
                                     AATAGACTAAACCAAGCAGAAGAAAGAGGTTCAGA

Remarks

The following figures visualize the relations between the different stores:

Member Functions Detail

void clearReads(store);

Removes all reads from a FragmentStore.

Parameters

store The FragmentStore to remove all reads from.

Interface Functions Detail

TSize appendAlignedRead(store, readId, contigId, beginPos, endPos[, pairMatchId]);

Appends an aligned read entry to a fragment store.

Parameters

store The FragmentStore to append to.
readId The id of the read to append an alignment for.
contigId The id of the contig of the alignment.
beginPos The begin position of the alignment.
endPos The end position of the alignment.
pairMatchId The id of the alignedRead pair. Default: FragmentStore::INVALID_ID which corresponds to an unmated read.

Returns

TSize The alignedReadId of the alignment.

Remarks

This function appends a single read alignment to the alignedReadStore. Note that this really only adds a match. To generate a global alignment out of all these matches, use convertMatchesToGlobalAlignment.

TSize appendMatePair(store, readSeq1, readSeq2[, name1, name2]);

Appends the two reads of a mate pair to a FragmentStore.

Parameters

store The FragmentStore to append the mate pair to.
readSeq1 The read sequence of the first read.
readSeq2 The read sequence of the second read.
name1 The read name of the first read.
name2 The read name of the second read.

Returns

TSize The matePairId of the newly appended mate pair. TSize is the size type of the matePairStore.

Remarks

This function appends two reads to the readStore and readSeqStore and a mate pair entry for both of them to the matePairStore. If names are given, they are appended to the readNameStore.
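
The bookkeeping described in this remark can be sketched as a standalone toy model. The names ToyStore, ToyRead, ToyMatePair and appendToyMatePair are hypothetical, and the code is not the SeqAn implementation; it assumes that the two reads receive consecutive ids and that each read records the id of its mate pair.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration, NOT the SeqAn types.
struct ToyRead     { std::size_t matePairId; };
struct ToyMatePair { std::size_t readId[2]; };

struct ToyStore
{
    std::vector<ToyRead>     readStore;
    std::vector<std::string> readSeqStore;
    std::vector<ToyMatePair> matePairStore;
};

// Appends both reads and one mate pair row that links them.
// Returns the id (position) of the new mate pair row.
inline std::size_t appendToyMatePair(ToyStore & store,
                                     std::string const & seq1,
                                     std::string const & seq2)
{
    std::size_t matePairId  = store.matePairStore.size();
    std::size_t firstReadId = store.readStore.size();

    // Both reads point back to the same mate pair row.
    store.readStore.push_back(ToyRead{matePairId});
    store.readStore.push_back(ToyRead{matePairId});
    store.readSeqStore.push_back(seq1);
    store.readSeqStore.push_back(seq2);

    // The mate pair row stores the ids of its two reads.
    ToyMatePair pair;
    pair.readId[0] = firstReadId;
    pair.readId[1] = firstReadId + 1;
    store.matePairStore.push_back(pair);

    assert(store.readSeqStore.size() == store.readStore.size());
    return matePairId;
}
```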

TSize appendRead(store, read[, matePairId]); TSize appendRead(store, read, name[, matePairId]);

Append a read to a FragmentStore.

Parameters

store The FragmentStore to append the read to.
read The read sequence. Type: SequenceConcept.
name The name of the read. Type: CharString.
matePairId ID of the mate-pair that this read is part of. Default: FragmentStore::INVALID_ID which corresponds to an unmated read.

Returns

TSize The readId of the newly appended read. TSize is the size type of the readStore.

This function appends a single read to the readStore and readSeqStore.

void calculateInsertSizes(insertSizes, store);

Calculates a string with insert sizes for each pair match.

Parameters

insertSizes A String of insert sizes. This string is resized accordingly and can be addressed by the pairMatchId.
store The FragmentStore to compute the insert sizes for.

Remarks

This function calls compactPairMatchIds first and then calculates the insert size for every pair match. The insert size of a pair match is the outer distance between the two matches.
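
As a small illustration of "outer distance", the following standalone sketch computes it for two matches. ToyMatch and toyInsertSize are hypothetical names and this is not the SeqAn code. The sketch assumes that both matches lie on the same contig and that a match may be given with beginPos greater than endPos (for example to mark the reverse strand), so it takes the minimum and maximum over all four coordinates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for one match of a pair, NOT a SeqAn type.
struct ToyMatch
{
    std::size_t beginPos;
    std::size_t endPos;
};

// Outer distance: from the leftmost to the rightmost coordinate
// covered by either of the two matches.
inline std::size_t toyInsertSize(ToyMatch const & a, ToyMatch const & b)
{
    std::size_t left  = std::min(std::min(a.beginPos, a.endPos),
                                 std::min(b.beginPos, b.endPos));
    std::size_t right = std::max(std::max(a.beginPos, a.endPos),
                                 std::max(b.beginPos, b.endPos));
    assert(left <= right);
    return right - left;
}
```

For a match covering 100..136 and a mate given as 336..300, the sketch yields 336 - 100 = 236.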

void calculateMateIndices(mateIndices, store);

Calculates a string that maps the readId of a read to the index of its mate in the alignedReadStore.

Parameters

mateIndices A String with the resulting mate indices. This string is resized accordingly and can be addressed by the readId.
store The FragmentStore.

Entries of reads without a mate contain INVALID_ID.
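
The shape of the result can be sketched with a standalone toy model. The names ToyAlignedRead, toyMateIndices and TOY_INVALID_ID are hypothetical, and the pairing rule is an assumption of this sketch: two entries are treated as mates if they share a valid pairMatchId. It is not the SeqAn implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration, NOT the SeqAn types.
constexpr std::size_t TOY_INVALID_ID = static_cast<std::size_t>(-1);

struct ToyAlignedRead
{
    std::size_t readId;
    std::size_t pairMatchId;  // TOY_INVALID_ID if not part of a pair match
};

// Returns a vector addressed by readId that holds the index of the
// mate's entry in alignedReads, or TOY_INVALID_ID if there is none.
inline std::vector<std::size_t>
toyMateIndices(std::vector<ToyAlignedRead> const & alignedReads,
               std::size_t numReads)
{
    std::vector<std::size_t> mateIndices(numReads, TOY_INVALID_ID);
    std::unordered_map<std::size_t, std::size_t> firstSeen;  // pairMatchId -> index

    for (std::size_t i = 0; i < alignedReads.size(); ++i)
    {
        ToyAlignedRead const & el = alignedReads[i];
        assert(el.readId < numReads);
        if (el.pairMatchId == TOY_INVALID_ID)
            continue;  // unpaired entry keeps TOY_INVALID_ID

        auto it = firstSeen.find(el.pairMatchId);
        if (it == firstSeen.end())
        {
            firstSeen[el.pairMatchId] = i;  // wait for the second entry
            continue;
        }
        std::size_t j = it->second;
        mateIndices[el.readId] = j;               // this read's mate is entry j
        mateIndices[alignedReads[j].readId] = i;  // and vice versa
    }
    return mateIndices;
}
```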

void clearContigs(store);

Removes all contigs from a FragmentStore.

Parameters

store The FragmentStore to remove all contigs from.

This function clears the contigStore and contigNameStore.

TSize compactAlignedReads(store);

Removes invalid aligned reads and renames the alignIds sequentially beginning with 0.

Parameters

store The FragmentStore to compact the aligned reads of.

Returns

TSize The new size of the alignedReadStore. TSize is the size type of the alignedReadStore.

Remarks

This function removes all entries from the alignedReadStore whose alignId is equal to INVALID_ID as well as orphan entries in alignQualityStore. Afterwards, the alignIds are renamed sequentially beginning with 0. This function can be used to remove alignments which are flagged by previously setting their id to INVALID_ID.
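
The flag-then-compact pattern from this remark can be sketched as follows. ToyAlignedRead, toyCompactAlignedReads and TOY_INVALID_ID are hypothetical names in a standalone toy model. The sketch covers only the aligned read entries; the handling of orphan entries in the alignQualityStore is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration, NOT the SeqAn types.
constexpr std::size_t TOY_INVALID_ID = static_cast<std::size_t>(-1);

struct ToyAlignedRead
{
    std::size_t id;      // explicit id; TOY_INVALID_ID flags the entry for removal
    std::size_t readId;
};

// Drops flagged entries, renumbers the survivors 0, 1, 2, ...
// in their current order and returns the new size.
inline std::size_t toyCompactAlignedReads(std::vector<ToyAlignedRead> & alignedReads)
{
    std::size_t newId = 0;
    for (std::size_t i = 0; i < alignedReads.size(); ++i)
    {
        if (alignedReads[i].id == TOY_INVALID_ID)
            continue;                        // flagged: skip it
        alignedReads[newId] = alignedReads[i];
        alignedReads[newId].id = newId;      // sequential renaming
        ++newId;
    }
    alignedReads.resize(newId);
    assert(alignedReads.size() == newId);
    return newId;
}
```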

TSize compactPairMatchIds(store);

Renames pairMatchId sequentially beginning with 0.

Parameters

store The FragmentStore to compact pair match ids of.

Returns

TSize The number of pair matches. TSize is the size type of alignedReadStore.

Remarks

This function renames the pairMatchId in the alignedReadStore sequentially beginning with 0. Two read alignments can be identified as a pair match if they have the same pairMatchId. Please note that paired reads do not necessarily have to map as a pair match, e.g. if they are on different contigs, have the same orientation, or have a wrong insert size.

void convertMatchesToGlobalAlignment(store, score, shrinkMatches);

Converts all matches to a multiple global alignment in gap-space.

Parameters

store The fragment store. Types: FragmentStore
score A score object used by globalAlignment in this function.
shrinkMatches States whether the matches should be shrunk. Types: True, False

Remarks

Before calling this function, all gap structures in the alignedReadStore and contigStore must be empty, i.e. there are no gaps in the alignments. This function iterates over the entries in the alignedReadStore and computes a semi-global alignment of each read to the contig segment given by its begin and end position. Gaps introduced by these pairwise alignments are then inserted into the affected contig and reads accordingly.

The invariant that positions in the alignedReadStore are in gap-space holds before (there were no gaps in alignments) and after calling this function.

If the alignQualityStore of the FragmentStore is empty when convertMatchesToGlobalAlignment() is called then the alignQualityStore is filled with the edit distance scores.

void convertPairWiseToGlobalAlignment(store, pairwiseContigGaps);

Converts pairwise alignments to a multiple global alignment.

Parameters

store The fragment store. Types: FragmentStore
pairwiseContigGaps A String of anchored contig gaps for every pairwise alignment.

Remarks

Before calling this function, the gap structures in the contigStore must be empty, i.e. there are no gaps in the contig. The pairwise alignment gaps of the reads are stored in the gap structure in the alignedReadStore, whereas the pairwise alignment gaps of the contig are stored in the pairwiseContigGaps string.

After calling this function, all positions in the alignedReadStore are in gap-space.

void getClrRange(store, alignEl, begClr, endClr);

Get the "clear" range of a read alignment.

Parameters

store The FragmentStore to work on.
alignEl The AlignedReadStoreElement to work on.
begClr Begin of the clear range.
endClr End of the clear range.

The clear range of a read alignment is the part of the alignment that is not clipped.

int getMateNo(store, readId);

Returns the mate number for a read given a readId.

Parameters

store The FragmentStore with the read.
readId The readId.

Returns

int The mate number (0 for the first mate, 1 for the second mate) of the read in its mate pair or -1 if the read is not paired.
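
The return values can be illustrated with a standalone toy model of the read and mate pair tables. ToyRead, ToyMatePair, toyMateNo and TOY_INVALID_ID are hypothetical names, and the lookup shown is one plausible way to derive the mate number from the stores described on this page, not necessarily how SeqAn does it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration, NOT the SeqAn types.
constexpr std::size_t TOY_INVALID_ID = static_cast<std::size_t>(-1);

struct ToyRead     { std::size_t matePairId; };
struct ToyMatePair { std::size_t readId[2]; };

// Returns 0 for the first mate, 1 for the second mate,
// and -1 if the read is not part of a mate pair.
inline int toyMateNo(std::vector<ToyRead> const & readStore,
                     std::vector<ToyMatePair> const & matePairStore,
                     std::size_t readId)
{
    assert(readId < readStore.size());
    std::size_t matePairId = readStore[readId].matePairId;
    if (matePairId == TOY_INVALID_ID)
        return -1;  // unpaired read

    assert(matePairId < matePairStore.size());
    ToyMatePair const & pair = matePairStore[matePairId];
    if (pair.readId[0] == readId)
        return 0;
    if (pair.readId[1] == readId)
        return 1;
    return -1;      // inconsistent tables: treat as unpaired
}
```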

TRead getRead(store, id);

Returns the read with the given readId.

Parameters

store The FragmentStore to query for the read.
id The id of the read.

Returns

TRead The entry from the readStore. TRead is the value type of the readStore.

bool loadContig(store, contigId);

Manually load a contig sequence.

Parameters

store The FragmentStore to load the contig for.
contigId The id of the contig that was created earlier by loadContigs.

Returns

bool true on success, false on failure.

bool loadContigs(store, fileName[, loadSeqs]); bool loadContigs(store, fileNameList[, loadSeqs]);

Load contigs into a FragmentStore.

Parameters

store The FragmentStore to append the contigs to.
fileName A CharString with the name of the file to load.
fileNameList A StringSet of CharString with a list of file names to load.
loadSeqs A bool indicating whether to load the sequences immediately. If true, the sequences are loaded immediately. If false, an empty contig with a reference to the file is created; its sequence can be loaded on demand by lockContig and loadContig.

Returns

bool true in case of success and false in case of error.

bool loadReads(store, fileName); bool loadReads(store, fileNameL, fileNameR);

Loads reads into a FragmentStore.

Parameters

store The FragmentStore to append the reads to.
fileName Path to single-end read file.
fileNameL Path to left read file in case of paired reads.
fileNameR Path to right read file in case of paired reads.

Returns

bool true in case of success, false in case of errors.

When two file names are given, the files are expected to contain the same number of reads, and reads with the same index are assumed to be mate pairs. Mate pairs are stored internally in an "interleaved mode": a read is read from each file before reading the next one.
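
The "interleaved mode" can be pictured with a standalone sketch in which two vectors of strings stand in for the left and right read files. toyInterleaveReads is a hypothetical name; file parsing is not modeled, and rejecting inputs of different length is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Appends the reads of both inputs to readSeqs in interleaved order:
// left[0], right[0], left[1], right[1], ...
// Returns false (and appends nothing) if the inputs differ in length.
inline bool toyInterleaveReads(std::vector<std::string> & readSeqs,
                               std::vector<std::string> const & left,
                               std::vector<std::string> const & right)
{
    if (left.size() != right.size())
        return false;

    std::size_t const oldSize = readSeqs.size();
    for (std::size_t i = 0; i < left.size(); ++i)
    {
        readSeqs.push_back(left[i]);   // first mate of pair i
        readSeqs.push_back(right[i]);  // second mate of pair i
    }
    assert(readSeqs.size() == oldSize + 2 * left.size());
    return true;
}
```

If readSeqs was empty before the call, the mates of pair i end up at positions 2*i and 2*i + 1.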

bool lockContig(store, contigId);

Locks a contig sequence from being removed.

Parameters

store The FragmentStore to lock the contig for.
contigId The id of the contig that was created earlier by loadContigs.

Returns

bool true on success, false on failure.

This function increases the contig usage counter by 1 and ensures that the contig sequence is loaded.

bool lockContigs(store);

Locks all contig sequences from being removed.

Parameters

store The FragmentStore to lock the contigs for.

Returns

bool true in case of success, false in case of errors.

int read(file, store, tag);

Read the contents of a FragmentStore from a file.

Parameters

file The StreamConcept to read from.
store The FragmentStore to append to.
tag The format to read from. Can be Amos or Sam.

Returns

int 0 in the case of success, non-0 value in case of errors.

bool unlockAndFreeContig(store, contigId);

Removes a previous contig lock and clears the sequence if no further lock exists.

Parameters

store The FragmentStore to unlock the contig for.
contigId The id of the contig that was created earlier by loadContigs.

Returns

bool true on success, false on failure.

This function decreases the contig usage counter by 1 and frees the sequence's memory if the counter equals 0.
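
The usage counter mentioned for the lock and unlock functions behaves like a reference count. The following standalone sketch models that idea with hypothetical names (ToyContig, toyLock, toyUnlockAndFree); a string member stands in for the sequence file on disk. It is not the SeqAn implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a lazily loaded contig, NOT a SeqAn type.
struct ToyContig
{
    std::string seq;        // empty while the sequence is not loaded
    std::string fileSeq;    // stands in for the sequence stored in a file
    unsigned    usage = 0;  // number of active locks
};

// Increases the usage counter and makes sure the sequence is loaded.
inline void toyLock(ToyContig & contig)
{
    if (contig.seq.empty())
        contig.seq = contig.fileSeq;  // load on demand
    ++contig.usage;
}

// Decreases the usage counter and frees the sequence once no lock is left.
inline void toyUnlockAndFree(ToyContig & contig)
{
    assert(contig.usage > 0);
    --contig.usage;
    if (contig.usage == 0)
    {
        contig.seq.clear();
        contig.seq.shrink_to_fit();   // non-binding request to release memory
    }
}
```

With two locks on the same contig, the first call to toyUnlockAndFree keeps the sequence in memory and only the second call clears it.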

bool unlockAndFreeContigs(store);

Unlocks all contig sequences and clears sequences without lock.

Parameters

store The FragmentStore to unlock the contigs for.

Returns

bool true in case of success, false in case of errors.

bool unlockContig(store, contigId);

Removes a previous contig lock.

Parameters

store The FragmentStore to unlock the contig for.
contigId The id of the contig that was created earlier by loadContigs.

Returns

bool true on success, false on failure.

This function decreases the contig usage counter by 1.

bool unlockContigs(store);

Unlocks all contig sequences.

Parameters

store The FragmentStore to unlock the contigs for.

Returns

bool true in case of success, false in case of errors.

int write(file, store, tag);

Write the contents of a FragmentStore to a file.

Parameters

file The StreamConcept to write to.
store The FragmentStore to write to the file.
tag The format to write out. Types: Sam or Amos.

Returns

int 0 in case of success, 1 in case of errors.

bool writeContigs(file, store, tag);

Write contigs from FragmentStore into a StreamConcept.

Parameters

file The StreamConcept to write to.
store The FragmentStore to write contigs of.
tag A tag for the sequence format.

Returns

bool true on success, false on errors.

Member Typedef Detail

typedef (..) TFragmentStore::TAlignedReadStore;

Type of the alignedReadStore member.

typedef (..) TFragmentStore::TAlignedReadTagStore;

Type of the alignedReadTagStore member.

typedef (..) TFragmentStore::TAlignQualityStore;

Type of the alignQualityStore member.

typedef (..) TFragmentStore::TAnnotationKeyStore;

Type of the annotationKeyStore member.

typedef (..) TFragmentStore::TAnnotationNameStore;

Type of the annotationNameStore member.

typedef (..) TFragmentStore::TAnnotationStore;

Type of the annotationStore member.

typedef (..) TFragmentStore::TAnnotationTypeStore;

Type of the annotationTypeStore member.

typedef (..) TFragmentStore::TContigFileStore;

Type of the contigFileStore member.

typedef (..) TFragmentStore::TContigNameStore;

Type of the contigNameStore member.

typedef (..) TFragmentStore::TContigStore;

Type of the contigStore member.

typedef (..) TFragmentStore::TLibraryNameStore;

Type of the libraryNameStore member.

typedef (..) TFragmentStore::TMatePairNameStore;

Type of the matePairNameStore member.

typedef (..) TFragmentStore::TMatePairStore;

Type of the matePairStore member.

typedef (..) TFragmentStore::TReadNameStore;

Type of the readNameStore member.

typedef (..) TFragmentStore::TReadSeqStore;

Type of the readSeqStore member.

typedef (..) TFragmentStore::TReadStore;

Type of the readStore member.

Member Variables Detail

FragmentStore::TAlignedReadStore FragmentStore::alignedReadStore

String that stores (alignId, readId, contigId, pairMatchId, beginPos, endPos, gapAnchors).

The value type is AlignedReadStoreElement.

Remarks

You can sort alignedReadStore using sortAlignedReads. After sorting, you can use the functions lowerBoundAlignedReads and upperBoundAlignedReads to perform a binary search, e.g. for accessing only a subrange.
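
The sort-then-search pattern can be sketched with the C++ standard library. This standalone toy model uses std::stable_sort, std::lower_bound and std::upper_bound in place of sortAlignedReads, lowerBoundAlignedReads and upperBoundAlignedReads; ToyAlignedRead, toyLessContig and toyContigRange are hypothetical names, and sorting by contigId is just one possible sort key.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for illustration, NOT the SeqAn type.
struct ToyAlignedRead
{
    std::size_t id;        // stored explicitly, so it survives reordering
    std::size_t contigId;
    std::size_t beginPos;
};

inline bool toyLessContig(ToyAlignedRead const & a, ToyAlignedRead const & b)
{
    return a.contigId < b.contigId;
}

// Sorts the entries by contigId and returns the half-open index range
// [first, last) of all entries that lie on the given contig.
inline std::pair<std::size_t, std::size_t>
toyContigRange(std::vector<ToyAlignedRead> & alignedReads, std::size_t contigId)
{
    std::stable_sort(alignedReads.begin(), alignedReads.end(), toyLessContig);

    ToyAlignedRead key{0, contigId, 0};  // only contigId is compared
    auto lo = std::lower_bound(alignedReads.begin(), alignedReads.end(), key, toyLessContig);
    auto hi = std::upper_bound(alignedReads.begin(), alignedReads.end(), key, toyLessContig);
    assert(lo <= hi);

    return std::make_pair(static_cast<std::size_t>(lo - alignedReads.begin()),
                          static_cast<std::size_t>(hi - alignedReads.begin()));
}
```

Because the entries are reordered by the sort, the explicit id member is what still identifies each alignment afterwards.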

FragmentStore::TAlignedReadTagStore FragmentStore::alignedReadTagStore

StringSet that maps from alignId to alignTag.

FragmentStore::TAlignQualityStore FragmentStore::alignQualityStore

String that maps from alignId to (pairScore, score, errors).

The value type is AlignQualityStoreElement.

FragmentStore::TAnnotationKeyStore FragmentStore::annotationKeyStore

StringSet that maps from keyId to the name of a key. The keyId is used to address values of an annotation.

FragmentStore::TAnnotationNameStore FragmentStore::annotationNameStore

StringSet that maps from annoId to annoName.

FragmentStore::TAnnotationStore FragmentStore::annotationStore

String that maps from annoId to (contigId, typeId, beginPos, endPos, parentId, lastChildId, nextSiblingId, values).

The value type is AnnotationStoreElement.

Instead of accessing this store directly, consider using the high-level interface provided by AnnotationTreeIterator.

FragmentStore::TAnnotationTypeStore FragmentStore::annotationTypeStore

StringSet that maps from typeId to the type name of an annotation, e.g. "gene" or "exon". typeId is a member of the AnnotationStoreElement.

Remarks

There are predefined type ids for commonly used types e.g. ANNO_GENE or ANNO_EXON which can be used to set the typeId directly as a fast alternative to getType and setType.

FragmentStore::TContigFileStore FragmentStore::contigFileStore

String that maps from contigFileId to (fileName, firstContigId).

Value type is ContigFile.

FragmentStore::TContigNameStore FragmentStore::contigNameStore

StringSet that maps from contigId to contigName.

FragmentStore::TContigStore FragmentStore::contigStore

String that maps from contigId to (contigSeq, contigGaps, contigFileId).

Value type is ContigStoreElement.

FragmentStore::TLibraryNameStore FragmentStore::libraryNameStore

A StringSet that maps from libId to libName.

FragmentStore::TLibraryStore FragmentStore::libraryStore

String that maps from libId to (mean, std).

Value type is LibraryStoreElement.

FragmentStore::TMatePairNameStore FragmentStore::matePairNameStore

StringSet that maps from matePairId to matePairName.

FragmentStore::TMatePairStore FragmentStore::matePairStore

String that maps from matePairId to (readId[2], libId).

The value type is MatePairStoreElement.

FragmentStore::TReadNameStore FragmentStore::readNameStore

StringSet that maps from readId to readName.

FragmentStore::TReadSeqStore FragmentStore::readSeqStore

StringSet that maps from readId to readSeq.

FragmentStore::TReadStore FragmentStore::readStore

A String that maps from readId to matePairId.

The value type is ReadStoreElement.