Example Program
Index countOccurrencesMultiple
Example for using the functions countOccurrencesMultiple and countSequences for q-gram indices.
This example shows how to create a q-gram index for a string set and then count the occurrences of a pattern in each string of the set. Before counting, we output the number of sequences in the index. Finally, we output the fraction of shared q-grams for each pair of sequences in the string set.
A tutorial about the counts fibre of the q-gram index.
#include <iostream>
#include <seqan/index.h>

using namespace seqan;

int main()
{
First, we create a StringSet of 4 Strings.
    StringSet< String<char> > mySet;
    resize(mySet, 4);
    mySet[0] = "tobeornottobe";
    mySet[1] = "thebeeonthecomb";
    mySet[2] = "hellobebe";
    mySet[3] = "beingjohnmalkovich";

Then we create a q-gram Index of our StringSet using ungapped 2-grams and define a type for the entries of its counts fibre.
    typedef Index< StringSet<String<char> >, IndexQGram<UngappedShape<2> > > TIndex;
    typedef Infix<Fibre<TIndex, QGramCounts>::Type const>::Type TCounts;

    TIndex myIndex(mySet);

Now we output the number of sequences in the index and then how often "be" occurs in each sequence.
    std::cout << "Number of sequences: " << countSequences(myIndex) << std::endl;
    hash(indexShape(myIndex), "be");
    TCounts cnts = countOccurrencesMultiple(myIndex, indexShape(myIndex));
    for (unsigned i = 0; i < length(cnts); ++i)
        std::cout << cnts[i].i2 << " occurrences in sequence " << cnts[i].i1 << std::endl;

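To make the idea behind the counting step more concrete, the following standalone sketch builds a small table that maps each q-gram to (sequence number, count) pairs and looks up one q-gram in it. It does not use SeqAn: `CountsTable`, `buildCountsTable` and `countPerSequence` are hypothetical names chosen for illustration, and the sketch assumes that sequences which do not contain the q-gram are simply absent from the result. The actual layout of the counts fibre may differ.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// (sequence number, occurrence count), analogous to the i1/i2 pair members above.
using SeqCount = std::pair<unsigned, unsigned>;

// Hypothetical stand-in for a q-gram counts table: for every q-gram, the
// sequences that contain it together with the number of occurrences.
using CountsTable = std::map<std::string, std::vector<SeqCount>>;

CountsTable buildCountsTable(const std::vector<std::string>& seqs, std::size_t q)
{
    CountsTable table;
    for (unsigned seqNo = 0; seqNo < seqs.size(); ++seqNo)
    {
        const std::string& s = seqs[seqNo];
        if (q == 0 || s.size() < q)
            continue; // sequence too short to contain any q-gram
        for (std::size_t pos = 0; pos + q <= s.size(); ++pos)
        {
            std::vector<SeqCount>& entries = table[s.substr(pos, q)];
            // Sequences are processed in order, so an entry for the current
            // sequence, if it exists, is always the last one.
            if (entries.empty() || entries.back().first != seqNo)
                entries.push_back(SeqCount(seqNo, 0));
            ++entries.back().second;
        }
    }
    return table;
}

// Returns one entry per sequence that contains the q-gram (assumption:
// sequences without the q-gram are omitted from the result).
std::vector<SeqCount> countPerSequence(const CountsTable& table, const std::string& qgram)
{
    CountsTable::const_iterator it = table.find(qgram);
    if (it == table.end())
        return std::vector<SeqCount>();
    return it->second;
}
```

Calling `countPerSequence(buildCountsTable(seqs, 2), "be")` on a vector of strings yields pairs that can be printed in the same way as `cnts[i].i1` and `cnts[i].i2` in the program.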
Remember that we constructed the q-gram index with ungapped 2-grams. The following function computes the fraction of common 2-grams between all pairs of sequences and outputs them.

    String<double> distMat;
    getKmerSimilarityMatrix(myIndex, distMat);

    for (unsigned i = 0; i < length(distMat); ++i)
        std::cout << distMat[i] << " ";
    std::cout << std::endl;

    return 0;
}
Output
Number of sequences: 4
2 occurrences in sequence 0
1 occurrences in sequence 1
1 occurrences in sequence 3
SeqAn - Sequence Analysis Library - www.seqan.de
 

Page built @2013/07/11 09:12:35