Example Program
Index countOccurrencesMultiple
Example for using the functions countOccurrencesMultiple and countSequences for q-gram indices.
This example shows how to create a q-gram index for a string set and then count the occurrences of a pattern in each string of the set. Before counting, we output the number of sequences in the index. Finally, we output the fraction of shared q-grams for each pair of sequences in the string set.
A tutorial about the counts fibre of the q-gram index.
    #include <iostream>
    #include <seqan/index.h>

    using namespace seqan;

    int main ()
    {
First, we create a StringSet of 4 Strings.
        StringSet< String<char> > mySet;
        resize(mySet, 4);
        mySet[0] = "tobeornottobe";
        mySet[1] = "thebeeonthecomb";
        mySet[2] = "hellobebe";
        mySet[3] = "beingjohnmalkovich";
Then we define a q-gram Index type over our StringSet, together with a type for the occurrence counts, and construct the index.
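        // Q-gram index over the string set, using an ungapped shape of length 2.
        typedef Index< StringSet<String<char> >, IndexQGram<UngappedShape<2> > > TIndex;
        // Infix of the index's counts fibre: (sequence number, count) pairs.
        typedef Infix<Fibre<TIndex, QGramCounts>::Type const>::Type TCounts;

        TIndex myIndex(mySet);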
Now we output how often "be" occurs in each sequence.
        std::cout << "Number of sequences: " << countSequences(myIndex) << std::endl;
        // Hash the 2-gram "be" with the index's shape, then look up its counts.
        hash(indexShape(myIndex), "be");
        TCounts cnts = countOccurrencesMultiple(myIndex, indexShape(myIndex));
        // i1 is the sequence number, i2 the number of occurrences in that sequence.
        for (unsigned i = 0; i < length(cnts); ++i)
            std::cout << cnts[i].i2 << " occurrences in sequence " << cnts[i].i1 << std::endl;
Remember that we constructed the q-gram index with ungapped 2-grams. The following code computes the fraction of common 2-grams for all pairs of sequences and prints the resulting values.
        String<double> distMat;
        getKmerSimilarityMatrix(myIndex, distMat);

        for (unsigned i = 0; i < length(distMat); ++i)
            std::cout << distMat[i] << " ";
        std::cout << std::endl;

        return 0;
    }
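To make the phrase "fraction of common 2-grams" concrete, here is a minimal sketch in plain standard C++ that does not use SeqAn. The function names `qgramCounts` and `sharedQGramFraction` are hypothetical. The definition is also an assumption: shared q-grams are counted with multiplicity and divided by the number of q-grams in the shorter sequence. The exact normalization used by getKmerSimilarityMatrix may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>

// Count every overlapping q-gram (substring of length q) of s.
std::map<std::string, std::size_t> qgramCounts(const std::string& s, std::size_t q)
{
    std::map<std::string, std::size_t> counts;
    if (q == 0 || s.size() < q)
        return counts;
    for (std::size_t i = 0; i + q <= s.size(); ++i)
        ++counts[s.substr(i, q)];
    return counts;
}

// ASSUMED definition (not taken from SeqAn): number of shared q-grams,
// counted with multiplicity, divided by the number of q-grams in the
// shorter of the two sequences.
double sharedQGramFraction(const std::string& a, const std::string& b, std::size_t q)
{
    if (q == 0 || a.size() < q || b.size() < q)
        return 0.0;
    const std::map<std::string, std::size_t> ca = qgramCounts(a, q);
    const std::map<std::string, std::size_t> cb = qgramCounts(b, q);
    std::size_t shared = 0;
    for (const auto& kv : ca) {
        auto it = cb.find(kv.first);
        if (it != cb.end())
            shared += std::min(kv.second, it->second);
    }
    const std::size_t shorter = std::min(a.size(), b.size()) - q + 1;
    return static_cast<double>(shared) / static_cast<double>(shorter);
}
```

Under this assumed definition, `sharedQGramFraction("tobeornottobe", "hellobebe", 2)` gives 0.375: the two strings share "ob" once and "be" twice, and the shorter string has 8 2-grams. This value illustrates the sketch only and is not claimed to match the program's output.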
Output
Number of sequences: 4
2 occurrences in sequence 0
1 occurrences in sequence 1
2 occurrences in sequence 2
1 occurrences in sequence 3
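The per-sequence counts can be cross-checked without a q-gram index. The following is a minimal sketch in plain standard C++, not SeqAn code. The function name `countPerSequence` is hypothetical. It scans each string directly and returns a count for every sequence, including zeros.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Count (possibly overlapping) occurrences of `pattern` in each string of `seqs`.
// Plain linear scan; this does not build or query a q-gram index.
std::vector<std::size_t> countPerSequence(const std::vector<std::string>& seqs,
                                          const std::string& pattern)
{
    std::vector<std::size_t> counts;
    for (const std::string& s : seqs) {
        std::size_t n = 0;
        if (!pattern.empty()) {
            // Restart the search one position after each hit so that
            // overlapping matches are counted as well.
            for (std::size_t pos = s.find(pattern); pos != std::string::npos;
                 pos = s.find(pattern, pos + 1))
                ++n;
        }
        counts.push_back(n);
    }
    return counts;
}
```

For the four example strings and the pattern "be", it returns 2, 1, 2 and 1. A direct scan like this takes time proportional to the total text length for every query, whereas the q-gram index precomputes the counts once.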
SeqAn - Sequence Analysis Library - www.seqan.de

Page built @2013/07/11 09:12:35