Index countOccurrencesMultiple

Example Program

An example of how to use the functions countOccurrencesMultiple and countSequences with q-gram indices.

This example shows how to create a q-gram index for a string set and how to count the occurrences of a pattern in each string of the set. Before counting, we output the number of sequences in the index. Finally, we output the fraction of q-grams shared between each pair of sequences in the string set.

File "index_qgram_counts.cpp"

A tutorial about the counts fibre of the q-gram index.

#include <iostream>
#include <seqan/index.h>

using namespace seqan;

int main()
{

First, we create a StringSet of 4 Strings.

    StringSet< String<char> > mySet;
    resize(mySet, 4);
    mySet[0] = "tobeornottobe";
    mySet[1] = "thebeeonthecomb";
    mySet[2] = "hellobebe";
    mySet[3] = "beingjohnmalkovich";

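Then we create a q-gram Index of our StringSet. The shape UngappedShape<2> selects ungapped 2-grams, and TCounts is the type we use below to hold the result of countOccurrencesMultiple, an infix of the index's QGramCounts fibre.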

    typedef Index< StringSet<String<char> >, IndexQGram<UngappedShape<2> > > TIndex;
    typedef Infix<Fibre<TIndex, QGramCounts>::Type const>::Type TCounts;

    TIndex myIndex(mySet);
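To illustrate what an ungapped 2-gram index covers, the following sketch enumerates the overlapping q-grams of a string and maps each to an integer. It uses only the C++ standard library. The helpers qgramHash and allQGramHashes are hypothetical names, and the base-256 positional encoding is an assumption made for illustration; the value SeqAn's hash function actually returns may be computed differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of SeqAn): read the q characters starting at
// `pos` as the digits of a base-256 number.
std::uint64_t qgramHash(std::string const & text, std::size_t pos, std::size_t q)
{
    // At most 8 characters fit into 64 bits; the q-gram must lie inside the text.
    assert(q > 0 && q <= 8 && pos + q <= text.size());
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < q; ++i)
        value = value * 256 + static_cast<unsigned char>(text[pos + i]);
    return value;
}

// Hash every overlapping q-gram of `text`.
// A text of length n >= q contains n - q + 1 q-grams; shorter texts contain none.
std::vector<std::uint64_t> allQGramHashes(std::string const & text, std::size_t q)
{
    std::vector<std::uint64_t> hashes;
    for (std::size_t pos = 0; pos + q <= text.size(); ++pos)
        hashes.push_back(qgramHash(text, pos, q));
    return hashes;
}
```

Under this encoding, "tobeornottobe" (13 characters) yields 12 overlapping 2-grams, and equal 2-grams always map to the same value, which is what lets an index group their occurrences together.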

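Now we output the number of sequences and how often "be" occurs in each sequence. We hash the 2-gram "be" with the index's shape and pass the shape to countOccurrencesMultiple. Each entry of the result is a pair: i1 is the sequence number and i2 is the number of occurrences in that sequence.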

    std::cout << "Number of sequences: " << countSequences(myIndex) << std::endl;
    hash(indexShape(myIndex), "be");
    TCounts cnts = countOccurrencesMultiple(myIndex, indexShape(myIndex));
    for (unsigned i = 0; i < length(cnts); ++i)
        std::cout << cnts[i].i2 << " occurrences in sequence " << cnts[i].i1 << std::endl;
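As a cross-check of the counts, here is a minimal sketch in plain standard C++ that produces the same kind of (sequence number, count) pairs by scanning each string directly. The function countPerSequence is a hypothetical stand-in, not SeqAn's implementation, and it assumes that sequences without any occurrence are left out of the result.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a counts lookup (not SeqAn code): returns one
// (sequence number, occurrence count) pair per sequence that contains `qgram`.
std::vector<std::pair<std::size_t, std::size_t>>
countPerSequence(std::vector<std::string> const & seqs, std::string const & qgram)
{
    assert(!qgram.empty());
    std::vector<std::pair<std::size_t, std::size_t>> counts;
    for (std::size_t seqNo = 0; seqNo < seqs.size(); ++seqNo)
    {
        std::size_t n = 0;
        // Restart the search one position after each hit so that
        // overlapping occurrences are counted as well.
        for (std::size_t pos = seqs[seqNo].find(qgram);
             pos != std::string::npos;
             pos = seqs[seqNo].find(qgram, pos + 1))
            ++n;
        if (n > 0)
            counts.emplace_back(seqNo, n);
    }
    return counts;
}
```

For the four strings of this example and the 2-gram "be", this helper yields the pairs (0, 2), (1, 1), (2, 2) and (3, 1). A q-gram index answers the same question from precomputed tables instead of rescanning every string.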

Remember that we constructed the q-gram index with ungapped 2-grams. The following code calls getKmerSimilarityMatrix to compute the fraction of common 2-grams between all pairs of sequences and then prints the entries of the resulting matrix.

    String<double> distMat;
    getKmerSimilarityMatrix(myIndex, distMat);

    // Print all matrix entries on one line.
    for (unsigned i = 0; i < length(distMat); ++i)
        std::cout << distMat[i] << " ";
    std::cout << std::endl;

    return 0;
}
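The idea of a pairwise "fraction of common 2-grams" can be sketched in plain standard C++ as follows. This is an illustration under stated assumptions, not the algorithm behind getKmerSimilarityMatrix: the helper names are hypothetical, each distinct q-gram is counted once per sequence, and the shared count is divided by the smaller of the two distinct q-gram counts. The library's normalization may differ, so the numbers need not match its output.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: the set of distinct q-grams occurring in `text`.
std::set<std::string> distinctQGrams(std::string const & text, std::size_t q)
{
    assert(q > 0);
    std::set<std::string> grams;
    for (std::size_t pos = 0; pos + q <= text.size(); ++pos)
        grams.insert(text.substr(pos, q));
    return grams;
}

// Hypothetical helper: row-major n x n matrix whose entry (i, j) is
//   |shared distinct q-grams of i and j| / min(|q-grams of i|, |q-grams of j|).
// The normalization is an assumption chosen for this sketch.
std::vector<double> sharedQGramFractions(std::vector<std::string> const & seqs, std::size_t q)
{
    std::size_t const n = seqs.size();
    std::vector<std::set<std::string>> grams;
    for (std::string const & s : seqs)
        grams.push_back(distinctQGrams(s, q));

    std::vector<double> mat(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
        {
            std::size_t shared = 0;
            for (std::string const & g : grams[i])
                shared += grams[j].count(g);  // 1 if g also occurs in sequence j
            std::size_t const smaller = std::min(grams[i].size(), grams[j].size());
            mat[i * n + j] = smaller ? static_cast<double>(shared) / smaller : 0.0;
        }
    return mat;
}
```

For example, "abab" has the distinct 2-grams {ab, ba} and "abcd" has {ab, bc, cd}; they share one 2-gram, so under the assumed normalization their entry is 1 / 2 = 0.5, while each diagonal entry is 1.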

Output

Number of sequences: 4
2 occurrences in sequence 0
1 occurrences in sequence 1
2 occurrences in sequence 2
1 occurrences in sequence 3

See

Index, countSequences, getKmerSimilarityMatrix, countOccurrencesMultiple

SeqAn - Sequence Analysis Library - www.seqan.de