Example Program
Maximal Unique Matches
An example of how to use the Mums Iterator.
Given a set of sequences, a unique match is a match that occurs exactly once in each sequence. A maximal unique match (MUM) is a unique match that is not part of any longer unique match. The following example demonstrates how to iterate over all MUMs and output them.
A tutorial about finding MUMs.
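To make the definition above concrete, here is a minimal brute-force sketch in plain C++. It is not how SeqAn's index-based Mums Iterator works, and the helper names (countOccurrences, isUniqueMatch, bruteForceMums) are hypothetical and not part of SeqAn. It enumerates substrings of the first sequence, keeps those that occur exactly once in every sequence, and then discards any that are contained in a longer unique match.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Count (possibly overlapping) occurrences of pattern in text.
std::size_t countOccurrences(const std::string& text, const std::string& pattern)
{
    std::size_t count = 0;
    for (std::size_t pos = text.find(pattern); pos != std::string::npos;
         pos = text.find(pattern, pos + 1))
        ++count;
    return count;
}

// True if s occurs exactly once in every sequence.
bool isUniqueMatch(const std::vector<std::string>& seqs, const std::string& s)
{
    for (const std::string& seq : seqs)
        if (countOccurrences(seq, s) != 1)
            return false;
    return true;
}

// Brute-force MUM search (hypothetical helper, not part of SeqAn).
std::set<std::string> bruteForceMums(const std::vector<std::string>& seqs,
                                     std::size_t minLen)
{
    assert(minLen > 0);  // an empty string is never a meaningful match
    std::set<std::string> unique;
    if (seqs.empty())
        return unique;

    // A match must occur in seqs[0], so its substrings are the only candidates.
    const std::string& first = seqs[0];
    for (std::size_t i = 0; i < first.size(); ++i)
        for (std::size_t len = minLen; i + len <= first.size(); ++len)
        {
            std::string candidate = first.substr(i, len);
            if (isUniqueMatch(seqs, candidate))
                unique.insert(candidate);
        }

    // Keep only unique matches that are not part of a longer unique match.
    std::set<std::string> mums;
    for (const std::string& m : unique)
    {
        bool maximal = true;
        for (const std::string& other : unique)
            if (other.size() > m.size() && other.find(m) != std::string::npos)
            {
                maximal = false;
                break;
            }
        if (maximal)
            mums.insert(m);
    }
    return mums;
}
```

Calling bruteForceMums on the three sentences used in the program below, with a minimum length of 3, should give the same two matches as the demo output. The sketch is only meant to illustrate the definition: it scans every sequence once per candidate substring, which is far too slow for real data.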
#include <iostream>
#include <seqan/index.h>

using namespace seqan;

int main ()
{
We begin with a StringSet that stores multiple strings.
    StringSet< String<char> > mySet;
    resize(mySet, 3);
    mySet[0] = "SeqAn is a library for sequence analysis.";
    mySet[1] = "The String class is the fundamental sequence type in SeqAn.";
    mySet[2] = "Subsequences can be handled with SeqAn's Segment class.";
Then we create an Index of this StringSet.
    typedef Index< StringSet<String<char> > > TMyIndex;
    TMyIndex myIndex(mySet);
To find maximal unique matches (MUMs), we use the Mums Iterator and set the minimum MUM length to 3.
    Iterator< TMyIndex, Mums >::Type myMUMiterator(myIndex, 3);
    String< SAValue<TMyIndex>::Type > occs;

    while (!atEnd(myMUMiterator)) {
A match across multiple sequences can be represented by its length and the positions at which it occurs in each sequence. getOccurrences returns an unordered sequence of pairs (seqNo, seqOfs), one for each position at which the match occurs.
        occs = getOccurrences(myMUMiterator);
To sort them in ascending order by seqNo, we use orderOccurrences.
        orderOccurrences(occs);

        for (unsigned i = 0; i < length(occs); ++i)
            std::cout << getValueI2(occs[i]) << ", ";
repLength returns the length of the match.
        std::cout << repLength(myMUMiterator) << "   ";
The match string itself can be determined with representative.
        std::cout << "\t\"" << representative(myMUMiterator) << '\"' << std::endl;

        ++myMUMiterator;
    }

    return 0;
}
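The occurrence handling in the loop above can be pictured with standard-library types. The following is a rough sketch only: std::pair and std::sort stand in for SeqAn's pair type and orderOccurrences, the names Occurrence, sortBySeqNo and offsetsInSeqOrder are hypothetical, and breaking ties by seqOfs is an assumption of this sketch rather than documented SeqAn behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// (seqNo, seqOfs): which sequence, and the offset inside that sequence.
typedef std::pair<std::size_t, std::size_t> Occurrence;

// Sort ascending by seqNo. std::pair compares lexicographically,
// so equal seqNo values are ordered by seqOfs (assumption of this sketch).
void sortBySeqNo(std::vector<Occurrence>& occs)
{
    std::sort(occs.begin(), occs.end());
}

// Return the offsets in seqNo order, i.e. the numbers the demo prints
// for one MUM before its length.
std::vector<std::size_t> offsetsInSeqOrder(std::vector<Occurrence> occs,
                                           std::size_t numSeqs)
{
    // A MUM has exactly one occurrence per sequence.
    assert(occs.size() == numSeqs);
    sortBySeqNo(occs);

    std::vector<std::size_t> offsets;
    for (const Occurrence& occ : occs)
        offsets.push_back(occ.second);
    return offsets;
}
```

For example, passing the unordered occurrences (2, 33), (0, 0), (1, 53) with numSeqs set to 3 yields the offsets 0, 53, 33. This corresponds to what getValueI2 extracts from each sorted pair in the program.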
Output
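With a minimum length of 3, the only maximal matches that occur in all 3 sequences are "SeqAn" and "sequence". Each occurs exactly once per sequence, so both are maximal unique matches. Each output line lists the offset of the match in sequences 0, 1, and 2, followed by the match length and the match string.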
weese@tanne:~/seqan$ cd demos
weese@tanne:~/seqan/demos$ make index_mums
weese@tanne:~/seqan/demos$ ./index_mums
0, 53, 33, 5    "SeqAn"
23, 36, 3, 8    "sequence"
weese@tanne:~/seqan/demos$
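The output can be checked by hand: the first three numbers on each line are 0-based offsets into the three input strings, and the match should start at exactly those positions. A small sketch of that check in plain C++ follows. The helper name matchAtOffsets is hypothetical and not part of SeqAn.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// True if `match` starts at position offsets[i] of seqs[i] for every i.
bool matchAtOffsets(const std::vector<std::string>& seqs,
                    const std::vector<std::size_t>& offsets,
                    const std::string& match)
{
    assert(seqs.size() == offsets.size());  // one offset per sequence
    for (std::size_t i = 0; i < seqs.size(); ++i)
    {
        // Guard against offsets past the end; compare() would throw otherwise.
        if (offsets[i] > seqs[i].size())
            return false;
        if (seqs[i].compare(offsets[i], match.size(), match) != 0)
            return false;
    }
    return true;
}
```

With the three sentences from the program, the offsets 0, 53, 33 locate "SeqAn" and the offsets 23, 36, 3 locate "sequence", which agrees with the lengths 5 and 8 printed in the fourth column.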
See Also
SeqAn - Sequence Analysis Library - www.seqan.de
 

Page built @2013/07/11 09:12:35