Example Program
Maximal Unique Matches
Example for using the MUMs Iterator.
Given a set of sequences, a unique match is a match that occurs exactly once in each sequence. A maximal unique match (MUM) is a unique match that is not part of any longer unique match. The following example demonstrates how to iterate over all MUMs and output them.
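The definition can be made concrete with a brute-force sketch that uses only the C++ standard library. It is meant to illustrate the definition and is not how SeqAn's MUMs Iterator works. The names `NaiveMum`, `countOccurrences`, and `naiveMums` are hypothetical and are not part of SeqAn. The sketch relies on one observation: a unique match is maximal exactly when it cannot be extended by one character to the left or to the right in all sequences at once.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type: one offset per sequence plus the matched text.
struct NaiveMum
{
    std::vector<std::size_t> positions;
    std::string text;
};

// Counts (possibly overlapping) occurrences of needle in hay and
// stores the offset of the first occurrence in firstPos.
std::size_t countOccurrences(const std::string& hay, const std::string& needle,
                             std::size_t& firstPos)
{
    std::size_t count = 0;
    for (std::size_t p = hay.find(needle); p != std::string::npos;
         p = hay.find(needle, p + 1))
    {
        if (count == 0)
            firstPos = p;
        ++count;
    }
    return count;
}

// Naive enumeration of all MUMs of length >= minLen.
// Only suitable for tiny inputs; an index-based approach scales far better.
std::vector<NaiveMum> naiveMums(const std::vector<std::string>& seqs,
                                std::size_t minLen)
{
    assert(!seqs.empty() && minLen > 0);
    std::vector<NaiveMum> result;
    const std::string& ref = seqs[0];

    for (std::size_t begin = 0; begin < ref.size(); ++begin)
    {
        for (std::size_t len = minLen; begin + len <= ref.size(); ++len)
        {
            const std::string candidate = ref.substr(begin, len);

            // Unique match: exactly one occurrence in every sequence.
            std::vector<std::size_t> pos(seqs.size());
            bool unique = true;
            for (std::size_t i = 0; i < seqs.size() && unique; ++i)
                unique = (countOccurrences(seqs[i], candidate, pos[i]) == 1);
            if (!unique)
                continue;

            // Maximal: no common one-character extension on either side.
            bool extendsLeft = true;
            bool extendsRight = true;
            for (std::size_t i = 0; i < seqs.size(); ++i)
            {
                const std::string& s = seqs[i];
                if (pos[0] == 0 || pos[i] == 0 ||
                    s[pos[i] - 1] != ref[pos[0] - 1])
                    extendsLeft = false;
                if (pos[0] + len == ref.size() || pos[i] + len == s.size() ||
                    s[pos[i] + len] != ref[pos[0] + len])
                    extendsRight = false;
            }
            if (!extendsLeft && !extendsRight)
                result.push_back(NaiveMum{pos, candidate});
        }
    }
    return result;
}
```

Calling `naiveMums` with the three example strings used on this page and a minimum length of 3 should yield the same two matches that the program reports in its output.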
#include <iostream>
#include <seqan/index.h>

using namespace std;
using namespace seqan;

int main()
{
We begin with a StringSet that stores multiple strings.
    StringSet< String<char> > mySet;
    resize(mySet, 3);
    mySet[0] = "SeqAn is a library for sequence analysis.";
    mySet[1] = "The String class is the fundamental sequence type in SeqAn.";
    mySet[2] = "Subsequences can be handled with SeqAn's Segment class.";
Then we create an Index of this StringSet.
    typedef Index< StringSet<String<char> > > TMyIndex;
    TMyIndex myIndex(mySet);
To find maximal unique matches (MUMs), we use the MUMs Iterator and set the minimum MUM length to 3.
    Iterator< TMyIndex, MUMs >::Type myMUMiterator(myIndex, 3);
    String< SAValue<TMyIndex>::Type > occs;

    while (!atEnd(myMUMiterator))
    {
A multiple match can be represented by the positions at which it occurs in each sequence, together with its length. getOccurrences returns an unordered sequence of (seqNo, seqOfs) pairs, one for each occurrence of the match.
        occs = getOccurrences(myMUMiterator);
To sort them in ascending order by seqNo, we use orderOccurrences.
        orderOccurrences(occs);

        for (unsigned i = 0; i < length(occs); ++i)
            cout << getValueI2(occs[i]) << ", ";
repLength returns the length of the match.
        cout << repLength(myMUMiterator) << "   ";
The match string itself can be determined with representative.
        cout << "\t\"" << representative(myMUMiterator) << '\"' << endl;

        ++myMUMiterator;
    }

    return 0;
}
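The ordering step in the loop can be pictured with plain standard-library types. The following is a minimal sketch and not SeqAn's implementation: it models each (seqNo, seqOfs) occurrence as a `std::pair`, and `Occurrence`, `sortBySequence`, and `offsetsOnly` are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for one occurrence: first = seqNo, second = seqOfs.
using Occurrence = std::pair<std::size_t, std::size_t>;

// Sorts occurrences ascending by sequence number (ties broken by offset),
// which is the default lexicographic ordering of std::pair.
void sortBySequence(std::vector<Occurrence>& occs)
{
    std::sort(occs.begin(), occs.end());
}

// Extracts the offsets only, analogous to printing the second pair
// component of every occurrence in the loop above.
std::vector<std::size_t> offsetsOnly(const std::vector<Occurrence>& occs)
{
    std::vector<std::size_t> offsets;
    offsets.reserve(occs.size());
    for (const Occurrence& occ : occs)
        offsets.push_back(occ.second);
    return offsets;
}
```

Once the occurrences are sorted, the i-th printed offset belongs to the i-th sequence, which is what makes each output line readable without printing the sequence numbers.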
Output
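The only maximal matches of length at least 3 that occur in all 3 sequences are "SeqAn" and "sequence". Each occurs exactly once in every sequence and is therefore a maximal unique match. Each output line lists the offsets of the match in sequences 0, 1, and 2, followed by the match length and the match itself.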
weese@tanne:~/seqan$ cd demos
weese@tanne:~/seqan/demos$ make index_mums
weese@tanne:~/seqan/demos$ ./index_mums
0, 53, 33, 5    "SeqAn"
23, 36, 3, 8    "sequence"
weese@tanne:~/seqan/demos$
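The reported offsets are zero-based and can be checked directly against the input strings. The helper below is a small sketch using only `std::string`; the name `occursAt` is hypothetical and not part of SeqAn.

```cpp
#include <cstddef>
#include <string>

// Returns true if `match` occurs in `seq` starting at zero-based offset `pos`.
// The bounds check avoids the std::out_of_range that compare() would throw
// for an offset beyond the end of the sequence.
bool occursAt(const std::string& seq, std::size_t pos, const std::string& match)
{
    return pos <= seq.size() && seq.compare(pos, match.size(), match) == 0;
}
```

For instance, `occursAt(s, 53, "SeqAn")` holds for the second input string `s`, since "SeqAn" begins at offset 53 of "The String class is the fundamental sequence type in SeqAn.".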
SeqAn - Sequence Analysis Library - www.seqan.de