Page Maximal Unique Matches
An example of how to use the Mums Iterator.
Given a set of sequences, a unique match is a match that occurs exactly once in each sequence. A maximal unique match (MUM) is a unique match that is not part of any longer unique match. The following example demonstrates how to iterate over all MUMs and output them.
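To make the definition concrete, here is a minimal brute-force sketch in plain C++ that does not use SeqAn. It is only an illustration of the definition, not how the Mums Iterator works internally, and the names `countOccurrences` and `naiveMums` are hypothetical helpers introduced for this example. It enumerates every substring of the first sequence, keeps those that occur exactly once in each sequence, and then discards any unique match contained in a longer one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Count (possibly overlapping) occurrences of `pattern` in `text`.
std::size_t countOccurrences(const std::string& text, const std::string& pattern)
{
    assert(!pattern.empty());
    std::size_t count = 0;
    for (std::size_t pos = text.find(pattern); pos != std::string::npos;
         pos = text.find(pattern, pos + 1))
        ++count;
    return count;
}

// Hypothetical helper: brute-force MUM search, returned in sorted order.
// Far too slow for real data; an index-based search is the practical route.
std::vector<std::string> naiveMums(const std::vector<std::string>& seqs,
                                   std::size_t minLen)
{
    std::vector<std::string> mums;
    if (seqs.empty() || minLen == 0)
        return mums;

    // Step 1: collect all unique matches, i.e. substrings of the first
    // sequence that occur exactly once in every sequence.
    std::set<std::string> uniqueMatches;
    const std::string& first = seqs[0];
    for (std::size_t i = 0; i < first.size(); ++i)
    {
        for (std::size_t len = minLen; i + len <= first.size(); ++len)
        {
            const std::string cand = first.substr(i, len);
            const bool once = std::all_of(seqs.begin(), seqs.end(),
                [&](const std::string& s) { return countOccurrences(s, cand) == 1; });
            if (once)
                uniqueMatches.insert(cand);
        }
    }

    // Step 2: keep only unique matches that are not part of a longer one.
    // Because each unique match has a single occurrence per sequence,
    // string containment is equivalent to positional containment here.
    for (const std::string& u : uniqueMatches)
    {
        const bool contained = std::any_of(uniqueMatches.begin(), uniqueMatches.end(),
            [&](const std::string& v) {
                return v.size() > u.size() && v.find(u) != std::string::npos;
            });
        if (!contained)
            mums.push_back(u);
    }
    return mums;
}
```

Called with the three strings used in the demo on this page and a minimum length of 3, `naiveMums` returns "SeqAn" and "sequence".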
///A tutorial about finding Mums.
#include <iostream>
#include <seqan/index.h>

using namespace seqan;

int main()
{
    // We begin with a StringSet that stores multiple strings.
    StringSet<String<char> > mySet;
    resize(mySet, 3);
    mySet[0] = "SeqAn is a library for sequence analysis.";
    mySet[1] = "The String class is the fundamental sequence type in SeqAn.";
    mySet[2] = "Subsequences can be handled with SeqAn's Segment class.";

    // Then we create an Index of this StringSet.
    typedef Index<StringSet<String<char> > > TMyIndex;
    TMyIndex myIndex(mySet);

    // To find maximal unique matches (Mums), we use the Mums Iterator
    // and set the minimum MUM length to 3.
    Iterator<TMyIndex, Mums>::Type myMUMiterator(myIndex, 3);
    String<SAValue<TMyIndex>::Type> occs;

    while (!atEnd(myMUMiterator))
    {
        // A multiple match can be represented by the positions it occurs at
        // in every sequence and its length. getOccurrences returns an
        // unordered sequence of pairs (seqNo, seqOfs) the match occurs at.
        occs = getOccurrences(myMUMiterator);

        // To sort them in ascending order by seqNo, we use orderOccurrences.
        orderOccurrences(occs);

        // Print the offset (seqOfs) of the match in each sequence.
        for (unsigned i = 0; i < length(occs); ++i)
            std::cout << getValueI2(occs[i]) << ", ";

        // repLength returns the length of the match.
        std::cout << repLength(myMUMiterator) << " ";

        // The match string itself can be determined with representative.
        std::cout << "\t\"" << representative(myMUMiterator) << '\"' << std::endl;

        ++myMUMiterator;
    }
    return 0;
}
Demo: demos/dox/index/mums.cpp
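With a minimum length of 3, the only maximal matches that occur in all three sequences are "SeqAn" and "sequence". Each occurs exactly once in every sequence, so both are maximal unique matches. Each output line lists the offset of the match in sequences 0, 1 and 2, followed by its length and the match string itself.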
weese@tanne:~/seqan$ cd demos
weese@tanne:~/seqan/demos$ make index_mums
weese@tanne:~/seqan/demos$ ./index_mums
0, 53, 33, 5 "SeqAn"
23, 36, 3, 8 "sequence"
weese@tanne:~/seqan/demos$
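The offsets in this output can be cross-checked without SeqAn. The following sketch uses only `std::string::find`; the helper name `firstOffsets` is hypothetical and introduced purely for illustration. Because a MUM occurs exactly once in each sequence, the first occurrence found is also the only one.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: offset of the first occurrence of `match` in each
// sequence, or std::string::npos if a sequence does not contain it.
std::vector<std::size_t> firstOffsets(const std::vector<std::string>& seqs,
                                      const std::string& match)
{
    std::vector<std::size_t> offsets;
    offsets.reserve(seqs.size());
    for (const std::string& s : seqs)
        offsets.push_back(s.find(match));
    return offsets;
}
```

For the three demo strings, `firstOffsets(seqs, "SeqAn")` gives 0, 53, 33 and `firstOffsets(seqs, "sequence")` gives 23, 36, 3, which agrees with the first three columns printed by the demo.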