Maximal Repeats
Given a sequence, a repeat is a substring that occurs at least at two different positions. A maximal repeat is a repeat that cannot be extended to the left or to the right into a longer repeat. The following example demonstrates how to iterate over all maximal repeats and output them.
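To make the definition concrete, the following brute-force sketch enumerates maximal pairs directly: two occurrences of the same substring that can be extended neither to the left nor to the right. It is plain C++ and independent of SeqAn; the names `MaxPair` and `naiveMaximalPairs` are hypothetical. Because it tries every pair of start positions, it is only suitable for short strings, which is why an index is used for real data.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for a maximal pair: start positions p1 < p2 and the
// length len of the common substring s[p1, p1+len) == s[p2, p2+len).
struct MaxPair {
    std::size_t p1, p2, len;
};

// Brute-force enumeration of all maximal pairs of length >= minLen.
// Worst case roughly O(n^3) time; only meant for short illustrative strings.
std::vector<MaxPair> naiveMaximalPairs(const std::string& s, std::size_t minLen) {
    std::vector<MaxPair> result;
    const std::size_t n = s.size();
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            // Left-maximal: text border, or the preceding characters differ.
            if (i > 0 && s[i - 1] == s[j - 1]) continue;
            // Extend to the right as long as the characters agree.
            std::size_t len = 0;
            while (j + len < n && s[i + len] == s[j + len]) ++len;
            // The loop stops at a mismatch or at the end of the text,
            // so the match is right-maximal by construction.
            if (len >= minLen) result.push_back({i, j, len});
        }
    }
    return result;
}
```

Called with the example sentence used on this page and a minimum length of 3, it reports (8, 13) and (13, 21) with length 3 (" wo"), (8, 21) with length 5 (" wood") and (26, 32) with length 5 ("chuck"). These are the same positions as in the demo output further down, although SeqAn may list the two positions of a pair in either order.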
#include <iostream>
#include <seqan/index.h>

using namespace seqan;
We begin with a @Class.String@ that stores our sequence and then build an @Class.Index@ over this String.
int main ()
{
    String<char> myString = "How many wood would a woodchuck chuck.";
    typedef Index< String<char> > TMyIndex;
    TMyIndex myIndex(myString);
To find maximal repeats, we use SeqAn's @Spec.MaxRepeats Iterator@ and pass the minimum repeat length, here 3, as the second constructor argument.
    // Iterate over all maximal repeat strings with a minimum length of 3.
    Iterator< TMyIndex, MaxRepeats >::Type myRepeatIterator(myIndex, 3);
    while (!atEnd(myRepeatIterator))
    {
        // Not every pair of occurrences of a maximal repeat string is a
        // maximal pair, so a nested iterator visits only the maximal pairs.
        Iterator< Iterator< TMyIndex, MaxRepeats >::Type >::Type myRepeatPair(myRepeatIterator);
        while (!atEnd(myRepeatPair))
        {
            // Each element is a pair of positions at which the repeat occurs.
            std::cout << *myRepeatPair << ", ";
            ++myRepeatPair;
        }
        // Function repLength returns the length of the repeat string.
        std::cout << repLength(myRepeatIterator) << " ";
        // The repeat string itself can be determined with function representative.
        std::cout << "\t\"" << representative(myRepeatIterator) << '\"' << std::endl;
        ++myRepeatIterator;
    }
    return 0;
}
A repeat can be represented by its length and the positions at which it occurs. $myRepeatIterator$ iterates over all maximal repeat strings. Note that, in contrast to supermaximal repeats, not every pair of occurrences of a maximal repeat string is a maximal pair: the occurrences of " wo" at positions 8 and 21 can be extended to the right to " wood", so this pair is not reported for " wo". We therefore need a second iterator that visits all maximal pairs of the current repeat string. The @Spec.MaxRepeats Iterator@ can itself be treated as a container whose elements are these pairs and be iterated over in its own right.
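The difference between occurrences and maximal pairs can be checked by hand. The sketch below is plain C++, not SeqAn, and `OccPair` and `classifyPairs` are hypothetical names: it finds all occurrences of a given repeat string and tests every pair of them for left- and right-maximality.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record: two occurrence positions p1 < p2 of a repeat string
// and whether this pair of occurrences is a maximal pair.
struct OccPair {
    std::size_t p1, p2;
    bool maximal;
};

// Finds all (possibly overlapping) occurrences of rep in s and classifies
// every pair of them. A pair is maximal if it is left-maximal (text border
// or different preceding characters) and right-maximal (text border or
// different following characters).
std::vector<OccPair> classifyPairs(const std::string& s, const std::string& rep) {
    std::vector<std::size_t> occ;
    for (std::size_t pos = s.find(rep); pos != std::string::npos; pos = s.find(rep, pos + 1))
        occ.push_back(pos);

    std::vector<OccPair> result;
    const std::size_t m = rep.size();
    for (std::size_t a = 0; a < occ.size(); ++a) {
        for (std::size_t b = a + 1; b < occ.size(); ++b) {
            const std::size_t i = occ[a], j = occ[b];  // i < j
            const bool leftMax = (i == 0) || s[i - 1] != s[j - 1];
            const bool rightMax = (j + m == s.size()) || s[i + m] != s[j + m];
            result.push_back({i, j, leftMax && rightMax});
        }
    }
    return result;
}
```

For " wo" in the example sentence, it finds occurrences at 8, 13 and 21 and marks (8, 13) and (13, 21) as maximal, while (8, 21) is not maximal because both occurrences continue with "od".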
The program prints, for each maximal repeat, its maximal pairs, its length and the repeat string:

weese@tanne:~/seqan$ cd demos
weese@tanne:~/seqan/demos$ make index_maxrepeats
weese@tanne:~/seqan/demos$ ./index_maxrepeats
< 8 , 21 >, 5 " wood"
< 21 , 13 >, < 8 , 13 >, 3 " wo"
< 26 , 32 >, 5 "chuck"
weese@tanne:~/seqan/demos$