Page Maximal Repeats

Example for using the MaxRepeatsIterator.

Given a sequence, a repeat is a substring that occurs at two or more different positions. A maximal repeat is a repeat that cannot be extended to the left or to the right to form a longer repeat. The following example demonstrates how to iterate over all maximal repeats and output them.

#include <iostream>
#include <seqan/index.h>

using namespace seqan;

We begin with a @Class.String@ to store our sequence. Then we create an @Class.Index@ of this String.

int main ()
{
	String<char> myString = "How many wood would a woodchuck chuck.";

	typedef Index< String<char> > TMyIndex;
	TMyIndex myIndex(myString);

To find maximal repeats, we use SeqAn's MaxRepeatsIterator and set the minimum repeat length to 3.

	Iterator< TMyIndex, MaxRepeats >::Type myRepeatIterator(myIndex, 3);

	while (!atEnd(myRepeatIterator))
	{
		// A repeat string can have several maximal pairs of occurrences.
		// The MaxRepeats iterator acts as a container of these pairs, so a
		// nested iterator visits each pair of start positions.
		Iterator< Iterator< TMyIndex, MaxRepeats >::Type >::Type myRepeatPair(myRepeatIterator);
		while (!atEnd(myRepeatPair))
		{
			std::cout << *myRepeatPair << ", ";
			++myRepeatPair;
		}

		// Function repLength returns the length of the repeat string.
		std::cout << repLength(myRepeatIterator) << "   ";

		// The repeat string itself can be determined with function representative.
		std::cout << "\t\"" << representative(myRepeatIterator) << '\"' << std::endl;

		++myRepeatIterator;
	}

	return 0;
}

A repeat can be represented by its length and the positions at which it occurs. $myRepeatIterator$ iterates over all repeat strings. Note that, in contrast to supermaximal repeats, not every pair of occurrences of a maximal repeat string is itself a maximal repeat. We therefore need a second iterator to visit all maximal pairs of each repeat string. The @Spec.MaxRepeats Iterator@ can be seen as a container of these pairs and can itself be iterated. Building and running the demo prints, for each repeat string, its maximal pairs, its length, and the string itself:
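To make the difference between occurrences and maximal pairs concrete, here is a minimal sketch that uses only the C++ standard library and is independent of SeqAn. The helper name isMaximalPair is hypothetical and not part of the SeqAn API. It assumes 0-based positions and treats the beginning and end of the string as mismatches.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of SeqAn: returns true if the two occurrences
// of length len starting at positions i and j spell the same substring and
// cannot be extended together to the left or to the right.
bool isMaximalPair(const std::string& s, std::size_t i, std::size_t j, std::size_t len)
{
    const std::size_t n = s.size();
    // Precondition: two distinct occurrences that lie inside the string.
    assert(i != j && i + len <= n && j + len <= n);

    // Both occurrences must spell the same substring.
    if (s.compare(i, len, s, j, len) != 0)
        return false;

    // Left-maximal: the preceding characters differ, or one occurrence
    // starts at the beginning of the string.
    const bool leftMaximal = (i == 0 || j == 0 || s[i - 1] != s[j - 1]);

    // Right-maximal: the following characters differ, or one occurrence
    // ends at the end of the string.
    const bool rightMaximal = (i + len == n || j + len == n || s[i + len] != s[j + len]);

    return leftMaximal && rightMaximal;
}
```

In the demo string, " wo" occurs at positions 8, 13 and 21. With length 3, the pairs (8, 13) and (13, 21) pass this check, while (8, 21) does not, because both occurrences extend to the right to " wood".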

weese@tanne:~/seqan$ cd demos
weese@tanne:~/seqan/demos$ make index_maxrepeats
weese@tanne:~/seqan/demos$ ./index_maxrepeats
< 8 , 21 >, 5           " wood"
< 21 , 13 >, < 8 , 13 >, 3      " wo"
< 26 , 32 >, 5          "chuck"
weese@tanne:~/seqan/demos$
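
The output above can be cross-checked on small inputs with a brute-force enumeration. The following is only a sketch using the C++ standard library, not how SeqAn computes repeats. The names RepeatPair and bruteForceMaximalPairs are hypothetical, and positions are assumed to be 0-based.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for illustration; not a SeqAn type.
struct RepeatPair
{
    std::size_t first;   // 0-based start of the first occurrence
    std::size_t second;  // 0-based start of the second occurrence
    std::size_t length;  // length of the repeated substring
};

// Enumerates all maximal pairs of length >= minLen by testing every pair of
// start positions. Far slower than an index-based search; intended only for
// checking small examples.
std::vector<RepeatPair> bruteForceMaximalPairs(const std::string& s, std::size_t minLen)
{
    assert(minLen > 0);  // a repeat has at least one character
    std::vector<RepeatPair> result;
    const std::size_t n = s.size();

    for (std::size_t i = 0; i < n; ++i)
    {
        for (std::size_t j = i + 1; j < n; ++j)
        {
            // Skip pairs that could be extended to the left.
            if (i > 0 && s[i - 1] == s[j - 1])
                continue;

            // Extend to the right as long as the characters match; the
            // resulting length is right-maximal by construction.
            std::size_t len = 0;
            while (j + len < n && s[i + len] == s[j + len])
                ++len;

            if (len >= minLen)
                result.push_back({i, j, len});
        }
    }
    return result;
}
```

For the demo string and a minimum length of 3, this sketch yields the pairs (8, 13) and (13, 21) of length 3 and the pairs (8, 21) and (26, 32) of length 5. These are the same pairs as in the demo output, ordered by position rather than grouped by repeat string.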