Example Program
Maximal Repeats
Example for using the MaxRepeats Iterator.
Given a sequence, a repeat is a substring that occurs at 2 or more different positions. A maximal repeat is a repeat that cannot be extended to the left or to the right to form a longer repeat. The following example demonstrates how to iterate over all maximal repeats and output them.
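To make the definition of a repeat concrete, here is a minimal sketch that uses only the C++ standard library and is independent of SeqAn. The helper names findOccurrences and isRepeat are hypothetical and are not part of the SeqAn API. The brute-force search only illustrates the definition; it is not how an index-based iterator works.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns every start position of `pattern` in `text`,
// including overlapping occurrences.
std::vector<std::size_t> findOccurrences(const std::string& text,
                                         const std::string& pattern)
{
    assert(!pattern.empty());
    std::vector<std::size_t> positions;
    for (std::size_t pos = text.find(pattern); pos != std::string::npos;
         pos = text.find(pattern, pos + 1))
    {
        positions.push_back(pos);
    }
    return positions;
}

// Hypothetical helper: a substring is a repeat if it occurs at two or more
// different positions.
bool isRepeat(const std::string& text, const std::string& pattern)
{
    return findOccurrences(text, pattern).size() >= 2;
}
```

For the sequence used in the program below, findOccurrences(text, " wo") yields the positions 8, 13 and 21, so " wo" is a repeat, whereas "many" occurs only once and is not.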
A tutorial about finding maximal repeats.
#include <iostream>
#include <seqan/index.h>

using namespace seqan;

int main ()
{
We begin with a String to store our sequence.
    String<char> myString = "How many wood would a woodchuck chuck.";
Then we create an Index of this String.
    typedef Index< String<char> > TMyIndex;
    TMyIndex myIndex(myString);
To find maximal repeats, we use SeqAn's MaxRepeats Iterator and set the minimum repeat length to 3.
    typedef Iterator< TMyIndex, MaxRepeats >::Type TMaxRepeatIterator;
    TMaxRepeatIterator myRepeatIterator(myIndex, 3);

    while (!atEnd(myRepeatIterator))
    {
A repeat can be represented by its length and the positions at which it occurs. myRepeatIterator iterates over all repeat strings. Note that, in contrast to supermaximal repeats, not every pair of occurrences of a maximal repeat string is itself a maximal repeat. We therefore need a second iterator to iterate over all maximal pairs of the current repeat string. The MaxRepeats Iterator can be seen as a container and can itself be iterated over.
        Iterator<TMaxRepeatIterator>::Type myRepeatPair(myRepeatIterator);
        while (!atEnd(myRepeatPair)) {
            ::std::cout << *myRepeatPair << ", ";
            ++myRepeatPair;
        }
repLength returns the length of the repeat string.
        ::std::cout << repLength(myRepeatIterator) << "   ";
The repeat string itself can be determined with representative.
        ::std::cout << "\t\"" << representative(myRepeatIterator) << '\"' << ::std::endl;

        ++myRepeatIterator;
    }

    return 0;
}
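The walkthrough above distinguishes between a repeat string and the pairs of its occurrences that are maximal. The following sketch spells out that check for a single pair, using only the C++ standard library. The function isMaximalPair is a hypothetical helper and not part of SeqAn. It assumes that an occurrence touching the start or end of the text counts as non-extendable on that side.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: checks whether the occurrences of a length-`len`
// substring at positions i and j form a maximal pair, i.e. the two
// occurrences are equal and cannot both be extended by the same character
// to the left or to the right.
bool isMaximalPair(const std::string& text, std::size_t i, std::size_t j,
                   std::size_t len)
{
    assert(i != j && i + len <= text.size() && j + len <= text.size());

    // The two occurrences must spell the same string.
    if (text.compare(i, len, text, j, len) != 0)
        return false;

    // Assumption: a text boundary counts as a mismatch.
    const bool leftMaximal =
        (i == 0 || j == 0 || text[i - 1] != text[j - 1]);
    const bool rightMaximal =
        (i + len == text.size() || j + len == text.size() ||
         text[i + len] != text[j + len]);

    return leftMaximal && rightMaximal;
}
```

For "How many wood would a woodchuck chuck." and the repeat " wo" of length 3, the pairs (8, 13) and (13, 21) pass this check. The pair (8, 21) does not, because both occurrences continue with "o" and can be extended to " wood".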
Output
As all supermaximal repeats (see Supermaximal Repeats) are also maximal repeats, " wood" and "chuck" are output. In "How many wood would a woodchuck chuck.", " wo" is a repeat of length 3 that occurs at two pairs of positions that are maximal pairs: its occurrences in " a wood" and "od woul", and its occurrences in "ny wood" and "od woul". Besides these, there are no other maximal repeats of length at least 3.
weese@tanne:~/seqan$ cd demos
weese@tanne:~/seqan/demos$ make index_maxrepeats
weese@tanne:~/seqan/demos$ ./index_maxrepeats
< 8 , 21 >, 5           " wood"
< 21 , 13 >, < 8 , 13 >, 3      " wo"
< 26 , 32 >, 5          "chuck"
weese@tanne:~/seqan/demos$
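The output above can be cross-checked without SeqAn by a brute-force enumeration. The sketch below is a cubic-time illustration built only on the C++ standard library, not the index-based algorithm behind the MaxRepeats Iterator. The function name maximalRepeats is hypothetical. For every pair of start positions that cannot be extended to the left, it extends the match to the right as far as possible and records the resulting string if it is long enough.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper: collects all maximal repeat strings of length at
// least `minLen` by testing every pair of start positions (i, j), i < j.
std::set<std::string> maximalRepeats(const std::string& text,
                                     std::size_t minLen)
{
    assert(minLen >= 1);
    std::set<std::string> result;
    const std::size_t n = text.size();

    for (std::size_t i = 0; i < n; ++i)
    {
        for (std::size_t j = i + 1; j < n; ++j)
        {
            // Skip pairs that can be extended to the left.
            if (i > 0 && text[i - 1] == text[j - 1])
                continue;

            // Extend to the right until a mismatch or the end of the text,
            // which makes the pair right-maximal by construction.
            std::size_t len = 0;
            while (j + len < n && text[i + len] == text[j + len])
                ++len;

            if (len >= minLen)
                result.insert(text.substr(i, len));
        }
    }
    return result;
}
```

Called with the example sentence and a minimum length of 3, this returns the three strings " wo", " wood" and "chuck", which matches the repeat strings listed in the demo output.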
SeqAn - Sequence Analysis Library - www.seqan.de
 

Page built @2011/02/08 21:37:01