Example Program
Maximal Repeats
Example of using the MaxRepeats Iterator.
Given a sequence, a repeat is a substring that occurs at two or more different positions.
A maximal repeat is a repeat that cannot be extended to the left or to the right to a longer repeat. The following
example demonstrates how to iterate over all maximal repeats and output them.
File "index_maxrepeats.cpp"
We begin with a String to store our sequence.
To find maximal repeats, we use SeqAn's MaxRepeats Iterator
and set the minimum repeat length to 3.
A repeat can be represented by its length and the positions at which it occurs.
myRepeatIterator iterates over all repeat strings.
Please note that, in contrast to supermaximal repeats, not every pair of occurrences
of a maximal repeat string is itself a maximal pair.
So we need a second iterator to iterate over all maximal pairs of this repeat string.
The MaxRepeats Iterator can itself be treated as a container and iterated over.
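To make the distinction between a repeat string and its maximal pairs concrete, here is a minimal sketch in plain C++ that does not use SeqAn. The helper name isMaximalPair is hypothetical and not part of the library. It only checks the definition directly: the two occurrences must match, and the characters immediately to the left and to the right of them must differ (or one occurrence must touch the end of the text).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a SeqAn function): returns true if the substrings
// of length len starting at positions i and j form a maximal pair in text.
bool isMaximalPair(const std::string& text, std::size_t i, std::size_t j,
                   std::size_t len)
{
    assert(i < j && j + len <= text.size());

    // Both positions must start the same substring.
    if (text.compare(i, len, text, j, len) != 0)
        return false;

    // Left-maximal: no preceding character, or the preceding characters differ.
    const bool leftMaximal = (i == 0) || (text[i - 1] != text[j - 1]);

    // Right-maximal: the second occurrence ends the text, or the following
    // characters differ.
    const bool rightMaximal =
        (j + len == text.size()) || (text[i + len] != text[j + len]);

    return leftMaximal && rightMaximal;
}
```

For "How many wood would a woodchuck chuck." the string " wo" occurs at positions 8, 13 and 21. Under this check the pairs (8, 13) and (13, 21) are maximal, while (8, 21) is not, because both occurrences continue with "od" and therefore belong to the longer repeat " wood".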
repLength returns the length of the repeat string.
The repeat string itself can be obtained with representative.
Output
As all supermaximal repeats (see Supermaximal Repeats) are also maximal repeats,
" wood" and "chuck" are reported. In "How many wood would a woodchuck chuck.",
" wo" is a repeat of length 3 with two pairs of occurrences
that are maximal pairs (" a wood", "od woul" and "ny wood", "od woul").
Besides these, there are no other maximal repeats of length at least 3.
weese@tanne:~/seqan/demos$ make index_maxrepeats
weese@tanne:~/seqan/demos$ ./index_maxrepeats
< 8 , 21 >, 5 " wood"
< 21 , 13 >, < 8 , 13 >, 3 " wo"
< 26 , 32 >, 5 "chuck"
weese@tanne:~/seqan/demos$
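The output above can be cross-checked without an index. The following is a brute-force sketch in plain C++, not the algorithm SeqAn uses. The function name bruteForceMaximalRepeats and the PositionPair alias are hypothetical. It tries every pair of start positions, skips pairs that could be extended to the left, extends each remaining pair to the right as far as possible, and keeps those whose common length reaches the minimum. This takes cubic time in the worst case, so it is only suitable for short strings like the one in this example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One maximal pair: the two start positions of the same repeat string.
using PositionPair = std::pair<std::size_t, std::size_t>;

// Hypothetical helper (not part of SeqAn): maps every maximal repeat string
// of length >= minLength to the list of its maximal pairs.
std::map<std::string, std::vector<PositionPair>>
bruteForceMaximalRepeats(const std::string& text, std::size_t minLength)
{
    assert(minLength > 0);
    std::map<std::string, std::vector<PositionPair>> repeats;
    const std::size_t n = text.size();

    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            // Skip pairs that can be extended to the left.
            if (i > 0 && text[i - 1] == text[j - 1])
                continue;

            // Extend to the right as long as both occurrences agree.
            std::size_t len = 0;
            while (j + len < n && text[i + len] == text[j + len])
                ++len;

            // The pair is now left- and right-maximal; keep it if long enough.
            if (len >= minLength)
                repeats[text.substr(i, len)].emplace_back(i, j);
        }
    }
    return repeats;
}
```

Called with the example sentence and a minimum length of 3, this sketch yields the same three repeat strings as the demo: " wood" with the pair (8, 21), " wo" with the pairs (8, 13) and (13, 21), and "chuck" with the pair (26, 32). The order of the pairs and of the positions within a pair may differ from the order the MaxRepeats Iterator reports.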
SeqAn - Sequence Analysis Library - www.seqan.de