Page Demo Suffix Array
Given a sequence $s$, a suffix array is an array containing start positions of all suffices of $s$ in lexicographical order. A suffix array can simply be used to find all occurrences of an arbitrary substring $t$ in $s$ in O(|t|*log(|s|)).
SeqAn contains various suffix array construction algorithms like the Skew algorithm (J. Karkkainen and P. Sanders, "Simple Linear Work Suffix Array Construction", 2003), a more efficient modification of the Skew algorithm (difference cover of 7), external memory Skew algorithms, the prefix-doubling algorithm (U. Manber and G. Myers, "Suffix arrays: A new method for online string searching", 1993), the algorithm of Larsson and Sadakane (N.J. Larsson and K. Sadakane, "Faster Suffix Sorting", 1999), and a quicksort based algorithm.
The following example constructs a suffix array using the modified Skew algorithm and searches the interval of suffices beginning with $t="l"$. The start positions of these suffices are the occurences of $t$, which are outputted at last. This is only an example for createSuffixArray and similar functions. For an index based substring search better use the more generic Finder class (see @Demo.Index Finder@ demo).
///A tutorial about suffix arrays. #include <iostream> #include <seqan/index.h> using namespace seqan; int main () { String<char> text = "hello world!"; String<char> pattern = "l"; String<unsigned> sa; //Build a suffix array using the Skew7 algorithm. resize(sa, length(text)); createSuffixArray(sa, text, Skew7()); //Search the interval of suffices beginning with the pattern. Pair<unsigned> hitRange; hitRange = equalRangeSA(text, sa, pattern); //Output the suffix indices, i.e. the occurrences of the pattern. for(unsigned i = hitRange.i1; i < hitRange.i2; ++i) std::cout << sa[i] << " "; std::cout << std::endl; return 0; }
weese@tanne:~/seqan$ cd demos weese@tanne:~/seqan/demos$ make index_sufarray weese@tanne:~/seqan/demos$ ./index_sufarray 9 2 3