Page Demo Suffix Array
Given a sequence $s$, a suffix array is an array containing start positions of all suffices of $s$ in lexicographical order. A suffix array can simply be used to find all occurrences of an arbitrary substring $t$ in $s$ in O(|t|*log(|s|)).
SeqAn contains various suffix array construction algorithms like the Skew algorithm (J. Karkkainen and P. Sanders, "Simple Linear Work Suffix Array Construction", 2003), a more efficient modification of the Skew algorithm (difference cover of 7), external memory Skew algorithms, the prefix-doubling algorithm (U. Manber and G. Myers, "Suffix arrays: A new method for online string searching", 1993), the algorithm of Larsson and Sadakane (N.J. Larsson and K. Sadakane, "Faster Suffix Sorting", 1999), and a quicksort based algorithm.
The following example constructs a suffix array using the modified Skew algorithm and searches the interval of suffices beginning with $t="l"$. The start positions of these suffices are the occurences of $t$, which are outputted at last. This is only an example for createSuffixArray and similar functions. For an index based substring search better use the more generic Finder class (see @Demo.Index Finder@ demo).
///A tutorial about suffix arrays.
#include <iostream>
#include <seqan/index.h>
using namespace seqan;
int main()
{
String<char> text = "hello world!";
String<char> pattern = "l";
String<unsigned> sa;
//Build a suffix array using the Skew7 algorithm.
resize(sa, length(text));
createSuffixArray(sa, text, Skew7());
//Search the interval of suffices beginning with the pattern.
Pair<unsigned> hitRange;
hitRange = equalRangeSA(text, sa, pattern);
//Output the suffix indices, i.e. the occurrences of the pattern.
for (unsigned i = hitRange.i1; i < hitRange.i2; ++i)
std::cout << sa[i] << " ";
std::cout << std::endl;
return 0;
}
weese@tanne:~/seqan$ cd demos weese@tanne:~/seqan/demos$ make index_sufarray weese@tanne:~/seqan/demos$ ./index_sufarray 9 2 3