Page Demo Suffix Array

Example for how to create a suffix array and use it as a dictionary.

Given a sequence $s$, a suffix array is an array containing start positions of all suffices of $s$ in lexicographical order. A suffix array can simply be used to find all occurrences of an arbitrary substring $t$ in $s$ in O(|t|*log(|s|)).

SeqAn contains various suffix array construction algorithms like the Skew algorithm (J. Karkkainen and P. Sanders, "Simple Linear Work Suffix Array Construction", 2003), a more efficient modification of the Skew algorithm (difference cover of 7), external memory Skew algorithms, the prefix-doubling algorithm (U. Manber and G. Myers, "Suffix arrays: A new method for online string searching", 1993), the algorithm of Larsson and Sadakane (N.J. Larsson and K. Sadakane, "Faster Suffix Sorting", 1999), and a quicksort based algorithm.

The following example constructs a suffix array using the modified Skew algorithm and searches the interval of suffices beginning with $t="l"$. The start positions of these suffices are the occurences of $t$, which are outputted at last. This is only an example for createSuffixArray and similar functions. For an index based substring search better use the more generic Finder class (see @Demo.Index Finder@ demo).

///A tutorial about suffix arrays.
#include <iostream>
#include <seqan/index.h>

using namespace seqan;

int main()
{
    String<char> text = "hello world!";
    String<char> pattern = "l";
    String<unsigned> sa;

//Build a suffix array using the Skew7 algorithm.
    resize(sa, length(text));
    createSuffixArray(sa, text, Skew7());

//Search the interval of suffices beginning with the pattern.
    Pair<unsigned> hitRange;
    hitRange = equalRangeSA(text, sa, pattern);

//Output the suffix indices, i.e. the occurrences of the pattern.
    for (unsigned i = hitRange.i1; i < hitRange.i2; ++i)
        std::cout << sa[i] << " ";
    std::cout << std::endl;

    return 0;
}
weese@tanne:~/seqan$ cd demos
weese@tanne:~/seqan/demos$ make index_sufarray
weese@tanne:~/seqan/demos$ ./index_sufarray
9 2 3