Page Demo Suffix Array

An example of how to create a suffix array and use it as a dictionary.

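Given a sequence $s$, a suffix array is an array containing the start positions of all suffixes of $s$ in lexicographical order. Because the suffixes are sorted, all suffixes that begin with an arbitrary substring $t$ form one contiguous interval of the array, and a binary search locates that interval, and thus all occurrences of $t$ in $s$, in O(|t|*log(|s|)) time.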
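The definition and the binary search can be sketched in plain C++ without SeqAn. This is only an illustration: the function names $buildSuffixArrayNaive$ and $equalRangeNaive$ are hypothetical and not part of SeqAn, and the construction simply sorts all start positions with a comparison sort, which is far slower than the algorithms SeqAn provides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not a SeqAn function): sort all start positions by
// the suffix that begins there. Simple, but slow for long texts.
std::vector<std::size_t> buildSuffixArrayNaive(const std::string& s)
{
    std::vector<std::size_t> sa(s.size());
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&s](std::size_t a, std::size_t b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    return sa;
}

// Hypothetical helper (not a SeqAn function): return the half-open interval
// [first, second) of suffix array entries whose suffixes begin with t.
// Each comparison looks at no more than |t| characters, and the binary
// searches need O(log |s|) comparisons.
std::pair<std::size_t, std::size_t> equalRangeNaive(
    const std::string& s, const std::vector<std::size_t>& sa, const std::string& t)
{
    assert(sa.size() == s.size());
    // First suffix whose |t|-character prefix is not smaller than t.
    auto lo = std::lower_bound(sa.begin(), sa.end(), t,
        [&s](std::size_t pos, const std::string& pat) {
            return s.compare(pos, pat.size(), pat) < 0;
        });
    // First suffix whose |t|-character prefix is greater than t.
    auto hi = std::upper_bound(lo, sa.end(), t,
        [&s](const std::string& pat, std::size_t pos) {
            return s.compare(pos, pat.size(), pat) > 0;
        });
    return std::make_pair(static_cast<std::size_t>(lo - sa.begin()),
                          static_cast<std::size_t>(hi - sa.begin()));
}
```

For the text "hello world!" and the pattern "l", the returned interval covers the suffix array entries 9, 2 and 3, which are the start positions of the three occurrences of "l".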

SeqAn contains various suffix array construction algorithms: the Skew algorithm (J. Kärkkäinen and P. Sanders, "Simple Linear Work Suffix Array Construction", 2003), a more efficient modification of the Skew algorithm (difference cover of 7), external memory Skew algorithms, the prefix-doubling algorithm (U. Manber and G. Myers, "Suffix arrays: A new method for on-line string searches", 1993), the algorithm of Larsson and Sadakane (N.J. Larsson and K. Sadakane, "Faster Suffix Sorting", 1999), and a quicksort-based algorithm.
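To give an idea of how prefix doubling works, the following is a simplified sketch of the idea, not SeqAn's implementation. In round $k$ every suffix already has a rank for its first $k$ characters; sorting by the pair (rank at $i$, rank at $i+k$) orders the suffixes by their first $2k$ characters. The function name $buildSuffixArrayDoubling$ is hypothetical, and the sketch uses a comparison sort in each round, so it runs in O(n*log(n)^2) rather than the O(n*log(n)) that a radix-sort based variant achieves.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper (not a SeqAn function): simplified prefix doubling.
std::vector<std::size_t> buildSuffixArrayDoubling(const std::string& s)
{
    const std::size_t n = s.size();
    std::vector<std::size_t> sa(n), rank(n), tmp(n);
    if (n == 0)
        return sa;
    std::iota(sa.begin(), sa.end(), 0);
    // Initial ranks: the first character of each suffix.
    for (std::size_t i = 0; i < n; ++i)
        rank[i] = static_cast<unsigned char>(s[i]);

    for (std::size_t k = 1; ; k *= 2) {
        // Compare suffixes by (rank of first k chars, rank of next k chars).
        // A suffix that ends before position +k sorts before a longer one.
        auto less = [&](std::size_t a, std::size_t b) {
            if (rank[a] != rank[b])
                return rank[a] < rank[b];
            const bool aHas = a + k < n;
            const bool bHas = b + k < n;
            if (aHas && bHas)
                return rank[a + k] < rank[b + k];
            return !aHas && bHas;
        };
        std::sort(sa.begin(), sa.end(), less);

        // Assign new ranks; suffixes that still compare equal share a rank.
        tmp[sa[0]] = 0;
        for (std::size_t i = 1; i < n; ++i)
            tmp[sa[i]] = tmp[sa[i - 1]] + (less(sa[i - 1], sa[i]) ? 1 : 0);
        rank = tmp;

        // All ranks distinct: the order is final.
        if (rank[sa[n - 1]] == n - 1)
            break;
    }
    return sa;
}
```

Since the compared prefix length doubles in every round, at most about log2(n) rounds are needed before all ranks are distinct.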

The following example constructs a suffix array using the modified Skew algorithm and searches for the interval of suffixes beginning with $t="l"$. The start positions of these suffixes are the occurrences of $t$, which are printed at the end. This example only demonstrates createSuffixArray and similar functions. For an index-based substring search, prefer the more generic Finder class (see the @Demo.Index Finder@ demo).

///A tutorial about suffix arrays.
#include <iostream>
#include <seqan/index.h>

using namespace seqan;

int main ()
{
	String<char> text = "hello world!";
	String<char> pattern = "l";
	String<unsigned> sa;

//Build a suffix array using the Skew7 algorithm.
	resize(sa, length(text));
	createSuffixArray(sa, text, Skew7());

//Search for the interval of suffixes beginning with the pattern.
	Pair<unsigned> hitRange;
	hitRange = equalRangeSA(text, sa, pattern);

//Output the suffix indices, i.e. the occurrences of the pattern.
	for(unsigned i = hitRange.i1; i < hitRange.i2; ++i)
		std::cout << sa[i] << " ";
	std::cout << std::endl;
	return 0;
}

Demo: demos/index/index_sufarray.cpp

weese@tanne:~/seqan$ cd demos
weese@tanne:~/seqan/demos$ make index_sufarray
weese@tanne:~/seqan/demos$ ./index_sufarray
9 2 3