Spec AhoCorasickPattern
Multiple exact string matching using Aho-Corasick.

Extends Pattern
All Extended Pattern
Defined in <seqan/find.h>
Signature template <typename TNeedle> class Pattern<TNeedle, AhoCorasick>;

Template Parameters

TNeedle The needle type, a string of keywords.

Detailed Description

The types of the keywords in the needle container and the haystack have to match.

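Matches are not necessarily reported in order of increasing begin position: the search discovers each match when it reaches its end, whereas the reported position is its beginning.

Likewise, if multiple keywords match at the same position, the order in which they are reported is not guaranteed.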
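
The ordering remark above can be illustrated with a minimal, library-independent sketch of the Aho-Corasick technique in standard C++. It is not SeqAn's implementation; the names `Match`, `Node`, and `findAll` are hypothetical and exist only for this illustration. The sketch records each match at the moment its last character is read, which is why begin positions can appear out of order.

```cpp
#include <array>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Hypothetical helper types for illustration; not part of SeqAn.
struct Match
{
    std::size_t keyword;  // index of the matched keyword
    std::size_t begin;    // begin position in the text
    std::size_t end;      // end position (one past the last character)
};

struct Node
{
    std::array<int, 256> next;     // transitions, -1 while absent
    int fail = 0;                  // failure link
    std::vector<std::size_t> out;  // keywords that end in this state
    Node() { next.fill(-1); }
};

std::vector<Match> findAll(std::vector<std::string> const & keywords,
                           std::string const & text)
{
    // 1. Build the keyword trie.
    std::vector<Node> trie(1);
    for (std::size_t k = 0; k < keywords.size(); ++k)
    {
        int state = 0;
        for (unsigned char c : keywords[k])
        {
            if (trie[state].next[c] < 0)
            {
                trie[state].next[c] = static_cast<int>(trie.size());
                trie.emplace_back();
            }
            state = trie[state].next[c];
        }
        trie[state].out.push_back(k);
    }

    // 2. Breadth-first pass: set failure links and fill in missing transitions.
    std::queue<int> bfs;
    for (int c = 0; c < 256; ++c)
    {
        int child = trie[0].next[c];
        if (child < 0)
            trie[0].next[c] = 0;
        else
            bfs.push(child);  // depth-1 states keep fail == 0
    }
    while (!bfs.empty())
    {
        int state = bfs.front();
        bfs.pop();
        int fail = trie[state].fail;
        // Inherit keywords that end at the failure state (proper suffixes).
        trie[state].out.insert(trie[state].out.end(),
                               trie[fail].out.begin(), trie[fail].out.end());
        for (int c = 0; c < 256; ++c)
        {
            int child = trie[state].next[c];
            if (child < 0)
                trie[state].next[c] = trie[fail].next[c];
            else
            {
                trie[child].fail = trie[fail].next[c];
                bfs.push(child);
            }
        }
    }

    // 3. Scan the text once; a match is recorded when its last character is read.
    std::vector<Match> matches;
    int state = 0;
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        state = trie[state].next[static_cast<unsigned char>(text[i])];
        for (std::size_t k : trie[state].out)
            matches.push_back({k, i + 1 - keywords[k].size(), i + 1});
    }
    return matches;
}
```

Calling `findAll({"abcd", "c"}, "abcd")` in this sketch records the match of "c" (begin 2, end 3) before the match of "abcd" (begin 0, end 4), because "c" ends earlier in the text. Callers that need matches ordered by begin position could collect them first and sort afterwards.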

Examples

The following example program searches for four needles (queries) in two haystack sequences (dbs) using the Aho-Corasick algorithm.

#include <iostream>

#include <seqan/find.h>

using namespace seqan;

int main()
{
    typedef String<AminoAcid> AminoAcidString;

    // A simple amino acid database.
    StringSet<AminoAcidString> dbs;
    appendValue(dbs, "MARDPLY");
    appendValue(dbs, "AVGGGGAAA");
    // We put some words of the database into the queries.
    String<AminoAcidString> queries;
    appendValue(queries, "MARD");
    appendValue(queries, "AAA");
    appendValue(queries, "DPLY");
    appendValue(queries, "VGGGG");

    // Define the Aho-Corasick pattern over the queries with the preprocessing
    // data structure.
    Pattern<String<AminoAcidString>, AhoCorasick> pattern(queries);

    // Search for the queries in the databases.  We have to search database
    // sequence by database sequence.
    std::cout << "DB\tPOS\tENDPOS\tTEXT\n";
    for (unsigned i = 0; i < length(dbs); ++i)
    {
        Finder<AminoAcidString> finder(dbs[i]);  // new finder for each seq
        while (find(finder, pattern))
            std::cout << i << "\t" << position(finder) << "\t"
                      << endPosition(finder) << "\t"
                      << infix(finder) << "\n";
    }

    return 0;
}

When executed, this program will create the following output.

DB      POS     ENDPOS  TEXT
0       0       4       MARD
0       3       7       DPLY
1       1       6       VGGGG
1       6       9       AAA