Spec
AhoCorasickPatternMultiple exact string matching using Aho-Corasick.
Multiple exact string matching using Aho-Corasick.
Extends | Pattern |
---|---|
All Extended | Pattern |
Defined in | <seqan/find.h> |
Signature |
template <typename TNeedle>
class Pattern<TNeedle, AhoCorasick>;
|
Template Parameters
TNeedle |
The needle type, a string of keywords. |
---|
Interface Function Overview
Interface Functions Inherited From Pattern
Interface Metafunction Overview
Interface Metafunctions Inherited From Pattern
Detailed Description
The types of the keywords in the needle container and the haystack have to match.
Matching positions do not come in order because we report beginning positions of matches.
Likewise, if multiple keywords match at a given position no pre-specified order is guaranteed.
Examples
The following example program searches for three needles (queries) in two haystack sequences (db) using the Aho-Corasick algorithm.
#include <seqan/find.h>
using namespace seqan;
int main()
{
typedef String<AminoAcid> AminoAcidString;
// A simple amino acid database.
StringSet<AminoAcidString> dbs;
appendValue(dbs, "MARDPLY");
appendValue(dbs, "AVGGGGAAA");
// We put some words of the database into the queries.
String<AminoAcidString> queries;
appendValue(queries, "MARD");
appendValue(queries, "AAA");
appendValue(queries, "DPLY");
appendValue(queries, "VGGGG");
// Define the Aho-Corasick pattern over the queries with the preprocessing
// data structure.
Pattern<String<AminoAcidString>, AhoCorasick> pattern(queries);
// Search for the queries in the databases. We have to search database
// sequence by database sequence.
std::cout << "DB\tPOS\tENDPOS\tTEXT\n";
for (unsigned i = 0; i < length(dbs); ++i)
{
Finder<AminoAcidString> finder(dbs[i]); // new finder for each seq
while (find(finder, pattern))
std::cout << i << "\t" << position(finder) << "\t"
<< endPosition(finder) << "\t"
<< infix(finder) << "\n";
}
return 0;
}
When executed, this program will create the following output.
DB POS ENDPOS TEXT 0 0 4 MARD 0 3 7 DPLY 1 1 6 VGGGG 1 6 9 AAA