Class Specialization
AhoCorasick
Multiple exact string matching using Aho-Corasick.
Pattern
AhoCorasick
Pattern<TNeedle, AhoCorasick>
Include Headers
seqan/find.h
Parameters
TNeedle
The needle type, a string of keywords.
Types: String
Remarks
The types of the keywords in the needle container and the haystack have to match.
Matches are not reported in order of their begin positions: a match is detected when the end of a keyword is reached, but the position reported is its begin position.
Likewise, if multiple keywords match at a given position, the order in which they are reported is not specified.
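The ordering remark follows from how an Aho-Corasick automaton works: a keyword is recognized at the moment its last character is read, so hits surface in order of end position and the begin position is derived afterwards. The sketch below is a textbook version using only the standard library, not SeqAn's implementation; Hit and ahoCorasickSearch are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <string>
#include <vector>

// One reported occurrence (hypothetical type, not part of SeqAn).
struct Hit
{
    std::size_t keyword;  // index into the keyword list
    std::size_t begin;    // begin position in the text
    std::size_t end;      // end position in the text (exclusive)
};

// Hypothetical helper: textbook Aho-Corasick search over std::string.
std::vector<Hit> ahoCorasickSearch(std::vector<std::string> const & keywords,
                                   std::string const & text)
{
    struct Node
    {
        std::map<char, int> next;      // trie edges
        int fail = 0;                  // failure link (root by default)
        std::vector<std::size_t> out;  // keywords ending in this state
    };
    std::vector<Node> trie(1);         // state 0 is the root

    // 1. Insert every non-empty keyword into the trie.
    for (std::size_t k = 0; k < keywords.size(); ++k)
    {
        if (keywords[k].empty())
            continue;
        int state = 0;
        for (char c : keywords[k])
        {
            auto it = trie[state].next.find(c);
            if (it == trie[state].next.end())
            {
                int fresh = static_cast<int>(trie.size());
                trie[state].next[c] = fresh;
                trie.emplace_back();
                state = fresh;
            }
            else
                state = it->second;
        }
        trie[state].out.push_back(k);
    }

    // 2. Breadth-first pass: set failure links and inherit outputs.
    std::queue<int> bfs;
    for (auto const & edge : trie[0].next)
        bfs.push(edge.second);         // depth-1 states fail to the root
    while (!bfs.empty())
    {
        int state = bfs.front();
        bfs.pop();
        for (auto const & edge : trie[state].next)
        {
            char c = edge.first;
            int child = edge.second;
            int f = trie[state].fail;
            while (f != 0 && trie[f].next.count(c) == 0)
                f = trie[f].fail;
            auto it = trie[f].next.find(c);
            trie[child].fail = (it != trie[f].next.end()) ? it->second : 0;
            auto const & inherited = trie[trie[child].fail].out;
            trie[child].out.insert(trie[child].out.end(),
                                   inherited.begin(), inherited.end());
            bfs.push(child);
        }
    }

    // 3. Scan the text; a hit is emitted when its last character is read.
    std::vector<Hit> hits;
    int state = 0;
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        char c = text[i];
        while (state != 0 && trie[state].next.count(c) == 0)
            state = trie[state].fail;
        auto it = trie[state].next.find(c);
        state = (it != trie[state].next.end()) ? it->second : 0;
        for (std::size_t k : trie[state].out)
        {
            std::size_t end = i + 1;
            hits.push_back({k, end - keywords[k].size(), end});
        }
    }
    return hits;
}
```

Searching for the keywords GATTACA and TT in the text GATTACA with this sketch yields TT (begin 2, end 4) before GATTACA (begin 0, end 7): the hits are ordered by end position, so the begin positions come out unsorted. The exact order SeqAn reports may differ from this sketch.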
Specialization of
Pattern
Metafunctions
Host: Type of the object a given object depends on. (Pattern)
Needle: Returns the needle type of a Pattern type. (Pattern)
ScoringScheme: Returns the scoring scheme of an approximate searching algorithm. (Pattern)
Functions
find: Search for a Pattern in a Finder object. (Pattern)
findBegin: Search the begin of an approximate match. (Pattern)
host: The object a given object depends on. (Pattern)
needle: Returns the needle of a Pattern object (not implemented for some online-algorithms). (Pattern)
position: Position of an iterator. (Pattern)
scoringScheme: The scoring scheme used for finding or aligning. (Pattern)
setNeedle: Sets the needle of a Pattern object and optionally induces preprocessing. (Pattern)
setScoringScheme: Sets the scoring scheme used for finding or aligning. (Pattern)
Examples
The following example program searches for four needles (queries) in two haystack sequences (dbs) using the Aho-Corasick algorithm.
#include <iostream>

#include <seqan/find.h>

using namespace seqan;

int main()
{
    typedef String<AminoAcid> AminoAcidString;

    // A simple amino acid database.
    StringSet<AminoAcidString> dbs;
    appendValue(dbs, "MARDPLY");
    appendValue(dbs, "AVGGGGAAA");
    // We put some words of the database into the queries.
    String<AminoAcidString> queries;
    appendValue(queries, "MARD");
    appendValue(queries, "AAA");
    appendValue(queries, "DPLY");
    appendValue(queries, "VGGGG");

    // Define the Aho-Corasick pattern over the queries with the preprocessing
    // data structure.
    Pattern<String<AminoAcidString>, AhoCorasick> pattern(queries);

    // Search for the queries in the databases.  We have to search database
    // sequence by database sequence.
    std::cout << "DB\tPOS\tENDPOS\tTEXT\n";
    for (unsigned i = 0; i < length(dbs); ++i)
    {
        Finder<AminoAcidString> finder(dbs[i]);  // new finder for each seq
        while (find(finder, pattern))
            std::cout << i << "\t" << position(finder) << "\t"
                      << endPosition(finder) << "\t"
                      << infix(finder) << "\n";
    }

    return 0;
}
When executed, this program will create the following output.
DB      POS     ENDPOS  TEXT
0       0       4       MARD
0       3       7       DPLY
1       1       6       VGGGG
1       6       9       AAA
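Since no report order is guaranteed (see Remarks), a caller that needs stable output could collect the rows first and sort them before printing. The following is a minimal sketch of that idea using only the standard library; Match and sortMatches are hypothetical names, not SeqAn types or functions, and the fields are assumed to be filled from position(finder), endPosition(finder) and the matched text inside the search loop.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical record for one output row; not a SeqAn type.
struct Match
{
    std::size_t db;     // index of the database sequence
    std::size_t begin;  // begin position of the match
    std::size_t end;    // end position of the match (exclusive)
    std::string text;   // matched text
};

// Order rows by database index, then begin position, then end position.
void sortMatches(std::vector<Match> & matches)
{
    std::sort(matches.begin(), matches.end(),
              [](Match const & a, Match const & b)
              {
                  return std::tie(a.db, a.begin, a.end) <
                         std::tie(b.db, b.begin, b.end);
              });
}
```

Sorting on the (db, begin, end) triple makes the output independent of the order in which the automaton happens to report matches, at the cost of buffering all rows before printing.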
SeqAn - Sequence Analysis Library - www.seqan.de
 

Page built @2013/07/11 09:12:38