Class StringEnumerator
Class to enumerate all strings within a given edit or Hamming distance of a center string.

All Subclasses HammingStringEnumerator, LevenshteinStringEnumerator
Defined in <seqan/misc/edit_environment.h>
Signature template <typename TString, typename TSpec> class StringEnumerator<TString, TSpec>;

Template Parameters

TString Type of the string to enumerate the environment of.
TSpec Specialization tag.

Member Functions Detail

StringEnumerator::StringEnumerator(string[, minDist]);

Constructor

Parameters

string The string to use as the center. Types: TString.
minDist The smallest distance from the center that an enumerated string may have. Type: unsigned. Default: 0.

Data Races

Thread safety unknown!
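
To make the role of minDist concrete, the following standalone sketch enumerates a Hamming environment with plain std::string. It does not use SeqAn and is not the library's implementation; the names enumerateFrom and hammingNeighbourhood, the explicit alphabet argument, and the enumeration order are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of SeqAn: decides for each position >= pos
// whether to keep the original character or substitute it, never exceeding
// maxDist substitutions in total.
static void enumerateFrom(std::string & current, std::string const & center,
                          std::string const & alphabet, std::size_t pos,
                          unsigned used, unsigned minDist, unsigned maxDist,
                          std::vector<std::string> & out)
{
    if (pos == center.size())
    {
        // Only report strings that are at least minDist away from the center.
        if (used >= minDist)
            out.push_back(current);
        return;
    }
    // Option 1: keep the original character at this position.
    enumerateFrom(current, center, alphabet, pos + 1, used, minDist, maxDist, out);
    // Option 2: substitute it, if the distance budget allows.
    if (used < maxDist)
    {
        for (char c : alphabet)
        {
            if (c == center[pos])
                continue;  // Not a substitution.
            current[pos] = c;
            enumerateFrom(current, center, alphabet, pos + 1, used + 1,
                          minDist, maxDist, out);
        }
        current[pos] = center[pos];  // Restore before returning.
    }
}

// Hypothetical helper: all strings whose Hamming distance to `center`
// lies in [minDist, maxDist], each reported exactly once.
std::vector<std::string> hammingNeighbourhood(std::string const & center,
                                              std::string const & alphabet,
                                              unsigned minDist, unsigned maxDist)
{
    std::vector<std::string> out;
    std::string current = center;
    enumerateFrom(current, center, alphabet, 0, 0, minDist, maxDist, out);
    return out;
}
```

Calling hammingNeighbourhood("CGAT", "ACGTN", 1, 2) returns every string of the distance-2 environment except the center CGAT itself, which is the effect a minDist of 1 is described as having.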

Interface Functions Detail

TIter begin(stringEnum[, tag]);

Returns the begin iterator.

Parameters

stringEnum StringEnumerator to query.
tag Iterator tag to use.

Returns

TIter Iterator to the first string in the enumerator.

Data Races

Thread safety unknown!

TIter end(stringEnum[, tag]);

Returns the end iterator.

Parameters

stringEnum StringEnumerator to query.
tag Iterator tag to use.

Returns

TIter End iterator for the string enumerator.

Data Races

Thread safety unknown!

TSize length(stringEnum);

Returns the number of strings that will be enumerated.

Parameters

stringEnum StringEnumerator to query.

Returns

TSize The number of elements in the enumerator (Metafunction: Size).

Data Races

Thread safety unknown!
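
For the Hamming case, the expected number of strings follows from a simple count: choose k of the n positions and replace each with one of the s - 1 other characters, summed over k from minDist to the maximum distance d, that is, the sum of C(n, k) * (s - 1)^k. The sketch below evaluates this sum. It is a standalone illustration, hammingEnvironmentSize is a hypothetical name, and whether length() computes its result this way is not stated in this documentation.

```cpp
#include <cstdint>

// Hypothetical helper, not part of SeqAn: number of strings of the given
// length over an alphabet of `alphabetSize` characters whose Hamming distance
// to a fixed center lies in [minDist, maxDist].
std::uint64_t hammingEnvironmentSize(unsigned length, unsigned alphabetSize,
                                     unsigned minDist, unsigned maxDist)
{
    std::uint64_t total = 0;
    std::uint64_t binom = 1;  // C(length, k), starting at k = 0.
    std::uint64_t power = 1;  // (alphabetSize - 1)^k, starting at k = 0.
    for (unsigned k = 0; k <= maxDist && k <= length; ++k)
    {
        if (k >= minDist)
            total += binom * power;
        // Advance C(length, k) to C(length, k + 1); the division is exact.
        binom = binom * (length - k) / (k + 1);
        power *= alphabetSize - 1;
    }
    return total;
}
```

For CGAT over the five-letter Dna5 alphabet with d = 2 this gives 1 + 4 * 4 + 6 * 16 = 113, which matches the number of lines in the Hamming listing under Examples. The Levenshtein listing there contains repeated strings (CGGAT, for instance), so the same closed form does not carry over to edit distance.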

Interface Metafunctions Detail

Difference<TStringEnumerator>::Type;

Returns difference type.

Host<TStringEnumerator>::Type;

Returns host type.

Iterator<TStringEnumerator, TSpec>::Type;

Returns iterator type.

Position<TStringEnumerator>::Type;

Returns position type.

Reference<TStringEnumerator>::Type;

Returns reference type of the enumerated strings.

Size<TStringEnumerator>::Type;

Returns size type.

Value<TStringEnumerator>::Type;

Returns the value type of the string to enumerate.

Member Variables Detail

bool StringEnumerator::trim

Indicates whether substitutions in the first or last character of the string are ignored in Levenshtein mode (an optimization for approximate search).

This is useful when searching for the enumerated strings in large texts: patterns with a substitution in the first base would be found anyway.
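
One way to read this remark, offered as an interpretation rather than a description of SeqAn's internals: the Levenshtein environment also contains the string with its first character deleted, and that shorter pattern occurs in the text wherever any first-character substitution occurs. The standalone sketch below uses only std::string; findAll is a hypothetical helper name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of SeqAn: start positions of all
// (possibly overlapping) occurrences of `pattern` in `text`.
std::vector<std::size_t> findAll(std::string const & text,
                                 std::string const & pattern)
{
    std::vector<std::size_t> hits;
    for (std::size_t p = text.find(pattern); p != std::string::npos;
         p = text.find(pattern, p + 1))
        hits.push_back(p);
    return hits;
}
```

In the text TTAGATCC, the variant AGAT (CGAT with its first base substituted) occurs at position 2, and GAT (CGAT with its first base deleted) occurs at position 3, so a search for GAT already covers that hit.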

Examples

#include <iostream>

#include <seqan/basic.h>
#include <seqan/sequence.h>
#include <seqan/stream.h>      // For printing SeqAn Strings.
#include <seqan/misc/edit_environment.h>

using namespace seqan;

int main()
{
    Dna5String original = "CGAT";

    // Enumerate neighbourhood using Hamming distance.
    typedef StringEnumerator<Dna5String, EditEnvironment<HammingDistance, 2> > THammingEnumerator;
    typedef Iterator<THammingEnumerator>::Type THammingIterator;
    std::cout << "Enumerating Hamming distance environment of " << original << " of distance 2\n";
    THammingEnumerator hammingEnumerator(original);
    for (THammingIterator itH = begin(hammingEnumerator); !atEnd(itH); goNext(itH))
        std::cout << *itH << '\n';

    // Enumerate neighbourhood using edit distance.
    typedef StringEnumerator<Dna5String, EditEnvironment<LevenshteinDistance, 2> > TEditEnumerator;
    typedef Iterator<TEditEnumerator>::Type TEditIterator;
    std::cout << "\nEnumerating edit distance environment of " << original << " of distance 1-2\n";
    TEditEnumerator editEnumerator(original);
    for (TEditIterator itE = begin(editEnumerator); !atEnd(itE); goNext(itE))
        std::cout << *itE << '\n';

    return 0;
}

Output

Enumerating Hamming distance environment of CGAT of distance 2
AGAT
CGAT
GGAT
TGAT
NGAT
CAAT
CCAT
CTAT
CNAT
CGCT
CGGT
CGTT
CGNT
CGAA
CGAC
CGAG
CGAN
AAAT
ACAT
ATAT
ANAT
GAAT
GCAT
GTAT
GNAT
TAAT
TCAT
TTAT
TNAT
NAAT
NCAT
NTAT
NNAT
AGCT
AGGT
AGTT
AGNT
GGCT
GGGT
GGTT
GGNT
TGCT
TGGT
TGTT
TGNT
NGCT
NGGT
NGTT
NGNT
AGAA
AGAC
AGAG
AGAN
GGAA
GGAC
GGAG
GGAN
TGAA
TGAC
TGAG
TGAN
NGAA
NGAC
NGAG
NGAN
CACT
CAGT
CATT
CANT
CCCT
CCGT
CCTT
CCNT
CTCT
CTGT
CTTT
CTNT
CNCT
CNGT
CNTT
CNNT
CAAA
CAAC
CAAG
CAAN
CCAA
CCAC
CCAG
CCAN
CTAA
CTAC
CTAG
CTAN
CNAA
CNAC
CNAG
CNAN
CGCA
CGCC
CGCG
CGCN
CGGA
CGGC
CGGG
CGGN
CGTA
CGTC
CGTG
CGTN
CGNA
CGNC
CGNG
CGNN

Enumerating edit distance environment of CGAT of distance 1-2
CGAT
CAAT
CCAT
CTAT
CNAT
CGCT
CGGT
CGTT
CGNT
GAT
CAT
CGT
CGA
CAGAT
CCGAT
CGGAT
CTGAT
CNGAT
CGAAT
CGCAT
CGGAT
CGTAT
CGNAT
CGAAT
CGACT
CGAGT
CGATT
CGANT
CACT
CAGT
CATT
CANT
CCCT
CCGT
CCTT
CCNT
CTCT
CTGT
CTTT
CTNT
CNCT
CNGT
CNTT
CNNT
CAT
CCT
CTT
CNT
CAA
CCA
CTA
CNA
CGC
CGG
CGT
CGN
CAAAT
CACAT
CAGAT
CATAT
CANAT
CCAAT
CCCAT
CCGAT
CCTAT
CCNAT
CTAAT
CTCAT
CTGAT
CTTAT
CTNAT
CNAAT
CNCAT
CNGAT
CNTAT
CNNAT
CAAAT
CAACT
CAAGT
CAATT
CAANT
CCAAT
CCACT
CCAGT
CCATT
CCANT
CTAAT
CTACT
CTAGT
CTATT
CTANT
CNAAT
CNACT
CNAGT
CNATT
CNANT
CGCAT
CGCCT
CGCGT
CGCTT
CGCNT
CGGAT
CGGCT
CGGGT
CGGTT
CGGNT
CGTAT
CGTCT
CGTGT
CGTTT
CGTNT
CGNAT
CGNCT
CGNGT
CGNTT
CGNNT
GCT
GGT
GTT
GNT
AT
GT
GA
CT
CA
CG
GAAT
GCAT
GGAT
GTAT
GNAT
GAAT
GACT
GAGT
GATT
GANT
CAAT
CACT
CAGT
CATT
CANT
CAGCT
CAGGT
CAGTT
CAGNT
CCGCT
CCGGT
CCGTT
CCGNT
CGGCT
CGGGT
CGGTT
CGGNT
CTGCT
CTGGT
CTGTT
CTGNT
CNGCT
CNGGT
CNGTT
CNGNT
CAGT
CCGT
CGGT
CTGT
CNGT
CAGA
CCGA
CGGA
CTGA
CNGA
CGAA
CGCA
CGGA
CGTA
CGNA
CAAGAT
CACGAT
CAGGAT
CATGAT
CANGAT
CCAGAT
CCCGAT
CCGGAT
CCTGAT
CCNGAT
CGAGAT
CGCGAT
CGGGAT
CGTGAT
CGNGAT
CTAGAT
CTCGAT
CTGGAT
CTTGAT
CTNGAT
CNAGAT
CNCGAT
CNGGAT
CNTGAT
CNNGAT
CAGAAT
CAGCAT
CAGGAT
CAGTAT
CAGNAT
CCGAAT
CCGCAT
CCGGAT
CCGTAT
CCGNAT
CGGAAT
CGGCAT
CGGGAT
CGGTAT
CGGNAT
CTGAAT
CTGCAT
CTGGAT
CTGTAT
CTGNAT
CNGAAT
CNGCAT
CNGGAT
CNGTAT
CNGNAT
CAGAAT
CAGACT
CAGAGT
CAGATT
CAGANT
CCGAAT
CCGACT
CCGAGT
CCGATT
CCGANT
CGGAAT
CGGACT
CGGAGT
CGGATT
CGGANT
CTGAAT
CTGACT
CTGAGT
CTGATT
CTGANT
CNGAAT
CNGACT
CNGAGT
CNGATT
CNGANT
CGAAAT
CGACAT
CGAGAT
CGATAT
CGANAT
CGCAAT
CGCCAT
CGCGAT
CGCTAT
CGCNAT
CGGAAT
CGGCAT
CGGGAT
CGGTAT
CGGNAT
CGTAAT
CGTCAT
CGTGAT
CGTTAT
CGTNAT
CGNAAT
CGNCAT
CGNGAT
CGNTAT
CGNNAT
CGAAAT
CGAACT
CGAAGT
CGAATT
CGAANT
CGCAAT
CGCACT
CGCAGT
CGCATT
CGCANT
CGGAAT
CGGACT
CGGAGT
CGGATT
CGGANT
CGTAAT
CGTACT
CGTAGT
CGTATT
CGTANT
CGNAAT
CGNACT
CGNAGT
CGNATT
CGNANT
CGAAAT
CGAACT
CGAAGT
CGAATT
CGAANT
CGACAT
CGACCT
CGACGT
CGACTT
CGACNT
CGAGAT
CGAGCT
CGAGGT
CGAGTT
CGAGNT
CGATAT
CGATCT
CGATGT
CGATTT
CGATNT
CGANAT
CGANCT
CGANGT
CGANTT
CGANNT