Spec IndexQGram
An index based on an array of sorted q-grams.

Extends Index
Implements StringTrieConcept
All Extended Index
All Subcl's OpenAddressingQGramIndex
All Impl'd StringTrieConcept
Defined in <seqan/index.h>
Signature template <typename TText, typename TShapeSpec, typename TSpec> class Index<TText, IndexQGram<TShapeSpec[, TSpec]> >;

Template Parameters

TSpec The specializing type. Types: OpenAddressing, Default: void
TText The text type. Types: String
TShapeSpec The Shape specialization type.

Detailed Description

The fibres (see Index and Fibre) of this index are a suffix array sorted by the first q characters (see QGramSA) and a q-gram directory (see QGramDir).

The size of the q-gram directory is |Σ|^q, where Σ is the alphabet and q is the q-gram length. On a 32-bit system the q-gram length is limited to 3 for char alphabets or 13-14 for Dna alphabets.
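The limits quoted above can be checked with a little arithmetic. The following is a minimal standalone sketch, not part of SeqAn: the function names are hypothetical, and the 4 bytes per directory entry is an assumption made only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Number of distinct q-grams over an alphabet of the given size, i.e. |Σ|^q.
// Returns 0 if the result does not fit into 64 bits.
std::uint64_t qgramDirectorySize(std::uint64_t alphabetSize, unsigned q)
{
    assert(alphabetSize > 0);
    std::uint64_t size = 1;
    for (unsigned i = 0; i < q; ++i)
    {
        if (size > UINT64_MAX / alphabetSize)
            return 0;  // the next multiplication would overflow
        size *= alphabetSize;
    }
    return size;
}

// True if a directory with one entry per q-gram stays below 4 GiB,
// assuming entryBytes bytes per entry (an assumption, not a SeqAn constant).
bool fitsIn32BitAddressSpace(std::uint64_t alphabetSize, unsigned q,
                             std::uint64_t entryBytes = 4)
{
    assert(entryBytes > 0);
    std::uint64_t size = qgramDirectorySize(alphabetSize, q);
    return size != 0 && size < UINT64_C(4294967296) / entryBytes;
}
```

Under these assumptions, a 256-letter char alphabet fits for q = 3 but not for q = 4, and a 4-letter Dna alphabet fits for q = 14 but not for q = 15.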

Consider using the OpenAddressingQGramIndex for longer q-grams if you do not need the q-grams to be sorted.

Member Functions Detail

Index::Index(); Index::Index(index); Index::Index(text[, shape]);

Constructor

Parameters

index Other Index object to copy from.
text The text to be indexed.
shape The q-gram Shape to be applied. (optional)

Data Races

If not stated otherwise, concurrent invocation is not guaranteed to be thread-safe.

Interface Functions Detail

TSize countOccurrences(index, shape);

Returns the number of occurrences of a q-gram in the index text.

Parameters

index IndexQGram object to query.
shape A Shape object to use for the query.

Returns

TSize The number of positions where the q-gram stored in shape occurs in the text (see QGramText). Metafunction: Size.

The necessary index tables are built on-demand via indexRequire if index is not const.

Demo: Demo.Supermaximal Repeats

Demo: Demo.Index countChildren

Data Races

If not stated otherwise, concurrent invocation is not guaranteed to be thread-safe.

TCountInfix countOccurrencesMultiple(index, shape);

Returns the number of occurrences of a q-gram for every sequence of a StringSet.

Parameters

index An IndexQGram of a StringSet.
shape A Shape.

Returns

TCountInfix A sequence of pairs (seqNo,count), count > 0. For every StringSet sequence the q-gram occurs in, seqNo is the sequence number and count the number of occurrences. If the type of index is TIndex the return type is Infix<Fibre<TIndex, QGramCounts>::Type const>::Type.

The necessary index tables are built on-demand via indexRequire if index is not const.
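To make the shape of the result concrete, here is a minimal standalone sketch that produces the same kind of (seqNo, count) pairs by scanning plain strings. It does not use SeqAn or any index tables. The function name countPerSequence is hypothetical, and the q-gram is passed as a string rather than as a Shape.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// For every sequence that contains qgram at least once, report the pair
// (sequence number, number of possibly overlapping occurrences).
std::vector<std::pair<std::size_t, std::size_t>>
countPerSequence(std::vector<std::string> const & seqs, std::string const & qgram)
{
    assert(!qgram.empty());
    std::vector<std::pair<std::size_t, std::size_t>> result;
    for (std::size_t seqNo = 0; seqNo < seqs.size(); ++seqNo)
    {
        std::size_t count = 0;
        for (std::size_t pos = seqs[seqNo].find(qgram);
             pos != std::string::npos;
             pos = seqs[seqNo].find(qgram, pos + 1))
            ++count;
        if (count > 0)  // sequences without the q-gram are skipped
            result.emplace_back(seqNo, count);
    }
    return result;
}
```

Sequences in which the q-gram does not occur produce no pair at all, which is why every reported count is greater than zero.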

Data Races

If not stated otherwise, concurrent invocation is not guaranteed to be thread-safe.

void createCountsArray(counts, dir, bucketMap, stringSet, shape, stepSize);

Warning.

This function should not be called directly. Please use indexCreate or indexRequire. The resulting tables must have appropriate size before calling this function.

Builds an index on a StringSet storing how often a q-gram occurs in each sequence.

Parameters

counts The resulting String of pairs (seqNo, count).
dir The resulting array that indicates at which position in the count table the counts of a certain q-gram can be found.
bucketMap Stores the q-gram hashes for the open addressing hash maps, see indexBucketMap. If bucketMap is of the type Nothing the q-gram hash determines the bucket address in the index.
stringSet The StringSet.
shape The Shape to be used.
stepSize Store every stepSize-th q-gram in the index, IntegerConcept.

Data Races

If not stated otherwise, concurrent invocation is not guaranteed to be thread-safe.

void createQGramIndex(index); void createQGramIndex(sa, dir, bucketMap, text, shape, stepSize); [DEPRECATED]

Warning.

This function should not be called directly. Please use indexCreate or indexRequire. The resulting tables must have appropriate size before calling this function.

Builds a q-gram index on a sequence.

Parameters

index The IndexQGram to create.
sa The resulting list in which all q-grams are sorted alphabetically.
dir The resulting array that indicates at which position in index the corresponding q-grams can be found.
bucketMap Stores the q-gram hashes for the open addressing hash maps, see indexBucketMap. If bucketMap is of the type Nothing the q-gram hash determines the bucket address in the index.
text The TextConcept object to build the index for.
shape The shape to be used. Types: Shape
stepSize Store every stepSize'th q-gram in the index, IntegerConcept.

The resulting q-gram index contains the sorted list of q-grams. For each q-gram, dir contains the first position in index that corresponds to this q-gram.
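The relationship between the two tables can be illustrated with a counting sort. The sketch below is standalone C++ and is not SeqAn's implementation: it assumes a Dna-like alphabet (A, C, G, T), an ungapped shape of length q, a step size of 1, and a directory with one extra sentinel entry at the end. All names in it are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Rank of a DNA character (A=0, C=1, G=2, T=3); anything else maps to 0.
inline std::size_t dnaRank(char c)
{
    switch (c)
    {
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return 0;
    }
}

// Hash of the ungapped q-gram starting at text[pos]: its rank among all 4^q q-grams.
inline std::size_t hashQGram(std::string const & text, std::size_t pos, std::size_t q)
{
    std::size_t h = 0;
    for (std::size_t i = 0; i < q; ++i)
        h = h * 4 + dnaRank(text[pos + i]);
    return h;
}

struct QGramTables
{
    std::vector<std::size_t> sa;   // q-gram start positions, sorted by q-gram
    std::vector<std::size_t> dir;  // dir[h] = first index in sa of the q-gram with hash h
};

QGramTables buildQGramTables(std::string const & text, std::size_t q)
{
    assert(q > 0 && q < 16);
    QGramTables t;
    std::size_t const numQGrams = std::size_t(1) << (2 * q);  // 4^q
    t.dir.assign(numQGrams + 1, 0);                           // +1 for the sentinel
    if (text.size() < q)
        return t;
    std::size_t const n = text.size() - q + 1;

    // 1. Count each q-gram, shifted by one bucket.
    for (std::size_t pos = 0; pos < n; ++pos)
        ++t.dir[hashQGram(text, pos, q) + 1];
    // 2. Prefix sums turn the counts into bucket start positions.
    for (std::size_t h = 1; h <= numQGrams; ++h)
        t.dir[h] += t.dir[h - 1];
    // 3. Write every text position into the next free slot of its bucket.
    std::vector<std::size_t> next(t.dir.begin(), t.dir.end() - 1);
    t.sa.resize(n);
    for (std::size_t pos = 0; pos < n; ++pos)
        t.sa[next[hashQGram(text, pos, q)]++] = pos;
    return t;
}
```

For the text ACGTACGA and q = 2, the bucket of AC (hash 1) starts at dir[1] = 0 and ends before dir[2] = 2, so sa[0] and sa[1] hold its two start positions 0 and 4.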

Data Races

If not stated otherwise, concurrent invocation is not guaranteed to be thread-safe.

void createQGramIndexDirOnly(dir, bucketMap, text, shape, stepSize);

Warning.

This function should not be called directly. Please use indexCreate or indexRequire. The resulting tables must have appropriate size before calling this function.

Builds the directory of a q-gram index on a sequence.

Parameters

dir The resulting array that indicates at which position in index the corresponding q-grams can be found.
bucketMap Stores the q-gram hashes for the open addressing hash maps, see indexBucketMap. If bucketMap is of the type Nothing the q-gram hash determines the bucket address in the index.
text The sequence, TextConcept.
stepSize Store every stepSize'th q-gram in the index, IntegerConcept.
shape The Shape to be used.

The resulting index contains the sorted list of q-grams. For each possible q-gram, dir contains the first position in index that corresponds to this q-gram.

Data Races

If not stated otherwise, concurrent invocation is not guaranteed to be thread-safe.

void createQGramIndexSAOnly(sa, text, shape, stepSize);

Warning.

This function should not be called directly. Please use indexCreate or indexRequire. The resulting tables must have appropriate size before calling this function.

Builds the suffix array of a q-gram index on a sequence.

Parameters

sa The resulting list in which all q-grams are sorted alphabetically, String object of SAValue of the underlying text.
text The sequence, a TextConcept object.
shape The Shape to be used. q is the length of this shape.
stepSize Store every stepSize'th q-gram in the index, IntegerConcept.

Data Races

If not stated otherwise, concurrent invocation is not guaranteed to be thread-safe.

TFibre dirAt(position, index);

Shortcut for value(indexDir(index), position).

Parameters

index The IndexQGram object holding the fibre.
position A position in the array on which the value should be accessed.

Returns

TFibre A reference to the value at position in the QGramDir fibre.

Data Races

If not stated otherwise, concurrent invocation is not guaranteed to be thread-safe.

void getKmerSimilarityMatrix(index, distMat[, seqSet]);

Creates a matrix storing the number of common q-grams between all pairs of sequences.

Parameters

distMat The resulting q-gram similarity matrix. Types: ContainerConcept
seqSet Contains sequence numbers if only a subset of sequences should be compared. Types: ContainerConcept
index The IndexQGram to use.

distMat needs to be a container of a floating point type and will be resized to seqCount * seqCount, where seqCount is the number of sequences in the index/in seqSet. The fraction of common q-grams between sequence i and j is stored at position i*seqCount + j. It sums up the minimum number of q-gram occurrences between both sequences for each q-gram and normalizes it.
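The computation described above can be sketched without SeqAn as follows. The text does not say how the sum is normalized, so the divisor used here (the smaller of the two sequences' total q-gram counts) is an assumption, as is the function name kmerSimilarity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Returns a flattened seqCount x seqCount matrix. Entry i*seqCount + j holds
// the sum over all q-grams of min(count in i, count in j), divided by the
// smaller total q-gram count of the two sequences (assumed normalization).
std::vector<double> kmerSimilarity(std::vector<std::string> const & seqs, std::size_t q)
{
    assert(q > 0);
    std::size_t const seqCount = seqs.size();

    // Per-sequence q-gram profile: q-gram -> number of occurrences.
    std::vector<std::map<std::string, std::size_t>> profiles(seqCount);
    std::vector<std::size_t> totals(seqCount, 0);
    for (std::size_t i = 0; i < seqCount; ++i)
        for (std::size_t pos = 0; pos + q <= seqs[i].size(); ++pos)
        {
            ++profiles[i][seqs[i].substr(pos, q)];
            ++totals[i];
        }

    std::vector<double> distMat(seqCount * seqCount, 0.0);
    for (std::size_t i = 0; i < seqCount; ++i)
        for (std::size_t j = 0; j < seqCount; ++j)
        {
            std::size_t shared = 0;
            for (auto const & entry : profiles[i])
            {
                auto it = profiles[j].find(entry.first);
                if (it != profiles[j].end())
                    shared += std::min(entry.second, it->second);
            }
            std::size_t const norm = std::min(totals[i], totals[j]);
            distMat[i * seqCount + j] = norm ? static_cast<double>(shared) / norm : 0.0;
        }
    return distMat;
}
```

With this normalization, a sequence compared with itself scores 1, and the matrix is symmetric.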

Data Races

If not stated otherwise, concurrent invocation is not guaranteed to be thread-safe.

TSAValue getOccurrence(index, shape);

Returns an occurrence of a q-gram in the index text.

Parameters

index The IndexQGram object to query.
shape The Shape object to use for the query. The shape stores the q-gram of the last call to hash or hashNext.

Returns

TSAValue A position where the q-gram stored in shape occurs in the text (see QGramText). Type: SAValue of the index type.

Data Races

If not stated otherwise, concurrent invocation is not guaranteed to be thread-safe.

TSAInfix getOccurrences(index, shape);

Returns all occurrences of a q-gram in the index text.

Parameters

index An IndexQGram to query.
shape A Shape object.

Returns

TSAInfix All positions where the q-gram stored in shape occurs in the text (see QGramText). Types: Infix<Fibre<TIndex, QGramSA>::Type>::Type.

The necessary index tables are built on-demand via indexRequire if index is not const.

Data Races

If not stated otherwise, concurrent invocation is not guaranteed to be thread-safe.

TSize getStepSize(index);

Returns the q-gram step size used for index creation.

Parameters

index An IndexQGram object.

Returns

TSize The step size of type Size. If x is returned, every x-th q-gram is stored in the index.

Data Races

If not stated otherwise, concurrent invocation is not guaranteed to be thread-safe.

TFibre indexBucketMap(index);

Shortcut for getFibre(index, QGramBucketMap()).

Parameters

index The IndexQGram object holding the fibre.

Returns

TFibre A reference to the QGramBucketMap fibre (maps q-gram hashes to buckets).

Data Races

If not stated otherwise, concurrent invocation is not guaranteed to be thread-safe.

TFibre indexCounts(index);

Shortcut for getFibre(index, QGramCounts()).

Parameters

index The IndexQGram object holding the fibre.

Returns

TFibre A reference to the QGramCounts fibre (counts array).

Data Races

If not stated otherwise, concurrent invocation is not guaranteed to be thread-safe.

TFibre indexCountsDir(index);

Shortcut for getFibre(index, QGramCountsDir()).

Parameters

index The IndexQGram object holding the fibre.

Returns

TFibre A reference to the QGramCountsDir fibre (counts directory).

Data Races

If not stated otherwise, concurrent invocation is not guaranteed to be thread-safe.

TFibre indexDir(index);

Shortcut for getFibre(index, QGramDir()).

Parameters

index The IndexQGram object holding the fibre.

Returns

TFibre A reference to the QGramDir fibre (q-gram directory).

Data Races

If not stated otherwise, concurrent invocation is not guaranteed to be thread-safe.

TSa indexSA(index);

Shortcut for getFibre(index, QGramSA()).

Parameters

index The IndexQGram object holding the fibre.

Returns

TSa A reference to the QGramSA fibre (q-gram positions).

Data Races

If not stated otherwise, concurrent invocation is not guaranteed to be thread-safe.

TFibre indexShape(index);

Shortcut for getFibre(index, QGramShape()).

Parameters

index The Index object holding the fibre. Types: IndexQGram

Returns

TFibre Returns a reference to the Shape object of a q-gram index. Formally, this is a reference to the QGramShape fibre.

Data Races

If not stated otherwise, concurrent invocation is not guaranteed to be thread-safe.

TPair range(index, shape);

Returns the suffix array interval borders of a q-gram in the index text.

Parameters

index The IndexQGram object to query.
shape The Shape object to use for the query.

Returns

TPair All positions where the q-gram stored in shape occurs in the text (see QGramText) are stored in a contiguous range of the suffix array. range returns begin and end position of this range. If the type of index is TIndex the return type is Pair<Size<TIndex>::Type>.

The necessary index tables are built on-demand via indexRequire if index is not const.
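The idea of a contiguous suffix array range can be sketched with a plain directory array. This is a standalone illustration rather than SeqAn code: it assumes that dir holds one entry per q-gram hash plus a final sentinel, so that the bucket of hash h spans [dir[h], dir[h + 1]). The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Begin and end position (half-open) of the bucket for the given q-gram hash.
std::pair<std::size_t, std::size_t>
bucketRange(std::vector<std::size_t> const & dir, std::size_t hash)
{
    assert(hash + 1 < dir.size());
    return {dir[hash], dir[hash + 1]};
}

// Number of occurrences of the q-gram: the length of its bucket.
std::size_t bucketCount(std::vector<std::size_t> const & dir, std::size_t hash)
{
    std::pair<std::size_t, std::size_t> const r = bucketRange(dir, hash);
    return r.second - r.first;
}
```

Under this layout an empty bucket shows up as a range whose begin and end are equal, and the occurrence count is simply the difference of the two borders.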

Data Races

If not stated otherwise, concurrent invocation is not guaranteed to be thread-safe.

TValue saAt(position, index);

Note.

Advanced functionality, not commonly used.

Shortcut for value(indexSA(index), position).

Parameters

index The IndexQGram object holding the fibre.
position A position in the array on which the value should be accessed.

Returns

TValue A reference or proxy to the value in the QGramSA fibre. To be more precise, a reference to a position containing a value of type SAValue is returned (or a proxy).

Data Races

If not stated otherwise, concurrent invocation is not guaranteed to be thread-safe.

void setStepSize(index, stepSize);

Change the q-gram step size used for index creation.

Parameters

index The IndexQGram to modify.
stepSize The step size, IntegerConcept. Every stepSize-th q-gram will be stored.

The default step size of a q-gram index is 1, which corresponds to all overlapping q-grams. For a changed stepSize to take effect, the q-gram index must be empty or be recreated.

A stepSize of 0 corresponds to stepSize = length(indexShape(index)), i.e. all non-overlapping q-grams.
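The effect of the step size on which q-grams are stored can be sketched as follows. This is a standalone illustration with a hypothetical function name. It assumes that sampling starts at text position 0 and that only q-grams lying completely inside the text are stored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Start positions of the q-grams that would be stored for a text of the given
// length. A stepSize of 0 is treated as stepSize = q (non-overlapping q-grams).
std::vector<std::size_t> sampledQGramPositions(std::size_t textLength,
                                               std::size_t q,
                                               std::size_t stepSize)
{
    assert(q > 0);
    if (stepSize == 0)
        stepSize = q;
    std::vector<std::size_t> positions;
    for (std::size_t pos = 0; pos + q <= textLength; pos += stepSize)
        positions.push_back(pos);
    return positions;
}
```

For a text of length 10 and q = 3, a step size of 1 yields the 8 positions 0 to 7, while a step size of 0 yields only 0, 3 and 6.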

Data Races

If not stated otherwise, concurrent invocation is not guaranteed to be thread-safe.
