Group Index QGram Fibres
Tag to select a specific fibre (e.g. table, object, ...) of an IndexQGram.

Grouped Tags Detail

QGram_RawText

The concatenation of all text sequences.

By default, the QGramText and QGram_RawText fibres are equal. They differ only if the index text is a set of strings; in that case, the raw text is the concatenation of all strings in the set.
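The relation between a set of strings and its raw text can be illustrated with a minimal, self-contained sketch. It does not use SeqAn; the function name concatRawText is hypothetical and only mirrors the concatenation described above.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not SeqAn API): builds the "raw text" of a string
// set by concatenating its members in their original order.
std::string concatRawText(const std::vector<std::string>& textSet)
{
    std::string raw;
    for (const std::string& s : textSet)
        raw += s;
    return raw;
}
```

For a set holding "ACG" and "TT", the sketch yields "ACGTT"; for a set with a single string, text and raw text coincide.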

QGramBucketMap

Maps q-gram hashes to buckets. This fibre is used by OpenAddressingQGramIndex and stores all parameters of the open addressing hash function as well as the occupancy of hash values in the QGramDir fibre. In contrast to OpenAddressingQGramIndex, IndexQGram uses a trivial 1-to-1 mapping from q-gram hash values to buckets, so for IndexQGram the fibre is of type Nothing.
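The difference between the two mappings can be sketched as follows. This is not the SeqAn implementation: the names directBucket and ProbingBucketMap are hypothetical, and linear probing is assumed here purely for illustration of open addressing.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Marker for a bucket that no hash value has claimed yet (assumption of
// this sketch; real hash values are assumed never to equal it).
const std::uint64_t kEmptySlot = ~std::uint64_t(0);

// Trivial 1-to-1 mapping: the q-gram hash itself is the bucket number.
inline std::size_t directBucket(std::uint64_t hash)
{
    return static_cast<std::size_t>(hash);
}

// Hypothetical open addressing bucket map using linear probing.
struct ProbingBucketMap
{
    // Occupancy table: slots[i] is the hash value that owns bucket i.
    std::vector<std::uint64_t> slots;

    explicit ProbingBucketMap(std::size_t bucketCount)
        : slots(bucketCount, kEmptySlot) {}

    // Returns the bucket of `hash`, claiming a free bucket on first use.
    // Assumes fewer distinct hash values than buckets are ever inserted.
    std::size_t bucket(std::uint64_t hash)
    {
        std::size_t i = static_cast<std::size_t>(hash % slots.size());
        while (slots[i] != kEmptySlot && slots[i] != hash)
            i = (i + 1) % slots.size();   // probe the next bucket
        slots[i] = hash;
        return i;
    }
};
```

With the direct mapping the directory needs one entry per possible hash value, whereas the probing map only needs enough buckets for the q-grams that actually occur.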

QGramCounts

The counts array.

Contains the number of occurrences of each q-gram per sequence, such that the numbers for the same q-gram are stored in a contiguous block (q-gram count bucket). A bucket contains entries (seqNo,count) for the sequences with at least one occurrence of the q-gram. Q-grams exceeding the end of the text are ignored. The beginning of each count bucket can be determined from the q-gram counts directory (QGramCountsDir, see below).

Fibre returns a String over the alphabet of the SAValue of TIndex.

QGramCountsDir

The counts directory.

The counts directory contains for every possible q-gram hash value the start index of the q-gram count bucket. A q-gram count bucket is a contiguous interval in the counts array (QGramCounts, see above). The end index is the start index of the next bucket.

Fibre returns a String over the alphabet of a size type.
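How the counts array and the counts directory fit together can be sketched without SeqAn. The code below is an illustration under stated assumptions, not the library's construction algorithm: the names dnaRank, hashQGram, QGramCountTables and buildCounts are hypothetical, the alphabet is assumed to be A, C, G, T, the shape is ungapped, and the directory gets one extra trailing entry so that the last bucket also has an end index.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Rank of a DNA character; 'A' and any unexpected character map to 0
// (a simplification of this sketch).
inline std::size_t dnaRank(char c)
{
    switch (c) { case 'C': return 1; case 'G': return 2; case 'T': return 3; default: return 0; }
}

// Hash of the ungapped q-gram starting at pos (base-4 number).
inline std::size_t hashQGram(const std::string& s, std::size_t pos, std::size_t q)
{
    std::size_t h = 0;
    for (std::size_t i = 0; i < q; ++i)
        h = h * 4 + dnaRank(s[pos + i]);
    return h;
}

struct QGramCountTables
{
    std::vector<std::pair<std::size_t, std::size_t>> counts; // (seqNo, count)
    std::vector<std::size_t> countsDir;                      // bucket start indices
};

QGramCountTables buildCounts(const std::vector<std::string>& seqs, std::size_t q)
{
    std::size_t numHashes = 1;
    for (std::size_t i = 0; i < q; ++i)
        numHashes *= 4;

    // Collect one bucket of (seqNo, count) entries per hash value.
    std::vector<std::vector<std::pair<std::size_t, std::size_t>>> buckets(numHashes);
    for (std::size_t seqNo = 0; seqNo < seqs.size(); ++seqNo)
    {
        const std::string& s = seqs[seqNo];
        // q-grams that would exceed the end of the sequence are skipped.
        for (std::size_t pos = 0; pos + q <= s.size(); ++pos)
        {
            auto& b = buckets[hashQGram(s, pos, q)];
            if (b.empty() || b.back().first != seqNo)
                b.push_back(std::make_pair(seqNo, std::size_t(0)));
            ++b.back().second;
        }
    }

    // Lay the buckets out contiguously and record where each one starts.
    QGramCountTables t;
    for (const auto& b : buckets)
    {
        t.countsDir.push_back(t.counts.size());
        t.counts.insert(t.counts.end(), b.begin(), b.end());
    }
    t.countsDir.push_back(t.counts.size()); // end of the last bucket
    return t;
}
```

The count bucket of a hash value h is then the half-open interval [countsDir[h], countsDir[h + 1]) of the counts array.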

QGramDir

The directory/hash table.

The directory contains for every possible q-gram hash value the start index of the q-gram bucket. A q-gram bucket is a contiguous interval in the suffix array (QGramSA, see below). Each suffix in this interval begins with the same q-gram. The end index is the start index of the next bucket.

Fibre returns a String over the alphabet of a size type.

QGramSA

The suffix array.

Contains all occurrences of q-grams, such that the occurrences of a single q-gram are stored in a contiguous block (q-gram bucket). Q-grams exceeding the end of the text are ignored. The beginning of each bucket can be determined from the q-gram directory (QGramDir, see above).

It corresponds to a suffix array which is sorted by the first q-gram.

Fibre returns a String over the alphabet of the SAValue of TIndex.
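The interplay of suffix array and directory can be sketched with a counting sort over q-gram hashes. This is a self-contained illustration, not SeqAn's construction code: the names baseRank, qGramHash, QGramTables and buildSADir are hypothetical, a single text over A, C, G, T with an ungapped shape is assumed, and the directory carries one extra trailing entry marking the end of the last bucket.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Rank of a DNA character; 'A' and any unexpected character map to 0
// (a simplification of this sketch).
inline std::size_t baseRank(char c)
{
    switch (c) { case 'C': return 1; case 'G': return 2; case 'T': return 3; default: return 0; }
}

// Hash of the ungapped q-gram starting at pos (base-4 number).
inline std::size_t qGramHash(const std::string& s, std::size_t pos, std::size_t q)
{
    std::size_t h = 0;
    for (std::size_t i = 0; i < q; ++i)
        h = h * 4 + baseRank(s[pos + i]);
    return h;
}

struct QGramTables
{
    std::vector<std::size_t> sa;  // text positions, grouped by q-gram
    std::vector<std::size_t> dir; // dir[h] = start of the bucket of hash h
};

QGramTables buildSADir(const std::string& text, std::size_t q)
{
    std::size_t numHashes = 1;
    for (std::size_t i = 0; i < q; ++i)
        numHashes *= 4;

    QGramTables t;
    t.dir.assign(numHashes + 1, 0);
    if (text.size() < q)
        return t;                                // no complete q-gram
    const std::size_t n = text.size() - q + 1;   // q-grams past the end are ignored

    // 1. Count the occurrences of every hash value.
    for (std::size_t pos = 0; pos < n; ++pos)
        ++t.dir[qGramHash(text, pos, q) + 1];
    // 2. Prefix sums turn the counts into bucket start indices.
    for (std::size_t h = 0; h < numHashes; ++h)
        t.dir[h + 1] += t.dir[h];
    // 3. Write each position into the next free slot of its bucket.
    std::vector<std::size_t> next(t.dir.begin(), t.dir.end() - 1);
    t.sa.resize(n);
    for (std::size_t pos = 0; pos < n; ++pos)
        t.sa[next[qGramHash(text, pos, q)]++] = pos;
    return t;
}
```

All occurrences of the q-gram with hash value h are then found in sa[dir[h]] up to, but not including, sa[dir[h + 1]]. Within a bucket the sketch keeps positions in text order; suffixes are sorted only by their first q-gram.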

QGramSADir

The union of suffix array and directory.

Most applications require a q-gram index consisting of both of these tables. To create them efficiently in one pass, use this tag with indexRequire or indexCreate.

QGramShape

The shape the index is based on.

The q-gram index needs an underlying Shape. This shape can be gapped or ungapped. The number of '1's (relevant positions) in the shape determines q and the size of the directory table.

Dynamic shapes (SimpleShape, GenericShape, ...) must be initialized before the index can be used.
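The relation between the '1's of a shape, q, and the directory size can be made concrete with a small sketch. It is independent of SeqAn's Shape classes: the shape is represented as a plain bitmap string such as "1101", a four-letter DNA alphabet is assumed for hashing, and the names symbolRank, shapeWeight, hashValueCount and shapeHash are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Rank of a DNA character; 'A' and any unexpected character map to 0
// (a simplification of this sketch).
inline std::size_t symbolRank(char c)
{
    switch (c) { case 'C': return 1; case 'G': return 2; case 'T': return 3; default: return 0; }
}

// q is the number of relevant positions ('1's) in the shape bitmap.
inline std::size_t shapeWeight(const std::string& shape)
{
    std::size_t q = 0;
    for (char c : shape)
        if (c == '1')
            ++q;
    return q;
}

// Number of distinct hash values: alphabetSize^q. The directory needs an
// entry for each of them.
inline std::size_t hashValueCount(const std::string& shape, std::size_t alphabetSize)
{
    std::size_t n = 1;
    for (std::size_t i = 0; i < shapeWeight(shape); ++i)
        n *= alphabetSize;
    return n;
}

// Hash of the (possibly gapped) q-gram at pos: only characters under a '1'
// contribute. Assumes pos + shape.size() <= text.size().
inline std::size_t shapeHash(const std::string& text, std::size_t pos, const std::string& shape)
{
    std::size_t h = 0;
    for (std::size_t i = 0; i < shape.size(); ++i)
        if (shape[i] == '1')
            h = h * 4 + symbolRank(text[pos + i]);
    return h;
}
```

For the gapped shape "1101" over DNA, q is 3 and there are 4^3 = 64 possible hash values, although the shape spans four text positions.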

QGramText

The original text the index should be based on.