Group Index QGram Fibres
Tag to select a specific fibre (e.g. table, object, ...) of an IndexQGram.

Grouped Tags Overview

QGram_RawText
The concatenation of all text sequences.
QGramBucketMap
Maps q-gram hashes to buckets. This fibre is used by OpenAddressingQGramIndex, where it stores the parameters of the open addressing hash function and the occupancy of hash values in the QGramDir fibre. IndexQGram, in contrast, maps q-gram hash values to buckets one-to-one, so for that index the fibre is of type Nothing.
QGramCounts
The counts array.
QGramCountsDir
The counts directory.
QGramDir
The directory/hash table.
QGramSA
The suffix array.
QGramSADir
The union of suffix array and directory.
QGramShape
The shape the index is based on.
QGramText
The original text the index should be based on.

Grouped Tags Detail

`QGram_RawText`

The concatenation of all text sequences.

The QGramText and QGram_RawText fibres are identical unless the index text is a set of strings. In that case, the raw text is the concatenation of all strings in the set.
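
As an illustration of the string set case, the following standalone C++ sketch uses only the standard library and is not SeqAn's API. The helpers `concatRawText` and `rawPosition` are hypothetical names, and the sketch assumes the strings are joined in order without separators.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: the raw text of a string set is the plain
// concatenation of its members, in order, with no separators (assumption).
std::string concatRawText(std::vector<std::string> const & textSet)
{
    std::string raw;
    for (std::string const & s : textSet)
        raw += s;
    return raw;
}

// Hypothetical helper: position in the raw text that corresponds to
// offset `pos` inside sequence number `seqNo`.
std::size_t rawPosition(std::vector<std::string> const & textSet,
                        std::size_t seqNo, std::size_t pos)
{
    assert(seqNo < textSet.size() && pos < textSet[seqNo].size());
    std::size_t offset = 0;
    for (std::size_t i = 0; i < seqNo; ++i)
        offset += textSet[i].size();
    return offset + pos;
}
```

For a single string, the raw text and the text coincide, so `rawPosition` reduces to the identity on `pos`.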

`QGramBucketMap`

Maps q-gram hashes to buckets. This fibre is used by OpenAddressingQGramIndex, where it stores the parameters of the open addressing hash function and the occupancy of hash values in the QGramDir fibre. IndexQGram, in contrast, maps q-gram hash values to buckets one-to-one, so for that index the fibre is of type Nothing.
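
The difference between the two mappings can be sketched in standalone C++ as follows. This is not SeqAn's implementation: `directBucket`, `OpenAddressingMap` and `kEmptySlot` are hypothetical names, and linear probing is used only because it is the simplest open addressing scheme to show. The probing scheme of OpenAddressingQGramIndex may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Trivial 1-to-1 mapping: the q-gram hash itself is the bucket number.
std::size_t directBucket(std::uint64_t hash)
{
    return static_cast<std::size_t>(hash);
}

// Marker for an unoccupied slot (assumed never to be a valid hash value).
constexpr std::uint64_t kEmptySlot = ~std::uint64_t(0);

// Illustrative open addressing map with linear probing.
struct OpenAddressingMap
{
    std::vector<std::uint64_t> keys;  // hash value that occupies each slot

    explicit OpenAddressingMap(std::size_t slots) : keys(slots, kEmptySlot)
    {
        assert(slots > 0);
    }

    // Returns the bucket of `hash`, claiming a free slot on first use.
    std::size_t bucket(std::uint64_t hash)
    {
        std::size_t i = static_cast<std::size_t>(hash % keys.size());
        for (std::size_t probes = 0; probes < keys.size(); ++probes)
        {
            if (keys[i] == kEmptySlot)
                keys[i] = hash;           // first occurrence: occupy the slot
            if (keys[i] == hash)
                return i;                 // slot belongs to this hash value
            i = (i + 1) % keys.size();    // collision: try the next slot
        }
        assert(false && "table is full");
        return keys.size();
    }
};
```

The direct mapping needs one bucket per possible hash value, whereas the open addressing variant only needs as many slots as there are distinct q-grams in the text, at the cost of storing the occupancy.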

`QGramCounts`

The counts array.

Contains the number of occurrences of each q-gram per sequence, such that the counts of the same q-gram are stored in a contiguous block (the q-gram count bucket). A bucket contains one entry (seqNo, count) for each sequence with at least one occurrence of the q-gram. Q-grams extending past the end of the text are ignored. The beginning of each count bucket is given by the q-gram counts directory (QGramCountsDir, see below).

The fibre is a String whose value type is the SAValue of TIndex.

`QGramCountsDir`

The counts directory.

The counts directory contains for every possible q-gram hash value the start index of the q-gram count bucket. A q-gram count bucket is a contiguous interval in the counts array (QGramCounts, see above). The end index is the start index of the next bucket.

The fibre is a String whose value type is a size type.
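
The interplay of the counts array and the counts directory can be sketched in standalone C++. This is a minimal illustration, not SeqAn code: `QGramCountTables`, `dnaRank`, `hashQGram` and `buildCounts` are hypothetical names, the alphabet is fixed to DNA, the shape is ungapped, and the directory is assumed to carry one extra trailing entry so that the last bucket also has an end index.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Illustrative stand-ins for the two fibres (not SeqAn types).
struct QGramCountTables
{
    std::vector<std::pair<std::size_t, std::size_t>> counts;  // (seqNo, count) entries
    std::vector<std::size_t> countsDir;                       // start index of each count bucket
};

// Rank of a DNA character: A=0, C=1, G=2, T=3.
std::size_t dnaRank(char c)
{
    switch (c)
    {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default: assert(false && "not a DNA character"); return 0;
    }
}

// Hash of the ungapped q-gram starting at text[pos], read as a base-4 number.
std::size_t hashQGram(std::string const & text, std::size_t pos, std::size_t q)
{
    std::size_t h = 0;
    for (std::size_t i = 0; i < q; ++i)
        h = h * 4 + dnaRank(text[pos + i]);
    return h;
}

QGramCountTables buildCounts(std::vector<std::string> const & seqs, std::size_t q)
{
    assert(q > 0 && q <= 12);
    std::size_t numHashes = 1;
    for (std::size_t i = 0; i < q; ++i)
        numHashes *= 4;

    // perHash[h] maps seqNo to the number of occurrences of q-gram h in that sequence.
    // Q-grams that would extend past the end of a sequence are skipped.
    std::vector<std::map<std::size_t, std::size_t>> perHash(numHashes);
    for (std::size_t seqNo = 0; seqNo < seqs.size(); ++seqNo)
        for (std::size_t pos = 0; pos + q <= seqs[seqNo].size(); ++pos)
            ++perHash[hashQGram(seqs[seqNo], pos, q)][seqNo];

    // Lay the buckets out contiguously and record where each one starts.
    QGramCountTables t;
    for (std::size_t h = 0; h < numHashes; ++h)
    {
        t.countsDir.push_back(t.counts.size());
        for (auto const & entry : perHash[h])
            t.counts.push_back({entry.first, entry.second});
    }
    t.countsDir.push_back(t.counts.size());  // end index of the last bucket
    return t;
}
```

With this layout, the count bucket of hash value `h` is the half-open interval from `countsDir[h]` to `countsDir[h + 1]` in `counts`.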

`QGramDir`

The directory/hash table.

The directory contains for every possible q-gram hash value the start index of the q-gram bucket. A q-gram bucket is a contiguous interval in the suffix array (QGramSA, see below). Each suffix in this interval begins with the same q-gram. The end index is the start index of the next bucket.

The fibre is a String whose value type is a size type.
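
How the directory is read can be sketched in standalone C++. The helpers `bucketRange` and `occurrences` are hypothetical and not part of SeqAn. The sketch assumes that the directory has one extra trailing entry, so that the end index of the last bucket is also available.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: half-open suffix array interval [begin, end) of the
// bucket that belongs to hash value h.
std::pair<std::size_t, std::size_t>
bucketRange(std::vector<std::size_t> const & dir, std::size_t h)
{
    assert(h + 1 < dir.size());
    return {dir[h], dir[h + 1]};  // the end is the start of the next bucket
}

// Hypothetical helper: all text positions whose q-gram has hash value h.
std::vector<std::size_t> occurrences(std::vector<std::size_t> const & dir,
                                     std::vector<std::size_t> const & sa,
                                     std::size_t h)
{
    std::pair<std::size_t, std::size_t> range = bucketRange(dir, h);
    assert(range.first <= range.second && range.second <= sa.size());
    return std::vector<std::size_t>(
        sa.begin() + static_cast<std::ptrdiff_t>(range.first),
        sa.begin() + static_cast<std::ptrdiff_t>(range.second));
}
```

An empty bucket shows up as two equal neighbouring directory entries, so a q-gram that does not occur in the text yields an empty result without any special case.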

`QGramSA`

The suffix array.

Contains all occurrences of q-grams, such that the occurrences of a single q-gram are stored in a contiguous block (the q-gram bucket). Q-grams extending past the end of the text are ignored. The beginning of each bucket is given by the q-gram directory (QGramDir, see above).

It corresponds to a suffix array in which the suffixes are sorted only by their first q-gram.

The fibre is a String whose value type is the SAValue of TIndex.
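
The idea of a suffix array sorted only by the first q-gram can be sketched in standalone C++. `buildQGramSA` is a hypothetical helper, not a SeqAn function. It assumes a single text and an ungapped q-gram, and uses a stable sort, so positions inside a bucket stay in text order. The order inside a bucket in an actual IndexQGram is not stated here and may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: start positions of all complete q-grams of `text`,
// ordered by the q-gram that starts at each position.
std::vector<std::size_t> buildQGramSA(std::string const & text, std::size_t q)
{
    assert(q > 0);
    std::vector<std::size_t> sa;
    if (text.size() < q)
        return sa;  // no complete q-gram fits into the text

    // Positions whose q-gram would extend past the end of the text are left out.
    sa.resize(text.size() - q + 1);
    std::iota(sa.begin(), sa.end(), std::size_t(0));

    // Compare only the first q characters of the two suffixes.
    std::stable_sort(sa.begin(), sa.end(),
                     [&text, q](std::size_t a, std::size_t b)
                     {
                         return text.compare(a, q, text, b, q) < 0;
                     });
    return sa;
}
```

Unlike a full suffix array, suffixes that share their first q characters are not ordered any further, which is why the table can be built with a single bucket sort instead of a general suffix sorting algorithm.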

`QGramSADir`

The union of suffix array and directory.

In most applications a q-gram index consisting of both of these tables is required. To create them efficiently in one step, use this tag with indexRequire or indexCreate.
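
Why it pays off to create both tables together can be seen in a counting sort, sketched here in standalone C++. This is not SeqAn's construction code: `QGramTables`, `dnaRank`, `hashQGram` and `buildSADir` are hypothetical names, the alphabet is fixed to DNA, the shape is ungapped, and the directory is assumed to have one extra trailing entry.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative stand-ins for the two fibres (not SeqAn types).
struct QGramTables
{
    std::vector<std::size_t> sa;   // q-gram start positions, grouped by bucket
    std::vector<std::size_t> dir;  // start index of each bucket in `sa`
};

// Rank of a DNA character: A=0, C=1, G=2, T=3.
std::size_t dnaRank(char c)
{
    switch (c)
    {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default: assert(false && "not a DNA character"); return 0;
    }
}

// Hash of the ungapped q-gram starting at text[pos], read as a base-4 number.
std::size_t hashQGram(std::string const & text, std::size_t pos, std::size_t q)
{
    std::size_t h = 0;
    for (std::size_t i = 0; i < q; ++i)
        h = h * 4 + dnaRank(text[pos + i]);
    return h;
}

QGramTables buildSADir(std::string const & text, std::size_t q)
{
    assert(q > 0 && q <= 12);
    std::size_t numHashes = 1;
    for (std::size_t i = 0; i < q; ++i)
        numHashes *= 4;

    QGramTables t;
    t.dir.assign(numHashes + 1, 0);
    if (text.size() < q)
        return t;
    std::size_t const n = text.size() - q + 1;  // number of complete q-grams

    // Step 1: count the occurrences of hash value h in dir[h + 1].
    for (std::size_t pos = 0; pos < n; ++pos)
        ++t.dir[hashQGram(text, pos, q) + 1];

    // Step 2: prefix sums turn the counts into bucket start indices.
    for (std::size_t h = 1; h <= numHashes; ++h)
        t.dir[h] += t.dir[h - 1];

    // Step 3: write each position to the next free slot of its bucket.
    std::vector<std::size_t> cursor(t.dir.begin(), t.dir.end() - 1);
    t.sa.resize(n);
    for (std::size_t pos = 0; pos < n; ++pos)
        t.sa[cursor[hashQGram(text, pos, q)]++] = pos;
    return t;
}
```

The directory computed in steps 1 and 2 is exactly what step 3 needs to place the suffix array entries, so building the two tables separately would repeat that work.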

`QGramShape`

The shape the index is based on.

The q-gram index needs an underlying Shape. This shape can be gapped or ungapped. The number of '1's (relevant positions) in the shape determines q and the size of the directory table.

Dynamic shapes (SimpleShape, GenericShape, ...) must be initialized before the index can be used.
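
The relation between the shape, q and the directory size can be sketched in standalone C++. The shape is written as a bitmap string such as "1101", and `shapeWeight`, `hashWithShape` and `directoryLength` are hypothetical helpers rather than SeqAn functions. The sketch assumes a DNA alphabet and a directory with one extra trailing entry.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Rank of a DNA character: A=0, C=1, G=2, T=3.
std::size_t dnaRank(char c)
{
    switch (c)
    {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default: assert(false && "not a DNA character"); return 0;
    }
}

// Number of '1's (relevant positions) in the shape; this is q.
std::size_t shapeWeight(std::string const & shape)
{
    std::size_t weight = 0;
    for (char c : shape)
        if (c == '1')
            ++weight;
    return weight;
}

// Hash of the gapped q-gram at text[pos]: only positions marked '1'
// in the shape contribute a base-4 digit.
std::size_t hashWithShape(std::string const & text, std::size_t pos,
                          std::string const & shape)
{
    assert(pos + shape.size() <= text.size());
    std::size_t h = 0;
    for (std::size_t i = 0; i < shape.size(); ++i)
        if (shape[i] == '1')
            h = h * 4 + dnaRank(text[pos + i]);
    return h;
}

// Directory length under the stated assumptions: 4^q buckets plus one end entry.
std::size_t directoryLength(std::string const & shape)
{
    std::size_t len = 1;
    for (std::size_t i = 0; i < shapeWeight(shape); ++i)
        len *= 4;
    return len + 1;
}
```

Two text windows that differ only at a gap position of the shape receive the same hash value and therefore fall into the same bucket. The directory grows with the number of '1's in the shape, not with its total length.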

`QGramText`

The original text the index should be based on.
