Group Index QGram Fibres
Tag to select a specific fibre (e.g. table, object, ...) of an IndexQGram.
Grouped Tags Overview
QGram_RawText
The concatenation of all text sequences.
QGramBucketMap
Maps q-gram hashes to buckets. This fibre is used by the OpenAddressingQGramIndex index and stores all parameters of the open addressing hash function and hash value occupancy in the QGramDir fibre. In contrast to OpenAddressingQGramIndex, IndexQGram uses a trivial 1-to-1 mapping from q-gram hash values to buckets. For that index the fibre is of type Nothing.
QGramCounts
The counts array.
QGramCountsDir
The counts directory.
QGramDir
The directory/hash table.
QGramSA
The suffix array.
QGramSADir
The union of suffix array and directory.
QGramShape
The shape the index is based on.
QGramText
The original text the index should be based on.
See Also
Fibre, Index#getFibre, IndexQGram
Grouped Tags Detail
QGram_RawText
QGramText and QGram_RawText fibres are equal by default. They differ if the index text is a set of strings. Then, raw text is the concatenation of all strings in this set.
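The relation between a set of strings and its raw text can be illustrated with a small standalone sketch. It uses only the C++ standard library and is not SeqAn code: the type RawText and the functions concatenate and localPosition are hypothetical names chosen for this illustration, and the limits table is an assumption about how sequence boundaries could be tracked.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper type (not part of SeqAn): the concatenated text plus
// the start offset of every sequence, followed by the total length.
struct RawText {
    std::string text;
    std::vector<std::size_t> limits;
};

// Concatenate all strings of a set into one raw text.
RawText concatenate(const std::vector<std::string>& strings) {
    RawText raw;
    raw.limits.push_back(0);
    for (const std::string& s : strings) {
        raw.text += s;
        raw.limits.push_back(raw.text.size());
    }
    return raw;
}

// Map a position in the raw text back to (sequence number, offset).
std::pair<std::size_t, std::size_t> localPosition(const RawText& raw,
                                                  std::size_t pos) {
    assert(pos < raw.text.size());
    // The first limit strictly greater than pos is the end of the sequence
    // that contains pos.
    auto it = std::upper_bound(raw.limits.begin(), raw.limits.end(), pos);
    std::size_t seqNo = static_cast<std::size_t>(it - raw.limits.begin()) - 1;
    return {seqNo, pos - raw.limits[seqNo]};
}
```

For the set {"ACGT", "GG", "TTA"} the raw text in this sketch is "ACGTGGTTA", and raw position 4 maps back to sequence 1, offset 0.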
QGramBucketMap
QGramCounts
Contains, for each q-gram, the number of occurrences per sequence, such that the numbers of the same q-gram are stored in a contiguous block (q-gram count bucket). A bucket contains entries (seqNo, count) for the sequences with at least one occurrence of the q-gram. q-grams exceeding the end of the text are ignored. The beginning of each count bucket can be determined by the q-gram counts directory (QGramCountsDir, see below).
Fibre returns a String over the alphabet of the SAValue of TIndex.
QGramCountsDir
The counts directory contains for every possible q-gram hash value the start index of the q-gram count bucket. A q-gram count bucket is a contiguous interval in the counts array (QGramCounts, see above). The end index is the start index of the next bucket.
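The interplay of the counts array and the counts directory can be sketched in plain C++. This is an illustration only, not SeqAn's implementation: it assumes a DNA alphabet of size 4 and an ungapped q-gram, and QGramCountTables, ordValue, hashQGram and buildCounts are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container, not SeqAn's data layout.
struct QGramCountTables {
    // Start index of each count bucket, plus one final sentinel entry.
    std::vector<std::size_t> countsDir;
    // Entries (seqNo, count), grouped by q-gram hash value.
    std::vector<std::pair<std::size_t, std::size_t>> counts;
};

// Rank of a DNA character (A=0, C=1, G=2, anything else treated as T=3).
inline std::size_t ordValue(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;
    }
}

// Hash of the ungapped q-gram at text[pos], read as a base-4 number.
inline std::size_t hashQGram(const std::string& text, std::size_t pos,
                             std::size_t q) {
    std::size_t h = 0;
    for (std::size_t i = 0; i < q; ++i)
        h = h * 4 + ordValue(text[pos + i]);
    return h;
}

QGramCountTables buildCounts(const std::vector<std::string>& seqs,
                             std::size_t q) {
    assert(q > 0 && q <= 12);  // keep the 4^q table small in this sketch
    const std::size_t buckets = std::size_t(1) << (2 * q);  // 4^q

    // perHash[h] maps seqNo -> number of occurrences of q-gram h.
    std::vector<std::map<std::size_t, std::size_t>> perHash(buckets);
    for (std::size_t seqNo = 0; seqNo < seqs.size(); ++seqNo) {
        const std::string& s = seqs[seqNo];
        // q-grams exceeding the end of the sequence are ignored.
        for (std::size_t pos = 0; pos + q <= s.size(); ++pos)
            ++perHash[hashQGram(s, pos, q)][seqNo];
    }

    QGramCountTables t;
    t.countsDir.reserve(buckets + 1);
    for (std::size_t h = 0; h < buckets; ++h) {
        t.countsDir.push_back(t.counts.size());  // bucket start
        for (const auto& entry : perHash[h])
            t.counts.emplace_back(entry.first, entry.second);
    }
    t.countsDir.push_back(t.counts.size());  // end of the last bucket
    return t;
}
```

In this sketch the count bucket of a hash value h is the half-open interval [countsDir[h], countsDir[h + 1]) of counts, and sequences without an occurrence of that q-gram contribute no entry.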
QGramDir
The directory contains for every possible q-gram hash value the start index of the q-gram bucket. A q-gram bucket is a contiguous interval in the suffix array (QGramSA, see below). Each suffix in this interval begins with the same q-gram. The end index is the start index of the next bucket.
QGramSA
Contains all occurrences of q-grams, such that the occurrences of a single q-gram are stored in a contiguous block (q-gram bucket). q-grams exceeding the end of the text are ignored. The beginning of each bucket can be determined by the q-gram directory (QGramDir, see above).
It corresponds to a suffix array which is sorted by the first q-gram.
Fibre returns a String over the alphabet of the SAValue of TIndex.
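How a directory and a q-gram suffix array fit together can be sketched with a counting sort over q-gram hash values. The code below is a standalone illustration under stated assumptions (single text, DNA alphabet of size 4, ungapped q-gram) and does not claim to reproduce how SeqAn builds these fibres. QGramTables, ordValue, hashQGram, buildTables and occurrences are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container; the member names mirror the fibres only loosely.
struct QGramTables {
    std::vector<std::size_t> dir;  // start index of each bucket + sentinel
    std::vector<std::size_t> sa;   // text positions grouped by q-gram hash
};

// Rank of a DNA character (A=0, C=1, G=2, anything else treated as T=3).
inline std::size_t ordValue(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;
    }
}

// Hash of the ungapped q-gram at text[pos], read as a base-4 number.
inline std::size_t hashQGram(const std::string& text, std::size_t pos,
                             std::size_t q) {
    std::size_t h = 0;
    for (std::size_t i = 0; i < q; ++i)
        h = h * 4 + ordValue(text[pos + i]);
    return h;
}

QGramTables buildTables(const std::string& text, std::size_t q) {
    assert(q > 0 && q <= 12);  // keep the 4^q table small in this sketch
    const std::size_t buckets = std::size_t(1) << (2 * q);  // 4^q
    QGramTables t;
    t.dir.assign(buckets + 1, 0);
    if (text.size() < q) return t;  // no complete q-gram fits
    const std::size_t n = text.size() - q + 1;

    // 1. Count each q-gram, shifted by one slot.
    for (std::size_t i = 0; i < n; ++i)
        ++t.dir[hashQGram(text, i, q) + 1];
    // 2. Prefix sums turn counts into bucket start indices.
    for (std::size_t h = 1; h <= buckets; ++h)
        t.dir[h] += t.dir[h - 1];
    // 3. Place every position into the next free slot of its bucket.
    std::vector<std::size_t> next(t.dir.begin(), t.dir.end() - 1);
    t.sa.resize(n);
    for (std::size_t i = 0; i < n; ++i)
        t.sa[next[hashQGram(text, i, q)]++] = i;
    return t;
}

// All text positions of a q-gram: the interval [dir[h], dir[h + 1]) of sa.
std::vector<std::size_t> occurrences(const QGramTables& t,
                                     const std::string& qgram) {
    const std::size_t h = hashQGram(qgram, 0, qgram.size());
    assert(h + 1 < t.dir.size());  // qgram must have the indexed length q
    auto first = t.sa.begin() + static_cast<std::ptrdiff_t>(t.dir[h]);
    auto last = t.sa.begin() + static_cast<std::ptrdiff_t>(t.dir[h + 1]);
    return std::vector<std::size_t>(first, last);
}
```

With the text "ACGACG" and q = 2, the sketch places positions 0 and 3 in the bucket of "AC" and positions 1 and 4 in the bucket of "CG". The last position, 5, starts no complete 2-gram and is therefore not stored.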
QGramSADir
In most applications, a q-gram index consisting of both of these tables (suffix array and directory) is required. To efficiently create them at once, use this tag for indexRequire or indexCreate.
QGramShape
The q-gram index needs an underlying Shape. This shape can be gapped or ungapped. The number of '1's (relevant positions) in the shape determines q and the size of the directory table.
Dynamic shapes (SimpleShape, GenericShape, ...) must be initialized before the index can be used.
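The effect of a gapped shape on q and on the directory size can be sketched as follows. This is a standalone illustration, not SeqAn's Shape class: GappedShape, ordValue, hashShape and directorySize are hypothetical names, and the directory size of 4^q assumes a DNA alphabet of size 4.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical shape type: '1' marks a relevant position, '0' a gap.
struct GappedShape {
    std::string bitmap;  // e.g. "1101"

    // Number of text characters covered by the shape.
    std::size_t span() const { return bitmap.size(); }

    // Number of relevant positions, i.e. q.
    std::size_t weight() const {
        return static_cast<std::size_t>(
            std::count(bitmap.begin(), bitmap.end(), '1'));
    }
};

// Rank of a DNA character (A=0, C=1, G=2, anything else treated as T=3).
inline std::size_t ordValue(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;
    }
}

// Hash of the shape applied at text[pos]; gap positions are skipped.
std::size_t hashShape(const GappedShape& shape, const std::string& text,
                      std::size_t pos) {
    assert(pos + shape.span() <= text.size());
    std::size_t h = 0;
    for (std::size_t i = 0; i < shape.span(); ++i)
        if (shape.bitmap[i] == '1')
            h = h * 4 + ordValue(text[pos + i]);
    return h;
}

// Number of distinct hash values: 4^q for a DNA alphabet.
std::size_t directorySize(const GappedShape& shape) {
    std::size_t size = 1;
    for (std::size_t i = 0; i < shape.weight(); ++i)
        size *= 4;
    return size;
}
```

For the bitmap "1101" the sketch yields q = 3 and 64 possible hash values, although the shape spans four text characters. "ACGT" and "ACAT" hash to the same value because they differ only at the gap position.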
QGramText
Dox Sources
/*!
 * @defgroup QGramIndexFibres Index QGram Fibres
 *
 * @brief Tag to select a specific fibre (e.g. table, object, ...) of a @link
 *        IndexQGram @endlink.
 *
 * @see Fibre
 * @see Index#getFibre
 * @see IndexQGram
 *
 * @tag QGramIndexFibres#QGramDir
 *
 * @brief The directory/hash table.
 *
 * The directory contains for every possible q-gram hash value the start index
 * of the q-gram bucket. A q-gram bucket is a contiguous interval in the suffix
 * array (<tt>QGramSA</tt>, see above). Each suffix in this interval begins with
 * the same q-gram. The end index is the start index of the next bucket.
 *
 * @link Fibre @endlink returns a @link String @endlink over the alphabet of a
 * size type.
 *
 * @tag QGramIndexFibres#QGramBucketMap
 *
 * @brief Maps q-gram hashes to buckets. This fibre is used by the @link
 *        OpenAddressingQGramIndex @endlink index and stores all parameters of
 *        the open addressing hash function and hash value occupancy in the
 *        QGramDir fibre. In contrast to @link OpenAddressingQGramIndex
 *        @endlink, @link IndexQGram @endlink uses a trivial 1-to-1 mapping from
 *        q-gram hash values to buckets. For that index the fibre is of type
 *        @link Nothing @endlink.
 *
 * @tag QGramIndexFibres#QGramCountsDir
 *
 * @brief The counts directory.
 *
 * The counts directory contains for every possible q-gram hash value the start
 * index of the q-gram count bucket. A q-gram count bucket is a contiguous
 * interval in the counts array (<tt>QGramCounts</tt>, see above). The end index
 * is the start index of the next bucket.
 *
 * @link Fibre @endlink returns a @link String @endlink over the alphabet of a
 * size type.
 *
 * @tag QGramIndexFibres#QGramText
 *
 * @brief The original text the index should be based on.
 *
 * @tag QGramIndexFibres#QGramShape
 *
 * @brief The shape the index is based on.
 *
 * The q-gram index needs an underlying @link Shape @endlink. This shape can be
 * gapped or ungapped. The number of '1's (relevant positions) in the shape
 * determines <tt>q</tt> and the size of the directory table.
 *
 * Dynamic shapes (@link SimpleShape @endlink, @link GenericShape @endlink, ...)
 * must be initialized before the index can be used.
 *
 * @tag QGramIndexFibres#QGramSADir
 *
 * @brief The union of suffix array and directory.
 *
 * In most applications a q-gram index consisting of both of these table is
 * required. To efficiently create them at once use this tag for @link
 * Index#indexRequire @endlink or @link Index#indexCreate @endlink.
 *
 * @tag QGramIndexFibres#QGram_RawText
 *
 * @brief The concatenation of all text sequences.
 *
 * <tt>QGramText</tt> and <tt>QGram_RawText</tt> fibres are equal by default.
 * They differ if the index text is a set of strings. Then, raw text is the
 * concatenation of all strings in this set.
 *
 * @tag QGramIndexFibres#QGramCounts
 *
 * @brief The counts array.
 *
 * Contains the numbers of occurrences per sequence of each q-gram, s.t. the
 * numbers of the same q-gram are stored in a contiguous block (q-gram count
 * bucket). A bucket contains entries (seqNo,count) of sequences with at least
 * one q-gram occurrence. q-grams exceeding the end of the text are ignored. The
 * beginning of each count bucket can be determined by the q-gram counts
 * directory (<tt>QGramCountsDir</tt>, see below).
 *
 * @link Fibre @endlink returns a @link String @endlink over the alphabet of the
 * @link SAValue @endlink of <tt>TIndex</tt>.
 *
 * @tag QGramIndexFibres#QGramSA
 *
 * @brief The suffix array.
 *
 * Contains all occurrences of q-grams, s.t. the occurrences of a single q-gram
 * are stored in a contiguous block (q-gram bucket). q-grams exceeding the end
 * of the text are ignored. The beginning of each bucket can be determined by
 * the q-gram directory (<tt>QGramDir</tt>, see below).
 *
 * It corresponds to a suffix array which is sorted by the first q-gram.
 *
 * @link Fibre @endlink returns a @link String @endlink over the alphabet of the
 * @link SAValue @endlink of <tt>TIndex</tt>.
 */