Function
hash
Computes a (lower) hash value for a shape applied to a sequence.
The hash value (a.k.a. code) of a q-gram is the lexicographical rank of this q-gram in the set of all possible q-grams, counted from 0. For example, the hash value of the Dna 3-gram AAG is 2, as there are only two 3-grams (AAA and AAC) with a smaller lexicographical rank. If hash is called with a gapped shape, the q-gram is the text subsequence at the non-gap positions of the shape, taken relative to the text iterator; e.g. a shape 1101 at the beginning of the text ACGT corresponds to the 3-gram ACT.
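The rank computation described above can be sketched without the library. The following is a minimal illustration, not SeqAn's implementation: dnaRank and qgramRank are hypothetical helper names, the text is a plain std::string, and the alphabet order A < C < G < T is assumed. Each non-gap position contributes one base-4 digit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Rank of a DNA character, assuming the order A < C < G < T.
// Hypothetical helper for illustration; not part of SeqAn.
inline unsigned dnaRank(char c)
{
    switch (c)
    {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  assert(c == 'T'); return 3;
    }
}

// Lexicographical rank of the q-gram selected by bitmap
// ('1' = use this position, '0' = gap), starting at text[pos].
// Hypothetical helper for illustration; not part of SeqAn.
inline std::uint64_t qgramRank(const std::string& text,
                               std::size_t pos,
                               const std::string& bitmap)
{
    // The whole span of the shape must fit into the text.
    assert(pos + bitmap.size() <= text.size());

    std::uint64_t h = 0;
    for (std::size_t i = 0; i < bitmap.size(); ++i)
    {
        if (bitmap[i] == '1')
            h = h * 4 + dnaRank(text[pos + i]);  // append one base-4 digit
    }
    return h;
}
```

Under these assumptions, qgramRank("AAG", 0, "111") evaluates to 2, and qgramRank("ACGT", 0, "1101") hashes the 3-gram ACT.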
hash(shape, it)
hash(shape, it, charsLeft)
seqan/index.h
Parameters
shape: Shape to be used for hashing. Types: Shape
it: Sequence iterator pointing to the first character of the shape.
charsLeft: The distance of it to the string end. If charsLeft is smaller than the shape's span, the hash value corresponds to the lexicographically smallest q-gram that begins with the remaining charsLeft characters. In other words, the hash value of such a truncated shape equals that of the shape applied to a text padded with the smallest alphabet character.
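The padding rule for charsLeft can be illustrated with a small standalone sketch. It rests on several assumptions: paddedQgramRank is a hypothetical helper and not a SeqAn function, the text is a plain std::string over A < C < G < T, and only an ungapped shape of span q is handled. Positions past the end of the text contribute the digit 0, i.e. the smallest character A.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hash of the ungapped q-gram starting at text[pos]. Positions beyond
// the end of the text are treated as 'A' (rank 0), which yields the
// smallest q-gram that begins with the remaining characters.
// Hypothetical helper for illustration; not part of SeqAn.
inline std::uint64_t paddedQgramRank(const std::string& text,
                                     std::size_t pos,
                                     std::size_t q)
{
    static const std::string alphabet = "ACGT";  // assumed order A < C < G < T
    assert(pos <= text.size());

    const std::size_t charsLeft = text.size() - pos;
    std::uint64_t h = 0;
    for (std::size_t i = 0; i < q; ++i)
    {
        std::uint64_t digit = 0;  // padding digit for positions past the end
        if (i < charsLeft)
        {
            const std::size_t r = alphabet.find(text[pos + i]);
            assert(r != std::string::npos);
            digit = r;
        }
        h = h * 4 + digit;
    }
    return h;
}
```

For the text GATTACA, position 5 and q = 4, only CA remains, so the sketch hashes CAAA, which is 0x40.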
Return Values
Hash value of the shape.
Member of
Examples
Code example that computes hash values of 4-grams with different shapes starting at the beginning of a text.
    #include <iostream>
    #include <seqan/index.h>

    using namespace seqan;

    int main ()
    {
        DnaString text = "GATTACA";
        // output all hash values as hexadecimal numbers
        std::cout << std::hex;

        // 4-gram with shape 1111 at position 0 is GATT
        // its hash value is 0b10001111 = 0x8f
        Shape<Dna, UngappedShape<4> > shape1;
        std::cout << "0x" << hash(shape1, begin(text)) << std::endl;

        // 4-gram with shape 110101 at position 0 is GATC
        // its hash value is 0b10001101 = 0x8d
        Shape<Dna, GenericShape> shape2;
        stringToShape(shape2, "110101");
        std::cout << "0x" << hash(shape2, begin(text)) << std::endl;

        // 4-gram with shape 11011 at position 0 is GATA
        // its hash value is 0b10001100 = 0x8c
        Shape<Dna, OneGappedShape> shape3;
        stringToShape(shape3, "11011");
        std::cout << "0x" << hash(shape3, begin(text)) << std::endl;

        return 0;
    }
The resulting hexadecimal hash values of the three 4-mers GATT, GATC and GATA are:
0x8f
0x8d
0x8c