Function
hash
Computes a (lower) hash value for a shape applied to a sequence.
The hash value (a.k.a. code) of a q-gram is the lexicographical rank of this q-gram in the set of all possible q-grams. For example, the hash value of the Dna 3-gram AAG is 2, as there are only two 3-grams (AAA and AAC) with a smaller lexicographical rank. If hash is called with a gapped shape, the q-gram consists of the text characters at the shape's non-gap positions, counted from the text iterator. For example, the shape 1101 at the beginning of the text ACGT corresponds to the 3-gram ACT.
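The rank computation described above can be sketched without SeqAn. The following is a minimal illustration, not the library's implementation: the function name qgramRank and the use of std::string for the text and the shape bitmap are assumptions made for this example. It reads the q-gram as a base-4 number with A=0, C=1, G=2, T=3 and skips gap positions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of SeqAn): lexicographical rank of the Dna
// q-gram that the shape `bitmap` ('1' = use position, '0' = gap) selects at
// the beginning of `text`.
inline std::uint64_t qgramRank(std::string const & text, std::string const & bitmap)
{
    static std::string const alphabet = "ACGT";   // ordinal values 0..3
    assert(text.size() >= bitmap.size());         // the whole span must fit into the text

    std::uint64_t rank = 0;
    for (std::size_t i = 0; i < bitmap.size(); ++i)
    {
        if (bitmap[i] != '1')
            continue;                             // gap position: character is ignored
        std::size_t const ord = alphabet.find(text[i]);
        assert(ord != std::string::npos);         // only A, C, G, T are accepted
        rank = rank * 4 + ord;                    // append one base-4 digit
    }
    return rank;
}
```

With this sketch, qgramRank("AAG", "111") yields 2, and qgramRank("ACGT", "1101") yields 7, the rank of the 3-gram ACT.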
hash(shape, it)
hash(shape, it, charsLeft)
Include Headers
seqan/index.h
Parameters
shape
Shape to be used for hashing.
Types: Shape
it
Sequence iterator pointing to the first character of the shape.
charsLeft
The distance from it to the end of the string. If charsLeft is smaller than the shape's span, the hash value corresponds to the lexicographically smallest q-gram that begins with the remaining charsLeft characters. In other words, the hash value of such a truncated shape equals the hash value of the shape applied to a text padded with the smallest alphabet character.
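The padding rule for truncated shapes can be illustrated with a small standalone sketch. It is not SeqAn code: the function name paddedRank is hypothetical, it handles only ungapped Dna shapes, and the length of the string passed in plays the role of charsLeft.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of SeqAn): hash of an ungapped Dna q-gram of
// length q applied to `suffix`, the text remaining from the iterator to the
// string end. If fewer than q characters remain, the missing positions are
// filled with the smallest character 'A' (ordinal value 0).
inline std::uint64_t paddedRank(std::string const & suffix, std::size_t q)
{
    static std::string const alphabet = "ACGT";   // ordinal values 0..3

    std::uint64_t rank = 0;
    for (std::size_t i = 0; i < q; ++i)
    {
        std::uint64_t ord = 0;                    // padding value for positions past the end
        if (i < suffix.size())
        {
            std::size_t const found = alphabet.find(suffix[i]);
            assert(found != std::string::npos);   // only A, C, G, T are accepted
            ord = found;
        }
        rank = rank * 4 + ord;                    // append one base-4 digit
    }
    return rank;
}
```

For the text GATTACA, only CA remains at position 5, so a 4-gram there is hashed as CAAA: paddedRank("CA", 4) yields 0x40. When enough characters remain, the result is the ordinary hash, e.g. paddedRank("GATT", 4) yields 0x8f.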
Return Values
Hash value of the shape.
Member of
Examples
Code example that computes hash values of 4-grams with different shapes starting at the beginning of a text.
#include <iostream>
#include <seqan/sequence.h>
#include <seqan/index.h>

using namespace seqan;

int main()
{
    DnaString text = "GATTACA";
    // output all hash values as hexadecimal numbers
    std::cout << std::hex;

    // 4-gram with shape 1111 at position 0 is GATT
    // its hash value is 0b10001111 = 0x8f
    Shape<Dna, UngappedShape<4> > shape1;
    std::cout << "0x" << hash(shape1, begin(text)) << std::endl;

    // 4-gram with shape 110101 at position 0 is GATC
    // its hash value is 0b10001101 = 0x8d
    Shape<Dna, GenericShape> shape2;
    stringToShape(shape2, "110101");
    std::cout << "0x" << hash(shape2, begin(text)) << std::endl;

    // 4-gram with shape 11011 at position 0 is GATA
    // its hash value is 0b10001100 = 0x8c
    Shape<Dna, OneGappedShape> shape3;
    stringToShape(shape3, "11011");
    std::cout << "0x" << hash(shape3, begin(text)) << std::endl;

    return 0;
}
The resulting hexadecimal hash values of the three 4-grams GATT, GATC and GATA are:
0x8f
0x8d
0x8c
SeqAn - Sequence Analysis Library - www.seqan.de
 

Page built @2013/07/11 09:12:36