Class Shape
Stores hash value and shape for an ungapped or gapped q-gram.

All Subclasses GappedShape, GenericShape, HardwiredShape, OneGappedShape, SimpleShape, UngappedShape
Defined in <seqan/index.h>
Signature template <typename TValue, typename TSpec> class Shape;

Template Parameters

TSpec The specializing type. Default: SimpleShape, for ungapped q-grams.
TValue The Value type of the string the shape is applied to (e.g. Dna).

Interface Function Overview

Interface Metafunction Overview

Detailed Description

The ValueSize of Shape is the ValueSize of TValue which is the alphabet size.

To get the span or the weight of a shape, call length or weight, respectively.

Interface Functions Detail

TValue hash(shape, it[, charsLeft]);

Computes a (lower) hash value for a shape applied to a sequence.

Parameters

shape Shape to be used for hashing. Types: Shape
it Sequence iterator pointing to the first character of the shape.
charsLeft The distance of it to the string end. If charsLeft is smaller than the shape's span, the hash value corresponds to the smallest shape beginning with charsLeft characters.

Returns

TValue Hash value of the shape (Metafunction: Value).
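For an ungapped shape over Dna, the lower hash described above can be pictured as reading the q-gram as a base-4 number and padding a truncated q-gram with the smallest character. The following is a minimal standalone sketch of that arithmetic, not SeqAn's implementation. The names dnaRank and lowerHash are hypothetical, and the ranking A=0, C=1, G=2, T=3 is taken from the example output shown for hashInit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: rank of a DNA character (A=0, C=1, G=2, T=3).
inline unsigned dnaRank(char c)
{
    switch (c)
    {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // treated as 'T'
    }
}

// Hypothetical model of hash(shape, it, charsLeft) for an ungapped shape of span q.
// Reads up to q characters starting at text[pos]; positions past the end of the
// text contribute rank 0, which yields the smallest q-gram with that prefix.
inline std::uint64_t lowerHash(std::string const & text, std::size_t pos, std::size_t q)
{
    assert(pos <= text.size());
    std::size_t const charsLeft = text.size() - pos;
    std::uint64_t h = 0;
    for (std::size_t i = 0; i < q; ++i)
    {
        h *= 4;                               // shift by one base-4 digit
        if (i < charsLeft)
            h += dnaRank(text[pos + i]);      // real character
        // else: implicit padding with rank 0
    }
    return h;
}
```

With the text "AAAACACAGTTTGA" and q = 3, position 4 ("CAC") gives 17, and position 12 (only "GA" left) gives 32, the hash of "GAA".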

See Also

TValue hash2(shape, it, charsLeft);

Computes a unique hash value of a shape applied to a sequence, even if the sequence is shorter than the shape span.

Parameters

shape Shape to be used for hashing. Types: Shape
it Sequence iterator pointing to the first character of the shape.
charsLeft The distance of it to the string end.

Returns

TValue Hash value of the shape (Metafunction: Value).

See Also

TValue hash2Next(shape, it);

Computes a unique hash value for the adjacent shape, even if it is shorter than q.

Parameters

shape Shape to be used for hashing the q-gram. Types: Shape
it Sequence iterator pointing to the first character of the adjacent shape.

Returns

TValue Hash value of the shape (Metafunction: Value).

Before calling this function, hash has to be called with shape on the left-adjacent q-gram.

See Also

TValue hash2Upper(shape, it, charsLeft);

Computes an upper unique hash value of a shape applied to a sequence, even if the sequence is shorter than the shape span.

Parameters

shape Shape to be used for hashing. Types: Shape
it Sequence iterator pointing to the first character of the shape.
charsLeft The distance of it to the string end.

Returns

TValue Upper hash value of the shape. The value is one more than the maximal hash2 value of any shape beginning with the first min(charsLeft, length(shape)) characters (Metafunction: Value).

This function in conjunction with hash2 is useful to search a q-gram index for p-grams with p < q.

See Also

void hashInit(shape, it);

Preprocessing step of a pure hashNext loop.

Parameters

shape Shape to be used for hashing.
it The iterator to use for initializing the shape.

Overlapping q-grams can efficiently be hashed by calling hash on the first text position and hashNext on succeeding, adjacent positions. One drawback of this scenario is that for-loops cannot start with the first position directly and become more complicated. As a remedy, hashInit was introduced which initializes the Shape to be used with hashNext on the first position directly.

Example

Two hash loop examples. The first loop uses hash/hashNext, while the second uses hashInit/hashNext and can process all hashes within the loop.

#include <iostream>

#include <seqan/sequence.h>
#include <seqan/index.h>

using namespace seqan;

int main ()
{
    DnaString text = "AAAACACAGTTTGA";
    Shape<Dna, UngappedShape<3> > myShape;

    // loop using hash() and hashNext() starts at position 1
    std::cout << hash(myShape, begin(text)) << '\t';
    for (unsigned i = 1; i < length(text) - length(myShape) + 1; ++i)
        std::cout << hashNext(myShape, begin(text) + i) << '\t';
    std::cout << std::endl;

    // loop using hashInit() and hashNext() starts at position 0
    hashInit(myShape, begin(text));
    for (unsigned i = 0; i < length(text) - length(myShape) + 1; ++i)
        std::cout << hashNext(myShape, begin(text) + i) << '\t';
    std::cout << std::endl;

    return 0;
}
0	0	1	4	17	4	18	11	47	63	62	56	
0	0	1	4	17	4	18	11	47	63	62	56

TValue hashNext(shape, it);

Computes the hash value for the adjacent shape.

Parameters

shape Shape to be used for hashing. Types: Shape
it Sequence iterator pointing to the first character of the adjacent shape.

Returns

TValue Hash value of the q-gram (Metafunction: Value).

hash or hashInit has to be called before.
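The reason hashNext is cheaper than hash on adjacent positions can be illustrated with a rolling update: drop the leftmost digit, shift, and append the new character. This is a standalone sketch of that idea for ungapped Dna q-grams, not SeqAn's code; dnaRank and rollingHashes are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: rank of a DNA character (A=0, C=1, G=2, T=3).
inline unsigned dnaRank(char c)
{
    switch (c)
    {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // treated as 'T'
    }
}

// Hashes all overlapping ungapped q-grams of text with a rolling update.
inline std::vector<std::uint64_t> rollingHashes(std::string const & text, std::size_t q)
{
    assert(q > 0 && q <= text.size());

    std::uint64_t leftFactor = 1;             // 4^(q-1), weight of the leftmost digit
    for (std::size_t i = 1; i < q; ++i)
        leftFactor *= 4;

    // Full computation for the first q-gram (the role of hash).
    std::uint64_t h = 0;
    for (std::size_t i = 0; i < q; ++i)
        h = h * 4 + dnaRank(text[i]);

    std::vector<std::uint64_t> hashes{h};

    // Constant-time update for every following q-gram (the role of hashNext).
    for (std::size_t pos = 1; pos + q <= text.size(); ++pos)
    {
        h = (h - dnaRank(text[pos - 1]) * leftFactor) * 4 + dnaRank(text[pos + q - 1]);
        hashes.push_back(h);
    }
    return hashes;
}
```

For "AAAACACAGTTTGA" and q = 3 this produces the same twelve values as the hashInit example output: 0 0 1 4 17 4 18 11 47 63 62 56.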

See Also

TValue hashUpper(shape, it, charsLeft);

Computes an upper hash value for a shape applied to a sequence.

Parameters

shape Shape to be used for hashing. Types: Shape
it Sequence iterator pointing to the first character of the shape.
charsLeft The distance of it to the string end.

Returns

TValue Upper hash value of the shape. The value is one more than the maximal hash value of any shape beginning with the first min(charsLeft, length(shape)) characters (Metafunction: Value).

This function in conjunction with hash is useful to search a q-gram index for p-grams with p < q.
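The p-gram search mentioned above amounts to a range query: every q-gram that starts with a given p-gram has a hash in the half-open interval from the lower hash to the upper hash. The sketch below computes that interval for ungapped Dna shapes under the base-4 interpretation; it is an illustration, not SeqAn's implementation, and dnaRank and pgramHashRange are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical helper: rank of a DNA character (A=0, C=1, G=2, T=3).
inline unsigned dnaRank(char c)
{
    switch (c)
    {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // treated as 'T'
    }
}

// Returns [lower, upper): the hash interval of all q-grams that begin with pgram.
// lower plays the role of hash, upper the role of hashUpper.
inline std::pair<std::uint64_t, std::uint64_t>
pgramHashRange(std::string const & pgram, std::size_t q)
{
    assert(!pgram.empty() && pgram.size() <= q);

    std::uint64_t prefix = 0;                 // base-4 value of the p-gram
    for (char c : pgram)
        prefix = prefix * 4 + dnaRank(c);

    std::uint64_t scale = 1;                  // 4^(q-p), number of q-grams per p-gram
    for (std::size_t i = pgram.size(); i < q; ++i)
        scale *= 4;

    return {prefix * scale, (prefix + 1) * scale};
}
```

For the 2-gram "GA" and q = 3 the interval is [32, 36), which covers exactly GAA, GAC, GAG and GAT.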

See Also

TSize length(shape);

Returns the number of elements of the shape (span).

Parameters

shape Shape object for which the number of relevant positions is determined.

Returns

TSize The number of elements of the shape (span) (Metafunction: Size).

TSize resize(shape, length);

Resize a shape to a specified span.

Parameters

shape Shape object to resize.
length The new length (span) of the shape.

Returns

TSize The new span (Metafunction: Size).

void shapeToString(bitmap, shape);

Converts a given shape into a sequence of '1' (relevant position) and '0' (irrelevant position).

Parameters

bitmap The resulting sequence object. Types: String
shape Shape object. Types: Shape

See Also

bool stringToShape(shape, bitmap);

Takes a shape given as a string of '1' (relevant position) and '0' (irrelevant position) and converts it into a Shape object.

Parameters

shape Shape object that is manipulated.
bitmap A character string of '1' and '0' representing relevant and irrelevant positions (blanks) respectively. This string must begin with a '1'. Trailing '0's are ignored. If shape is a SimpleShape, at most one contiguous sequence of '1's is allowed. If shape is a OneGappedShape, at most two contiguous sequences of '1's are allowed (String of char).

Returns

bool true if the conversion was successful.
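The bitmap rules listed for stringToShape can be checked before attempting a conversion. The following standalone sketch mirrors only the documented constraints (leading '1', limited number of runs of '1's); it does not call SeqAn, and countOneRuns and isAcceptableBitmap are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: number of maximal contiguous runs of '1' in bitmap.
inline std::size_t countOneRuns(std::string const & bitmap)
{
    std::size_t runs = 0;
    char previous = '0';
    for (char c : bitmap)
    {
        assert(c == '0' || c == '1');
        if (c == '1' && previous != '1')
            ++runs;                           // a new run of '1's starts here
        previous = c;
    }
    return runs;
}

// Hypothetical check of the documented rules. maxRuns is assumed to be 1 for a
// SimpleShape target and 2 for a OneGappedShape target.
inline bool isAcceptableBitmap(std::string const & bitmap, std::size_t maxRuns)
{
    if (bitmap.empty() || bitmap[0] != '1')
        return false;                         // must begin with a '1'
    return countOneRuns(bitmap) <= maxRuns;   // trailing '0's do not add runs
}
```

Under these assumptions "1110" is acceptable with maxRuns = 1, "1101" needs maxRuns = 2, and "0111" is rejected in either case.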

See Also

void unhash(result, hash, q);

Inverse of the hash function; for ungapped shapes.

Parameters

result String to write the result to. Types: String.
hash The hash value previously computed with hash.
q The q-gram length. Types: unsigned
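Inverting an ungapped hash means reading its base-|alphabet| digits back from least to most significant. A minimal standalone sketch for Dna follows; it is not SeqAn's unhash, and the name unhashDna and the A, C, G, T digit order are assumptions consistent with the example output elsewhere in this document.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical model of unhash for ungapped Dna q-grams:
// decodes hashValue into a string of exactly q characters.
inline std::string unhashDna(std::uint64_t hashValue, std::size_t q)
{
    static char const alphabet[] = {'A', 'C', 'G', 'T'};
    std::string result(q, 'A');
    for (std::size_t i = q; i > 0; --i)
    {
        result[i - 1] = alphabet[hashValue % 4];  // least significant digit is the last character
        hashValue /= 4;
    }
    assert(hashValue == 0);                   // the hash must fit into q base-4 digits
    return result;
}
```

For example, the value 17 with q = 3 decodes to "CAC", and 56 decodes to "TGA".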

See Also

TValue value(shape);

Returns the current hash value of the Shape.

Parameters

shape The Shape to query for its value.

Returns

TValue The hash value of the shape.

TSize weight(shape);

Number of relevant positions in a shape.

Parameters

shape Shape object for which the number of relevant positions is determined.

Returns

TSize Number of relevant positions (Metafunction: Size).

For ungapped shapes the return value is the result of the length function. For gapped shapes this is the number of '1's.
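The difference between span and weight is easiest to see on a bitmap. The sketch below derives both from a '1'/'0' string and hashes a gapped Dna q-gram, assuming the hash is formed only from the characters at relevant positions. It is a standalone illustration rather than SeqAn's implementation; dnaRank, spanOf, weightOf and gappedHash are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: rank of a DNA character (A=0, C=1, G=2, T=3).
inline unsigned dnaRank(char c)
{
    switch (c)
    {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // treated as 'T'
    }
}

// Span: distance from the first position to the last '1' (trailing '0's ignored).
inline std::size_t spanOf(std::string const & bitmap)
{
    std::size_t const last = bitmap.find_last_of('1');
    return last == std::string::npos ? 0 : last + 1;
}

// Weight: number of relevant positions, i.e. the number of '1's.
inline std::size_t weightOf(std::string const & bitmap)
{
    return static_cast<std::size_t>(std::count(bitmap.begin(), bitmap.end(), '1'));
}

// Assumed gapped hash: base-4 number built from the relevant characters only.
inline std::uint64_t gappedHash(std::string const & text, std::size_t pos, std::string const & bitmap)
{
    std::size_t const span = spanOf(bitmap);
    assert(pos + span <= text.size());
    std::uint64_t h = 0;
    for (std::size_t i = 0; i < span; ++i)
        if (bitmap[i] == '1')
            h = h * 4 + dnaRank(text[pos + i]);
    return h;
}
```

For the bitmap "10110" the span is 4 and the weight is 3; for an ungapped bitmap such as "111" both are 3.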

Interface Metafunctions Detail

Host<TShape>::Type;

Returns the host (= value) type to use.

Template Parameters

TShape The Shape to query for host (= value) type.

Returns

Type Type to use for the host (= value) size.

LENGTH<TShape>::VALUE;

Returns the length (span) of a shape.

Template Parameters

TShape The Shape to query for its length (span).

Returns

VALUE The length (span) of the shape.

Size<TShape>::Type;

Returns the size type for a shape.

Template Parameters

TShape The Shape to query for its size type.

Returns

Type The size type of the shape.

Value<TShape>::Type;

Returns the value type for a shape.

Template Parameters

TShape The Shape to query for its value type.

Returns

Type The value type of the shape.

ValueSize<TShape>::Type;

Returns the type to use for the value size.

Template Parameters

TShape The Shape to query for value size type.

Returns

Type Type to use for the value size.

WEIGHT<TShape>::VALUE;

Returns the weight (number of 1's) of a shape.

Template Parameters

TShape The Shape to query for its weight (number of 1's).

Returns

VALUE The weight (number of 1's) of the shape.