SeqAn3 3.4.0-rc.1
The Modern C++ library for sequence analysis.
|
Search related views. More...
Classes | |
struct | seqan3::seed |
strong_type for seed. More... | |
struct | seqan3::window_size |
strong_type for the window_size. More... | |
Variables | |
constexpr auto | seqan3::views::kmer_hash |
Computes hash values for each position of a range via a given shape. | |
constexpr auto | seqan3::views::minimiser |
Computes minimisers for a range of comparable values. A minimiser is the smallest value in a window. | |
Alphabet related views | |
constexpr auto | seqan3::views::minimiser_hash |
Computes minimisers for a range with a given shape, window size and seed. | |
Search related views.
|
inlineconstexpr |
Computes hash values for each position of a range via a given shape.
urng_t | The type of the range being processed. See below for requirements. [template parameter is omitted in pipe notation] |
[in] | urange | The range being processed. [parameter is omitted in pipe notation] |
[in] | shape | The seqan3::shape that determines how to compute the hash value. |
urange
and the number of 1s \(s\) of shape
it must hold that \(s \le \frac{64}{\log_2\sigma}\), i.e. hashes resulting from the shape/alphabet combination can be represented in an uint64_t
.Concepts and traits | urng_t (underlying range type) | rrng_t (returned range type) |
---|---|---|
std::ranges::input_range | required | preserved |
std::ranges::forward_range | required | preserved |
std::ranges::bidirectional_range | preserved | |
std::ranges::random_access_range | preserved | |
std::ranges::contiguous_range | lost | |
std::ranges::viewable_range | required | guaranteed |
std::ranges::view | guaranteed | |
std::ranges::sized_range | preserved | |
std::ranges::common_range | preserved | |
std::ranges::output_range | lost | |
seqan3::const_iterable_range | preserved | |
std::ranges::range_reference_t | seqan3::semialphabet | std::size_t |
See the views submodule documentation for detailed descriptions of the view properties.
|
inlineconstexpr |
Computes minimisers for a range of comparable values. A minimiser is the smallest value in a window.
urng_t | The type of the first range being processed. See below for requirements. [template parameter is omitted in pipe notation] |
[in] | urange1 | The range being processed. [parameter is omitted in pipe notation] |
[in] | window_size | The number of values in one window. |
A minimiser is the smallest value in a window. For example for the following list of hash values [28, 100, 9, 23, 4, 1, 72, 37, 8]
and 4 as window_size
, the minimiser values are [9, 4, 1]
.
The minimiser can be calculated for one given range or for two given ranges, where the minimizer is the smallest value in both windows. For example for the following list of hash values [28, 100, 9, 23, 4, 1, 72, 37, 8]
and [30, 2, 11, 101, 199, 73, 34, 900]
and 4 as window_size
, the minimiser values are [2, 4, 1]
.
Note that in the interface with the second underlying range the const-iterable property will only be preserved if both underlying ranges are const-iterable.
In case there are multiple minimal values within one window, the minimum and therefore the minimiser is ambiguous. We choose the rightmost value as the minimiser of the window, and when shifting the window, the minimiser is only changed if there appears a value that is strictly smaller than the current minimum. This approach is termed robust winnowing by Chirag et al. and is proven to work especially well on repeat regions.
Concepts and traits | urng_t (underlying range type) | rrng_t (returned range type) |
---|---|---|
std::ranges::input_range | required | preserved |
std::ranges::forward_range | required | preserved |
std::ranges::bidirectional_range | lost | |
std::ranges::random_access_range | lost | |
std::ranges::contiguous_range | lost | |
std::ranges::viewable_range | required | guaranteed |
std::ranges::view | guaranteed | |
std::ranges::sized_range | lost | |
std::ranges::common_range | lost | |
std::ranges::output_range | lost | |
seqan3::const_iterable_range | preserved | |
std::ranges::range_reference_t | std::totally_ordered | std::totally_ordered |
See the views submodule documentation for detailed descriptions of the view properties.
|
inlineconstexpr |
Computes minimisers for a range with a given shape, window size and seed.
urng_t | The type of the range being processed. See below for requirements. [template parameter is omitted in pipe notation] |
[in] | urange | The range being processed. [parameter is omitted in pipe notation] |
[in] | shape | The seqan3::shape that determines how to compute the hash value. |
[in] | window_size | The window size to use. |
[in] | seed | The seed used to skew the hash values. Default: 0x8F3F73B5CF1C9ADE. |
size_t
where each value is the minimiser of the resp. window. See below for the properties of the returned range.A sequence can be presented by a small number of k-mers (minimisers). For a given shape and window size all k-mers are determined in the forward strand and the backward strand and only the lexicographically smallest k-mer is returned for one window. This process is repeated over every possible window of a sequence. If consecutive windows share a minimiser, it is saved only once. For example, in the sequence "TAAAGTGCTAAA" for an ungapped shape of length 3 and a window size of 5 the first, the second and the last window contain the same minimiser "AAA". Because the minimisers of the first two consecutive windows also share the same position, storing this minimiser twice is redundant and it is stored only once. The "AAA" minimiser of the last window on the other hand is stored, since it is located at an other position than the previous "AAA" minimiser and hence storing the second "AAA"-minimiser is not redundant but necessary.
It might happen that a minimiser changes only slightly when sliding the window over the sequence. For instance, when a minimiser starts with a repetition of A’s, then in the next window it is highly likely that the minimiser will start with a repetition of A’s as well. Because it is only one A shorter, depending on how long the repetition is this might go on for multiple window shifts. Saving these only slightly different minimiser makes no sense because they contain no new information about the underlying sequence. Additionally, sequences with a repetition of A’s will be seen as more similar to each other than they actually are. As Marçais et al. have shown, randomizing the order of the k-mers can solve this problem. Therefore, a random seed is used to XOR all k-mers, thereby randomizing the order. The user can change the seed to any other value he or she thinks is useful. A seed of 0 is returning the lexicographical order.
Concepts and traits | urng_t (underlying range type) | rrng_t (returned range type) |
---|---|---|
std::ranges::input_range | required | preserved |
std::ranges::forward_range | required | preserved |
std::ranges::bidirectional_range | lost | |
std::ranges::random_access_range | lost | |
std::ranges::contiguous_range | lost | |
std::ranges::viewable_range | required | guaranteed |
std::ranges::view | guaranteed | |
std::ranges::sized_range | lost | |
std::ranges::common_range | lost | |
std::ranges::output_range | lost | |
seqan3::const_iterable_range | preserved | |
std::ranges::range_reference_t | seqan3::semialphabet | std::size_t |
See the views submodule documentation for detailed descriptions of the view properties.