SeqAn3 3.4.0rc.1
The Modern C++ library for sequence analysis.

Search related views. More...
Classes  
struct  seqan3::seed 
strong_type for seed. More...  
struct  seqan3::window_size 
strong_type for the window_size. More...  
Variables  
constexpr auto  seqan3::views::kmer_hash 
Computes hash values for each position of a range via a given shape.  
constexpr auto  seqan3::views::minimiser 
Computes minimisers for a range of comparable values. A minimiser is the smallest value in a window.  
Alphabet related views  
constexpr auto  seqan3::views::minimiser_hash 
Computes minimisers for a range with a given shape, window size and seed.  
Search related views.

inlineconstexpr 
Computes hash values for each position of a range via a given shape.
urng_t  The type of the range being processed. See below for requirements. [template parameter is omitted in pipe notation] 
[in]  urange  The range being processed. [parameter is omitted in pipe notation] 
[in]  shape  The seqan3::shape that determines how to compute the hash value. 
urange
and the number of 1s \(s\) of shape
it must hold that \(s>\frac{64}{\log_2\sigma}\), i.e. hashes resulting from the shape/alphabet combination can be represented in an uint64_t
.Concepts and traits  urng_t (underlying range type)  rrng_t (returned range type) 

std::ranges::input_range  required  preserved 
std::ranges::forward_range  required  preserved 
std::ranges::bidirectional_range  preserved  
std::ranges::random_access_range  preserved  
std::ranges::contiguous_range  lost  
std::ranges::viewable_range  required  guaranteed 
std::ranges::view  guaranteed  
std::ranges::sized_range  preserved  
std::ranges::common_range  preserved  
std::ranges::output_range  lost  
seqan3::const_iterable_range  preserved  
std::ranges::range_reference_t  seqan3::semialphabet  std::size_t 
See the views submodule documentation for detailed descriptions of the view properties.

inlineconstexpr 
Computes minimisers for a range of comparable values. A minimiser is the smallest value in a window.
urng_t  The type of the first range being processed. See below for requirements. [template parameter is omitted in pipe notation] 
[in]  urange1  The range being processed. [parameter is omitted in pipe notation] 
[in]  window_size  The number of values in one window. 
A minimiser is the smallest value in a window. For example for the following list of hash values [28, 100, 9, 23, 4, 1, 72, 37, 8]
and 4 as window_size
, the minimiser values are [9, 4, 1]
.
The minimiser can be calculated for one given range or for two given ranges, where the minimizer is the smallest value in both windows. For example for the following list of hash values [28, 100, 9, 23, 4, 1, 72, 37, 8]
and [30, 2, 11, 101, 199, 73, 34, 900]
and 4 as window_size
, the minimiser values are [2, 4, 1]
.
Note that in the interface with the second underlying range the constiterable property will only be preserved if both underlying ranges are constiterable.
In case there are multiple minimal values within one window, the minimum and therefore the minimiser is ambiguous. We choose the rightmost value as the minimiser of the window, and when shifting the window, the minimiser is only changed if there appears a value that is strictly smaller than the current minimum. This approach is termed robust winnowing by Chirag et al. and is proven to work especially well on repeat regions.
Concepts and traits  urng_t (underlying range type)  rrng_t (returned range type) 

std::ranges::input_range  required  preserved 
std::ranges::forward_range  required  preserved 
std::ranges::bidirectional_range  lost  
std::ranges::random_access_range  lost  
std::ranges::contiguous_range  lost  
std::ranges::viewable_range  required  guaranteed 
std::ranges::view  guaranteed  
std::ranges::sized_range  lost  
std::ranges::common_range  lost  
std::ranges::output_range  lost  
seqan3::const_iterable_range  preserved  
std::ranges::range_reference_t  std::totally_ordered  std::totally_ordered 
See the views submodule documentation for detailed descriptions of the view properties.

inlineconstexpr 
Computes minimisers for a range with a given shape, window size and seed.
urng_t  The type of the range being processed. See below for requirements. [template parameter is omitted in pipe notation] 
[in]  urange  The range being processed. [parameter is omitted in pipe notation] 
[in]  shape  The seqan3::shape that determines how to compute the hash value. 
[in]  window_size  The window size to use. 
[in]  seed  The seed used to skew the hash values. Default: 0x8F3F73B5CF1C9ADE. 
size_t
where each value is the minimiser of the resp. window. See below for the properties of the returned range.A sequence can be presented by a small number of kmers (minimisers). For a given shape and window size all kmers are determined in the forward strand and the backward strand and only the lexicographically smallest kmer is returned for one window. This process is repeated over every possible window of a sequence. If consecutive windows share a minimiser, it is saved only once. For example, in the sequence "TAAAGTGCTAAA" for an ungapped shape of length 3 and a window size of 5 the first, the second and the last window contain the same minimiser "AAA". Because the minimisers of the first two consecutive windows also share the same position, storing this minimiser twice is redundant and it is stored only once. The "AAA" minimiser of the last window on the other hand is stored, since it is located at an other position than the previous "AAA" minimiser and hence storing the second "AAA"minimiser is not redundant but necessary.
It might happen that a minimiser changes only slightly when sliding the window over the sequence. For instance, when a minimiser starts with a repetition of A’s, then in the next window it is highly likely that the minimiser will start with a repetition of A’s as well. Because it is only one A shorter, depending on how long the repetition is this might go on for multiple window shifts. Saving these only slightly different minimiser makes no sense because they contain no new information about the underlying sequence. Additionally, sequences with a repetition of A’s will be seen as more similar to each other than they actually are. As Marçais et al. have shown, randomizing the order of the kmers can solve this problem. Therefore, a random seed is used to XOR all kmers, thereby randomizing the order. The user can change the seed to any other value he or she thinks is useful. A seed of 0 is returning the lexicographical order.
Concepts and traits  urng_t (underlying range type)  rrng_t (returned range type) 

std::ranges::input_range  required  preserved 
std::ranges::forward_range  required  preserved 
std::ranges::bidirectional_range  lost  
std::ranges::random_access_range  lost  
std::ranges::contiguous_range  lost  
std::ranges::viewable_range  required  guaranteed 
std::ranges::view  guaranteed  
std::ranges::sized_range  lost  
std::ranges::common_range  lost  
std::ranges::output_range  lost  
seqan3::const_iterable_range  preserved  
std::ranges::range_reference_t  seqan3::semialphabet  std::size_t 
See the views submodule documentation for detailed descriptions of the view properties.