SeqAn3 3.4.0-rc.1
The Modern C++ library for sequence analysis.
Loading...
Searching...
No Matches
Search

Data structures and approximate string search algorithms for large collection of text (e.g. DNA). More...

+ Collaboration diagram for Search:

Modules

 Configuration
 Data structures and utility functions for configuring search algorithm.
 
 DREAM Index
 Provides seqan3::interleaved_bloom_filter.
 
 FM Index
 Provides seqan3::fm_index and seqan3::bi_fm_index as well as respective cursors.
 
 k-mer Index
 Implementation of shapes for a k-mer Index.
 
 Views
 Search related views.
 

Classes

struct  seqan3::detail::policy_max_error
 Provides the function max_error_counts if inherited by a search algorithm. More...
 
struct  seqan3::detail::policy_search_result_builder< search_configuration_t >
 Provides the function make_results if inherited by a search algorithm. More...
 
struct  seqan3::detail::search< nbr_blocks >
 Object storing information for a search (of a search scheme). More...
 
struct  seqan3::detail::search_configuration_validator
 Class used to validate the search configuration. More...
 
class  seqan3::detail::search_configurator
 Class used to update the search configuration, e.g. add defaults. More...
 
struct  seqan3::detail::search_dyn
 Object storing information for a search (of a search scheme). More...
 
struct  seqan3::detail::search_param
 Object grouping numbers of errors for different kind of error types. More...
 
class  seqan3::search_result< query_id_type, cursor_type, reference_id_type, reference_begin_position_type >
 The result class generated by the seqan3::seach algorithm. More...
 
class  seqan3::detail::search_scheme_algorithm< configuration_t, index_t, policies_t >
 The algorithm that performs a bidirectional search on a bidirectional FM index using (optimal) search schemes. More...
 
struct  seqan3::detail::search_traits< search_configuration_t >
 A collection of traits extracted from the search configuration. More...
 
class  seqan3::detail::unidirectional_search_algorithm< configuration_t, index_t, policies_t >
 The algorithm that performs a unidirectional search on an FM index using trivial backtracking. More...
 

Typedefs

using seqan3::detail::search_scheme_dyn_type = std::vector< search_dyn >
 Type for storing search schemes. Number of blocks do not have to be known at compile time.
 
template<uint8_t nbr_searches, uint8_t nbr_blocks>
using seqan3::detail::search_scheme_type = std::array< search< nbr_blocks >, nbr_searches >
 Type for storing search schemes. Number of blocks have to be known at compile time.
 

Enumerations

enum class  seqan3::detail::error_type : uint8_t { seqan3::detail::error_type::deletion , seqan3::detail::error_type::insertion , seqan3::detail::error_type::matchmm , seqan3::detail::error_type::none }
 An enumerator for the different error types used during the backtracking. More...
 

Functions

std::vector< search_dynseqan3::detail::compute_ss (uint8_t const min_error, uint8_t const max_error)
 Computes a (non-optimal) search scheme. Currently the generated search scheme represents trivial backtracking.
 
template<typename index_t , std::ranges::forward_range queries_t, typename configuration_t = decltype(search_cfg::default_configuration)>
requires std::ranges::forward_range<std::ranges::range_reference_t<queries_t>> && std::same_as<range_innermost_value_t<queries_t>, typename index_t::alphabet_type>
auto seqan3::search (queries_t &&queries, index_t const &index, configuration_t const &cfg=search_cfg::default_configuration)
 Search a query or a range of queries in an index.
 
template<bool abort_on_hit, typename query_t , typename delegate_t >
requires (template_specialisation_of<typename index_t::cursor_type, bi_fm_index_cursor>)
void seqan3::detail::search_scheme_algorithm< configuration_t, index_t, policies_t >::search_algo_bi (query_t &query, search_param const error_left, delegate_t &&delegate)
 Searches a query sequence in a bidirectional index.
 
template<typename search_scheme_t >
auto seqan3::detail::search_scheme_block_info (search_scheme_t const &search_scheme, size_t const query_length)
 Returns for each search the cumulative length of blocks in the order of blocks in each search and the starting position of the first block in the query sequence.
 
template<bool abort_on_hit, typename cursor_t , typename query_t , typename search_t , typename blocks_length_t , typename delegate_t >
bool seqan3::detail::search_ss (cursor_t cur, query_t &query, typename cursor_t::size_type const lb, typename cursor_t::size_type const rb, uint8_t const errors_spent, uint8_t const block_id, bool const go_right, search_t const &search, blocks_length_t const &blocks_length, search_param const error_left, delegate_t &&delegate)
 Searches a query sequence in a bidirectional index using a single search of a search schemes.
 
template<bool abort_on_hit, typename index_t , typename query_t , typename search_scheme_t , typename delegate_t >
void seqan3::detail::search_ss (index_t const &index, query_t &query, search_param const error_left, search_scheme_t const &search_scheme, delegate_t &&delegate)
 Searches a query sequence in a bidirectional index using search schemes.
 
template<bool abort_on_hit, typename cursor_t , typename query_t , typename search_t , typename blocks_length_t , typename delegate_t >
bool seqan3::detail::search_ss_children (cursor_t cur, query_t &query, typename cursor_t::size_type const lb, typename cursor_t::size_type const rb, uint8_t const errors_spent, uint8_t const block_id, bool const go_right, uint8_t const min_error_left_in_block, search_t const &search, blocks_length_t const &blocks_length, search_param const error_left, delegate_t &&delegate)
 Searches a query sequence in a bidirectional index using a single search of a search schemes. Sub-function for approximate search step (iterating over all children in a conceptual suffix tree).
 
template<bool abort_on_hit, typename cursor_t , typename query_t , typename search_t , typename blocks_length_t , typename delegate_t >
bool seqan3::detail::search_ss_deletion (cursor_t cur, query_t &query, typename cursor_t::size_type const lb, typename cursor_t::size_type const rb, uint8_t const errors_spent, uint8_t const block_id, bool const go_right, search_t const &search, blocks_length_t const &blocks_length, search_param const error_left, delegate_t &&delegate)
 Searches a query sequence in a bidirectional index using a single search of a search schemes. Sub-function for deletions at the end of a block.
 
template<bool abort_on_hit, typename cursor_t , typename query_t , typename search_t , typename blocks_length_t , typename delegate_t >
bool seqan3::detail::search_ss_exact (cursor_t cur, query_t &query, typename cursor_t::size_type const lb, typename cursor_t::size_type const rb, uint8_t const errors_spent, uint8_t const block_id, bool const go_right, search_t const &search, blocks_length_t const &blocks_length, search_param const error_left, delegate_t &&delegate)
 Searches a query sequence in a bidirectional index using a single search of a search scheme. Sub-function for searching the remaining part of the current block without any errors.
 
template<bool abort_on_hit, typename query_t >
bool seqan3::detail::unidirectional_search_algorithm< configuration_t, index_t, policies_t >::search_trivial (typename index_t::cursor_type cur, query_t &query, typename index_t::cursor_type::size_type const query_pos, search_param const error_left, error_type const prev_error)
 Searches a query sequence in an index using trivial backtracking.
 

Variables

template<uint8_t min_error, uint8_t max_error>
constexpr int seqan3::detail::optimum_search_scheme {0}
 Search scheme that is optimal in the running time for the specified lower and upper error bound.
 

Detailed Description

Data structures and approximate string search algorithms for large collection of text (e.g. DNA).

Searching is a key component in many sequence analysis tools. The search module is a powerful and easy way to search sequences in a large text or an arbitrary nested collection of texts. When it comes to searching, indices are a core component for searching large amounts of data and are used for tools such as read mappers, assemblers or protein search tools.

SeqAn currently implements only the FM index and a k-mer index is planned. The FM index works with arbitrary pattern lengths and error numbers.

Todo:

Elaborate on that (space consumption for growing k, maybe a rule of thumb).

Add a description of the DREAM index here.

Search algorithm: seqan3::search

The Search module offers a unified search interface seqan3::search. The function chooses the best search method based on the provided index and an optional configuration.

seqan3::search(queries, index, config);
auto search(queries_t &&queries, index_t const &index, configuration_t const &cfg=search_cfg::default_configuration)
Search a query or a range of queries in an index.
Definition search.hpp:105

The search configuration

The seqan3::search algorithm can be configured in multiple ways. You simply pass a respective config object to the algorithm:

The search configuration is independent of the index you use. It specifies for example how many mismatches and indels are allowed to consider your search successful.

See detailed documentation on Configuration for details.

The search result

The result is a range over "hits". We call a hit the position of a successful search. A hit in SeqAn is represented by the seqan3::search_result. You can configure what's part of the search result with the 4. Output Configuration.

You can iterate over the results with a normal for loop:

for (auto && result : results)
seqan3::debug_stream << result << '\n';
The main SeqAn3 namespace.
Definition aligned_sequence_concept.hpp:26

Which index do I use?

In order to choose the right Index for your use case, have a look at the next section (Available Indices for the seqan3::search). The documentation of the respective index will lead you to a more detailed description on how to use our search algorithm.

Available Indices for the seqan3::search

seqan3::fm_index

The seqan3::fm_index implements an original FM Index with a trivial backtracking approach. It is recommended for searches with no errors (exact searches). For exact searches, the original FM index is slightly faster than the bidirectional FM Index, and in general, it is only half the size.

seqan3::bi_fm_index

The seqan3::bi_fm_index is a bidirectional FM index [1]. It improves the original FM index by allowing to extend the query to the right and the left. This makes the bidirectional FM index more efficient than its predecessor when searching with errors (approximate search). The performance gain is enabled by using (optimum) search schemes. Currently, using search schemes is only supported for searches with up to three errors and falls back to a trivial backtracking approach for higher errors. In the future, we plan to improve the search schemes to handle higher error counts.

Reference:

[1] Optimum Search Schemes for Approximate String Matching Using Bidirectional FM-Index. bioRxiv, 301085. https://doi.org/10.1101/301085, Kianfar, K., Pockrandt, C., Torkamandi, B., Luo, H., & Reinert, K. (2018).

Implementation details of Search Schemes

A search scheme is a collection of searches, where each search s specifies the order in which the blocks are searched (s.pi), the lower error bounds (s.l) and the upper error bounds (s.u). If the number of blocks that the query sequences are split into are known at compile time, the data structure seqan3::detail::search is recommended, otherwise one has to use seqan3::detail::search_dyn. The first one implements its member variables as an a std::array of integers, the latter as a std::vector of integers. Search search schemes are defined similiar. They are either implemented as a std::array of searches if the number of searches is known at compile time, or as a std::vector if not.

Precomputed optimum search schemes are represented as a std::array of seqan3::detail::search since both the number of searches and the number of blocks are known at compile time. Search schemes computed at run time are represented as std::vector of seqan3::detail::search_dyn.

All optimum search schemes are disjoint, i.e. no error configuration is covered by more than one search. Thus there is no need for a filtration phase after searching to remove duplicate hits when searching with hamming distance. Allowing insertions and deletions will, however, lead to redundant reports of hits regardless of search schemes.

Membership queries (lookups)

If you are not interested in the exact match of your query, but simply whether your query can be found or not (also called membership query or lookup), SeqAn offers efficient data structures for this purpose. Of course you could also use a simple hash map (e.g., std::unordered_map) but depending on your data this can quickly become infeasible in terms of memory consumption and runtime.

Take a look at the seqan3::interleaved_bloom_filter. The following example shows how to use it to simultaneously query all regions of a genome for the occurrence of a specific query:

// SPDX-FileCopyrightText: 2006-2024 Knut Reinert & Freie Universität Berlin
// SPDX-FileCopyrightText: 2016-2024 Knut Reinert & MPI für molekulare Genetik
// SPDX-License-Identifier: CC0-1.0
using namespace seqan3::literals;
int main()
{
auto genome = "TTTTTTTTTTAAAAAAAAAATTTTTTTTTTGGGGGGGGGG"_dna4;
// Fill buckets of the interleaved bloomm filter
auto genome_buckets = genome | seqan3::views::chunk(10); // divide genome into buckets of size 10
size_t bucket_idx{0};
for (auto bucket : genome_buckets)
{
for (auto kmer : bucket | seqan3::views::kmer_hash(seqan3::ungapped{2})) // hash genome with k = 2
ibf.emplace(kmer, seqan3::bin_index{bucket_idx});
++bucket_idx;
}
auto ibf_agent = ibf.counting_agent(); // the membership_agent enables efficient kemr queries
auto query = "TTT"_dna4;
auto query_kmers = query | seqan3::views::kmer_hash(seqan3::ungapped{2}); // hash query with k = 2
seqan3::debug_stream << ibf_agent.bulk_count(query_kmers) << '\n'; // prints [2,0,2,0]
}
Provides seqan3::views::chunk.
The IBF binning directory. A data structure that efficiently answers set-membership queries for multi...
Definition interleaved_bloom_filter.hpp:131
Provides seqan3::debug_stream and related types.
Provides seqan3::dna4, container aliases and string literals.
debug_stream_type debug_stream
A global instance of seqan3::debug_stream_type.
Definition debug_stream.hpp:37
constexpr auto kmer_hash
Computes hash values for each position of a range via a given shape.
Definition kmer_hash.hpp:766
seqan::stl::views::chunk chunk
A view adaptor that divides a range into chunks. <dl class="no-api">This entity is not part of the Se...
Definition chunk.hpp:23
Provides seqan3::interleaved_bloom_filter.
Provides seqan3::views::kmer_hash.
The SeqAn namespace for literals.
A strong type that represents the number of bins for the seqan3::interleaved_bloom_filter.
Definition interleaved_bloom_filter.hpp:33
A strong type that represents the bin index for the seqan3::interleaved_bloom_filter.
Definition interleaved_bloom_filter.hpp:54
A strong type that represents the number of bits for each bin in the seqan3::interleaved_bloom_filter...
Definition interleaved_bloom_filter.hpp:40
A strong type of underlying type uint8_t that represents the ungapped shape size.
Definition shape.hpp:22

Enumeration Type Documentation

◆ error_type

enum class seqan3::detail::error_type : uint8_t
strong

An enumerator for the different error types used during the backtracking.

Enumerator
deletion 

A deletion was enumerated in previous backtracking step.

insertion 

A insertion was enumerated in previous backtracking step.

matchmm 

A match or a mismatch was enumerated.

none 

No error or match was enumerated yet.

Function Documentation

◆ compute_ss()

std::vector< search_dyn > seqan3::detail::compute_ss ( uint8_t const  min_error,
uint8_t const  max_error 
)
inline

Computes a (non-optimal) search scheme. Currently the generated search scheme represents trivial backtracking.

Parameters
[in]min_errorMinimum number of errors allowed.
[in]max_errorMaximum number of errors allowed.

Complexity

Constant.

Exceptions

Strong exception guarantee.

◆ search()

template<typename index_t , std::ranges::forward_range queries_t, typename configuration_t = decltype(search_cfg::default_configuration)>
requires std::ranges::forward_range<std::ranges::range_reference_t<queries_t>> && std::same_as<range_innermost_value_t<queries_t>, typename index_t::alphabet_type>
auto seqan3::search ( queries_t &&  queries,
index_t const &  index,
configuration_t const &  cfg = search_cfg::default_configuration 
)
inline

Search a query or a range of queries in an index.

Template Parameters
index_tType of the index. See Available Indices for the seqan3::search for an overview of our indices.
queries_tMust model std::ranges::random_access_range over the index's alphabet and std::ranges::sized_range. A range of queries must additionally model std::ranges::forward_range and std::ranges::sized_range.
Parameters
[in]queriesA single query or a range of queries.
[in]indexString index to be searched.
[in]cfgA configuration object specifying the search parameters (e.g. number of errors, error types, output format, etc.).
Returns
A seqan3::algorithm_result_generator_range with value type of seqan3::search_result.
Note
Always returns void if an on_hit delegate has been specified.

Header File

#include <seqan3/search/search.hpp>

The search algorithm strongly depends on the index that is used. Please see Available Indices for the seqan3::search for an overview of our indices.

For more details on how to configure the search, please see the respective documentation: Configuration.

Complexity

Each query with \(e\) errors takes \(O(|query|^e)\) where \(e\) is the maximum number of errors.

Exceptions

Strong exception guarantee if iterating the query does not change its state and if invoking a possible delegate specified in cfg also has a strong exception guarantee; basic exception guarantee otherwise.

Example

// SPDX-FileCopyrightText: 2006-2024 Knut Reinert & Freie Universität Berlin
// SPDX-FileCopyrightText: 2016-2024 Knut Reinert & MPI für molekulare Genetik
// SPDX-License-Identifier: CC0-1.0
#include <vector>
int main()
{
using namespace seqan3::literals;
std::vector<seqan3::dna4_vector> genomes{"CGCTGTCTGAAGGATGAGTGTCAGCCAGTGTA"_dna4,
"ACCCGATGAGCTACCCAGTAGTCGAACTG"_dna4,
"GGCCAGACAACCCGGCGCTAATGCACTCA"_dna4};
std::vector<seqan3::dna4_vector> queries{"GCT"_dna4, "ACCC"_dna4};
// build an FM index
seqan3::fm_index index{genomes};
auto results = seqan3::search(queries, index);
// search for the queries "GCT" and "ACCC"
for (auto && result : results)
seqan3::debug_stream << result << '\n';
// This should result in:
// <query_id:0, reference_id:0, reference_pos:1>
// <query_id:0, reference_id:1, reference_pos:9>
// <query_id:0, reference_id:2, reference_pos:16>
// <query_id:1, reference_id:1, reference_pos:0>
// <query_id:1, reference_id:1, reference_pos:12>
// <query_id:1, reference_id:2, reference_pos:9>
}
The SeqAn FM Index.
Definition fm_index.hpp:186
Meta-header for the Search / FM Index submodule .
Provides the public interface for search algorithms.

◆ search_algo_bi()

template<typename configuration_t , typename index_t , typename... policies_t>
requires (template_specialisation_of<typename index_t::cursor_type, bi_fm_index_cursor>)
template<bool abort_on_hit, typename query_t , typename delegate_t >
requires (template_specialisation_of<typename index_t::cursor_type, bi_fm_index_cursor>)
void seqan3::detail::search_scheme_algorithm< configuration_t, index_t, policies_t >::search_algo_bi ( query_t &  query,
search_param const  error_left,
delegate_t &&  delegate 
)
inlineprivate

Searches a query sequence in a bidirectional index.

Template Parameters
abort_on_hitIf the flag is set, the search aborts on the first hit.
query_tMust model std::ranges::random_access_range over the index's alphabet.
delegate_tTakes typename index_t::cursor_type as argument.
Parameters
[in]queryQuery sequence to be searched in the index.
[in]error_leftNumber of errors left for matching the remaining suffix of the query sequence.
[in]delegateFunction that is called on every hit.

Complexity

\(O(|query|^e)\) where \(e\) is the total number of maximum errors.

Exceptions

Strong exception guarantee if iterating the query does not change its state and if invoking the delegate also has a strong exception guarantee; basic exception guarantee otherwise.

◆ search_scheme_block_info()

template<typename search_scheme_t >
auto seqan3::detail::search_scheme_block_info ( search_scheme_t const &  search_scheme,
size_t const  query_length 
)
inline

Returns for each search the cumulative length of blocks in the order of blocks in each search and the starting position of the first block in the query sequence.

Template Parameters
search_scheme_tIs of type seqan3::detail::search_scheme_type or seqan3::detail::search_scheme_dyn_type.
Parameters
[in]search_schemeSearch scheme that will be used for searching.
[in]query_lengthLength of the query that will be searched in an index.
Returns
A range of pairs containing for each search the cumulative lengths of blocks and the starting position in the query.

Complexity

Constant.

Exceptions

Strong exception guarantee.

◆ search_ss() [1/2]

template<bool abort_on_hit, typename cursor_t , typename query_t , typename search_t , typename blocks_length_t , typename delegate_t >
bool seqan3::detail::search_ss ( cursor_t  cur,
query_t &  query,
typename cursor_t::size_type const  lb,
typename cursor_t::size_type const  rb,
uint8_t const  errors_spent,
uint8_t const  block_id,
bool const  go_right,
search_t const &  search,
blocks_length_t const &  blocks_length,
search_param const  error_left,
delegate_t &&  delegate 
)
inline

Searches a query sequence in a bidirectional index using a single search of a search schemes.

Template Parameters
abort_on_hitIf the flag is set, the search aborts on the first hit.
cursor_tMust model seqan3::detail::template_specialisation_of a seqan3::bi_fm_index_cursor.
query_tMust model std::ranges::random_access_range over the index's alphabet.
search_tIs of type seqan3::detail::search<> or seqan3::detail::search_dyn<>.
blocks_length_tIs of type std::array or std::vector of unsigned integers.
delegate_tTakes cursor_t as argument.
Parameters
[in]curCursor of a string index built on the text that will be searched.
[in]queryQuery sequence to be searched.
[in]lbLeft bound of the infix of query already searched (exclusive).
[in]rbRight bound of the infix of query already searched (exclusive).
[in]errors_spentNumber of errors spent while searching the infix of query.
[in]block_idId of the block that the infix is extended to next.
[in]go_rightThe infix will be extended to the right if the flag is set to true.
[in]searchSearch of a search scheme to be used for searching.
[in]blocks_lengthCumulative block lengths of the search.
[in]error_leftNumber of errors left for matching the remaining suffix of the query sequence.
[in]delegateFunction that is called on every hit.
Returns
True if and only if abort_on_hit is true and a hit has been found.

Complexity

\(O(|query|^e)\) where \(e\) is the total number of errors allowed by search.

Exceptions

Strong exception guarantee if iterating the query does not change its state and if invoking the delegate also has a strong exception guarantee; basic exception guarantee otherwise.

◆ search_ss() [2/2]

template<bool abort_on_hit, typename index_t , typename query_t , typename search_scheme_t , typename delegate_t >
void seqan3::detail::search_ss ( index_t const &  index,
query_t &  query,
search_param const  error_left,
search_scheme_t const &  search_scheme,
delegate_t &&  delegate 
)
inline

Searches a query sequence in a bidirectional index using search schemes.

Template Parameters
abort_on_hitIf the flag is set, the search aborts on the first hit.
index_tindex_t::cursor_type must model seqan3::detail::template_specialisation_of a seqan3::bi_fm_index_cursor.
query_tMust model std::ranges::random_access_range over the index's alphabet.
search_scheme_tIs of type seqan3::detail::search_scheme_type or seqan3::detail::search_scheme_dyn_type.
delegate_tTakes typename index_t::cursor_type as argument.
Parameters
[in]indexString index built on the text that will be searched.
[in]queryQuery sequence to be searched in the index.
[in]error_leftNumber of errors left for matching the remaining suffix of the query sequence.
[in]search_schemeSearch scheme to be used for searching.
[in]delegateFunction that is called on every hit.

Complexity

\(O(|query|^e)\) where \(e\) is the total number of maximum errors.

Exceptions

Strong exception guarantee if iterating the query does not change its state and if invoking the delegate also has a strong exception guarantee; basic exception guarantee otherwise.

◆ search_ss_children()

template<bool abort_on_hit, typename cursor_t , typename query_t , typename search_t , typename blocks_length_t , typename delegate_t >
bool seqan3::detail::search_ss_children ( cursor_t  cur,
query_t &  query,
typename cursor_t::size_type const  lb,
typename cursor_t::size_type const  rb,
uint8_t const  errors_spent,
uint8_t const  block_id,
bool const  go_right,
uint8_t const  min_error_left_in_block,
search_t const &  search,
blocks_length_t const &  blocks_length,
search_param const  error_left,
delegate_t &&  delegate 
)
inline

Searches a query sequence in a bidirectional index using a single search of a search schemes. Sub-function for approximate search step (iterating over all children in a conceptual suffix tree).

Template Parameters
abort_on_hitIf the flag is set, the search aborts on the first hit.
cursor_tMust model seqan3::detail::template_specialisation_of a seqan3::bi_fm_index_cursor.
query_tMust model std::ranges::random_access_range over the index's alphabet.
search_tIs of type seqan3::detail::search<> or seqan3::detail::search_dyn<>.
blocks_length_tIs of type std::array or std::vector of unsigned integers.
delegate_tTakes cursor_t as argument.
Parameters
[in]curCursor of a string index built on the text that will be searched.
[in]queryQuery sequence to be searched.
[in]lbLeft bound of the infix of query already searched (exclusive).
[in]rbRight bound of the infix of query already searched (exclusive).
[in]errors_spentNumber of errors spent while searching the infix of query.
[in]block_idId of the block that the infix is extended to next.
[in]go_rightThe infix will be extended to the right if the flag is set to true.
[in]searchSearch of a search scheme to be used for searching.
[in]blocks_lengthCumulative block lengths of the search.
[in]error_leftNumber of errors left for matching the remaining suffix of the query sequence.
[in]delegateFunction that is called on every hit.
Returns
True if and only if abort_on_hit is true and a hit has been found.

Complexity

\(O(|query|^e)\) where \(e\) is the total number of errors allowed by search.

Exceptions

Strong exception guarantee if iterating the query does not change its state and if invoking the delegate also has a strong exception guarantee; basic exception guarantee otherwise.

Parameters
[in]min_error_left_in_blockNumber of remaining errors that need to be spent in the current block.

◆ search_ss_deletion()

template<bool abort_on_hit, typename cursor_t , typename query_t , typename search_t , typename blocks_length_t , typename delegate_t >
bool seqan3::detail::search_ss_deletion ( cursor_t  cur,
query_t &  query,
typename cursor_t::size_type const  lb,
typename cursor_t::size_type const  rb,
uint8_t const  errors_spent,
uint8_t const  block_id,
bool const  go_right,
search_t const &  search,
blocks_length_t const &  blocks_length,
search_param const  error_left,
delegate_t &&  delegate 
)
inline

Searches a query sequence in a bidirectional index using a single search of a search schemes. Sub-function for deletions at the end of a block.

Template Parameters
abort_on_hitIf the flag is set, the search aborts on the first hit.
cursor_tMust model seqan3::detail::template_specialisation_of a seqan3::bi_fm_index_cursor.
query_tMust model std::ranges::random_access_range over the index's alphabet.
search_tIs of type seqan3::detail::search<> or seqan3::detail::search_dyn<>.
blocks_length_tIs of type std::array or std::vector of unsigned integers.
delegate_tTakes cursor_t as argument.
Parameters
[in]curCursor of a string index built on the text that will be searched.
[in]queryQuery sequence to be searched.
[in]lbLeft bound of the infix of query already searched (exclusive).
[in]rbRight bound of the infix of query already searched (exclusive).
[in]errors_spentNumber of errors spent while searching the infix of query.
[in]block_idId of the block that the infix is extended to next.
[in]go_rightThe infix will be extended to the right if the flag is set to true.
[in]searchSearch of a search scheme to be used for searching.
[in]blocks_lengthCumulative block lengths of the search.
[in]error_leftNumber of errors left for matching the remaining suffix of the query sequence.
[in]delegateFunction that is called on every hit.
Returns
True if and only if abort_on_hit is true and a hit has been found.

Complexity

\(O(|query|^e)\) where \(e\) is the total number of errors allowed by search.

Exceptions

Strong exception guarantee if iterating the query does not change its state and if invoking the delegate also has a strong exception guarantee; basic exception guarantee otherwise.

◆ search_ss_exact()

template<bool abort_on_hit, typename cursor_t , typename query_t , typename search_t , typename blocks_length_t , typename delegate_t >
bool seqan3::detail::search_ss_exact ( cursor_t  cur,
query_t &  query,
typename cursor_t::size_type const  lb,
typename cursor_t::size_type const  rb,
uint8_t const  errors_spent,
uint8_t const  block_id,
bool const  go_right,
search_t const &  search,
blocks_length_t const &  blocks_length,
search_param const  error_left,
delegate_t &&  delegate 
)
inline

Searches a query sequence in a bidirectional index using a single search of a search scheme. Sub-function for searching the remaining part of the current block without any errors.

Template Parameters
abort_on_hitIf the flag is set, the search aborts on the first hit.
cursor_tMust model seqan3::detail::template_specialisation_of a seqan3::bi_fm_index_cursor.
query_tMust model std::ranges::random_access_range over the index's alphabet.
search_tIs of type seqan3::detail::search<> or seqan3::detail::search_dyn<>.
blocks_length_tIs of type std::array or std::vector of unsigned integers.
delegate_tTakes cursor_t as argument.
Parameters
[in]curCursor of a string index built on the text that will be searched.
[in]queryQuery sequence to be searched.
[in]lbLeft bound of the infix of query already searched (exclusive).
[in]rbRight bound of the infix of query already searched (exclusive).
[in]errors_spentNumber of errors spent while searching the infix of query.
[in]block_idId of the block that the infix is extended to next.
[in]go_rightThe infix will be extended to the right if the flag is set to true.
[in]searchSearch of a search scheme to be used for searching.
[in]blocks_lengthCumulative block lengths of the search.
[in]error_leftNumber of errors left for matching the remaining suffix of the query sequence.
[in]delegateFunction that is called on every hit.
Returns
True if and only if abort_on_hit is true and a hit has been found.

Complexity

\(O(|query|^e)\) where \(e\) is the total number of errors allowed by search.

Exceptions

Strong exception guarantee if iterating the query does not change its state and if invoking the delegate also has a strong exception guarantee; basic exception guarantee otherwise.

◆ search_trivial()

template<typename configuration_t , typename index_t , typename... policies_t>
template<bool abort_on_hit, typename query_t >
bool seqan3::detail::unidirectional_search_algorithm< configuration_t, index_t, policies_t >::search_trivial ( typename index_t::cursor_type  cur,
query_t &  query,
typename index_t::cursor_type::size_type const  query_pos,
search_param const  error_left,
error_type const  prev_error 
)
inlineprivate

Searches a query sequence in an index using trivial backtracking.

Template Parameters
abort_on_hitIf the flag is set, the search algorithm aborts on the first hit.
query_tMust model std::ranges::input_range over the index's alphabet.
Parameters
[in]curCursor of a string index built on the text that will be searched.
[in]queryQuery sequence to be searched with the cursor.
[in]query_posPosition in the query sequence indicating the prefix that has already been searched.
[in]error_leftNumber of errors left for matching the remaining suffix of the query sequence.
[in]prev_errorPrevious scenario of search, i.e. error or match.
Returns
True if and only if abort_on_hit is true and a hit has been found.

Complexity

\(O(|query|^e)\) where \(e\) is the maximum number of errors.

Exceptions

No-throw guarantee if invoking the delegate also guarantees no-throw.

Variable Documentation

◆ optimum_search_scheme

template<uint8_t min_error, uint8_t max_error>
constexpr int seqan3::detail::optimum_search_scheme {0}
inlineconstexpr

Search scheme that is optimal in the running time for the specified lower and upper error bound.

Template Parameters
min_errorLower bound of errors.
max_errorUpper bound of errors.

Please note that the searches within each search scheme are sorted by their asymptotical run time (i.e. upper error bound string), s.t. easy to compute searches come first. This improves the run time of algorithms that abort after the first hit (e.g. search mode: best). Even though it is not guaranteed, this seems to be a good greedy approach.

Hide me