Data structures and algorithms for the search of query sequences in a large collection of text.
Modules | |
Configuration | |
Data structures and utility functions for configuring the search algorithm. | |
DREAM Index | |
Provides seqan3::interleaved_bloom_filter. | |
FM Index | |
Provides seqan3::fm_index and seqan3::bi_fm_index as well as their respective cursors. | |
k-mer Index | |
Implementation of shapes for a k-mer Index. | |
Classes | |
class | seqan3::search_result< query_id_type, cursor_type, reference_id_type, reference_begin_position_type > |
The result class generated by the seqan3::search algorithm. | |
Functions | |
template<fm_index_specialisation index_t, std::ranges::forward_range queries_t, typename configuration_t = decltype(search_cfg::default_configuration)> | |
auto | seqan3::search (queries_t &&queries, index_t const &index, configuration_t const &cfg=search_cfg::default_configuration) |
Search a query or a range of queries in an index. | |
Searching is a key component of many sequence analysis tools. The search module provides a powerful and easy way to search sequences in a large text or an arbitrarily nested collection of texts. Indices are the core component for searching large amounts of data and are used in tools such as read mappers, assemblers and protein search tools.
SeqAn currently implements only the FM index; a k-mer index is planned. The FM index supports patterns of arbitrary length and an arbitrary number of errors.
The search module offers a simple, unified interface for searching a query in a large indexed text. The algorithm chooses the best search method based on the provided index.
The search algorithms for FM indices implement either a trivial backtracking approach or an optimum search scheme. The latter is currently only available for searches with up to three errors using bidirectional indices. In the future we plan to extend the optimum search schemes to higher error counts.
Reference:
Kianfar, K., Pockrandt, C., Torkamandi, B., Luo, H., & Reinert, K. (2018).
Optimum Search Schemes for Approximate String Matching Using Bidirectional FM-Index. bioRxiv, 301085. https://doi.org/10.1101/301085
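The trivial backtracking approach mentioned above can be illustrated with a minimal, standard-library-only sketch. It is not SeqAn's implementation: a plain suffix array stands in for the FM index, only substitutions are counted as errors, and the names `suffix_index`, `extend` and `search_with_substitutions` are hypothetical helpers introduced for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for an FM index: suffix start positions in lexicographic order.
struct suffix_index
{
    std::string text;
    std::vector<std::size_t> sa;

    explicit suffix_index(std::string t) : text(std::move(t)), sa(text.size())
    {
        std::iota(sa.begin(), sa.end(), std::size_t{0});
        std::sort(sa.begin(), sa.end(), [this](std::size_t a, std::size_t b)
        {
            return std::string_view{text}.substr(a) < std::string_view{text}.substr(b);
        });
    }

    // Character at offset `depth` of the suffix starting at `pos`, or -1 past the text end.
    int char_at(std::size_t pos, std::size_t depth) const
    {
        return pos + depth < text.size() ? static_cast<unsigned char>(text[pos + depth]) : -1;
    }
};

// Narrow the suffix array interval [lo, hi), whose suffixes share their first `depth`
// characters, to those suffixes that continue with character `c`.
std::pair<std::size_t, std::size_t> extend(suffix_index const & idx, std::size_t lo, std::size_t hi,
                                           std::size_t depth, char c)
{
    int const key = static_cast<unsigned char>(c);
    auto first = idx.sa.begin() + lo;
    auto last = idx.sa.begin() + hi;
    auto l = std::lower_bound(first, last, key,
                              [&](std::size_t pos, int k) { return idx.char_at(pos, depth) < k; });
    auto u = std::upper_bound(l, last, key,
                              [&](int k, std::size_t pos) { return k < idx.char_at(pos, depth); });
    return {static_cast<std::size_t>(l - idx.sa.begin()), static_cast<std::size_t>(u - idx.sa.begin())};
}

// Try every alphabet character at every query position; a mismatch consumes one error.
void backtrack(suffix_index const & idx, std::string_view query, std::string_view alphabet,
               std::size_t depth, std::size_t lo, std::size_t hi, std::size_t errors_left,
               std::vector<std::size_t> & hits)
{
    if (lo == hi)
        return; // no suffix matches the current path
    if (depth == query.size())
    {
        hits.insert(hits.end(), idx.sa.begin() + lo, idx.sa.begin() + hi);
        return;
    }
    for (char c : alphabet)
    {
        std::size_t const cost = (c != query[depth]) ? 1 : 0;
        if (cost > errors_left)
            continue;
        auto [l, u] = extend(idx, lo, hi, depth, c);
        backtrack(idx, query, alphabet, depth + 1, l, u, errors_left - cost, hits);
    }
}

// Returns the sorted text positions where `query` occurs with at most `max_errors` substitutions.
std::vector<std::size_t> search_with_substitutions(suffix_index const & idx, std::string_view query,
                                                   std::size_t max_errors,
                                                   std::string_view alphabet = "ACGT")
{
    assert(!query.empty());
    std::vector<std::size_t> hits;
    backtrack(idx, query, alphabet, 0, 0, idx.sa.size(), max_errors, hits);
    std::sort(hits.begin(), hits.end());
    return hits;
}
```

For the text `ACGTAGGTACCT` and the query `ACGT`, this sketch reports position 0 with zero errors and positions 0, 4 and 8 with one error. Because every additional error lets the recursion branch over the whole alphabet at each query position, the search space grows quickly with the error count, which is the cost that optimum search schemes aim to reduce.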
A k-mer index can be used to efficiently retrieve all occurrences of a certain k-mer in the text. The k-mer can be either an exact string of length k or it can contain one or more wildcards, which denote positions of arbitrary characters.
An exact k-mer is represented as seqan3::ungapped, and wildcards can be defined with seqan3::shape. Please check the respective documentation for details and examples.
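To make the idea of wildcards concrete, here is a minimal sketch of a k-mer index built on a hash map. It does not use the seqan3::shape or seqan3::ungapped API; the shape is assumed to be a string in which `'1'` marks a position that must match and `'0'` marks a wildcard, and `apply_shape` and `kmer_index` are hypothetical names for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>
#include <vector>

// Keep only the characters at positions where the (assumed) shape string has a '1'.
std::string apply_shape(std::string_view window, std::string_view shape)
{
    assert(window.size() == shape.size());
    std::string key;
    for (std::size_t i = 0; i < shape.size(); ++i)
        if (shape[i] == '1')
            key.push_back(window[i]);
    return key;
}

// Hypothetical k-mer index: maps each shaped k-mer to all of its start positions in the text.
struct kmer_index
{
    std::string shape;
    std::unordered_map<std::string, std::vector<std::size_t>> positions;

    kmer_index(std::string_view text, std::string shape_) : shape(std::move(shape_))
    {
        assert(!shape.empty());
        for (std::size_t i = 0; i + shape.size() <= text.size(); ++i)
            positions[apply_shape(text.substr(i, shape.size()), shape)].push_back(i);
    }

    // Returns all start positions whose window agrees with `kmer` on the non-wildcard positions.
    std::vector<std::size_t> lookup(std::string_view kmer) const
    {
        auto it = positions.find(apply_shape(kmer, shape));
        return it == positions.end() ? std::vector<std::size_t>{} : it->second;
    }
};
```

With the text `ACGTACTTAGGT`, the gapped shape `101` makes a lookup of `ACG` return positions 0 and 8 (the windows `ACG` and `AGG`), whereas the ungapped shape `111` returns only position 0.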
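template<fm_index_specialisation index_t, std::ranges::forward_range queries_t, typename configuration_t = decltype(search_cfg::default_configuration)>
inline auto seqan3::search (queries_t &&queries, index_t const &index, configuration_t const &cfg=search_cfg::default_configuration)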
Search a query or a range of queries in an index.
index_t | Must model seqan3::fm_index_specialisation. |
queries_t | Must model std::ranges::random_access_range over the index's alphabet and std::ranges::sized_range. A range of queries must additionally model std::ranges::forward_range and std::ranges::sized_range. |
[in] | queries | A single query or a range of queries. |
[in] | index | String index to be searched. |
[in] | cfg | A configuration object specifying the search parameters (e.g. number of errors, error types, output format, etc.). |
Header File
#include <seqan3/search/search.hpp>
Each query with $e$ errors takes $O(|query|^e)$, where $e$ is the maximum number of errors.
Strong exception guarantee if iterating the query does not change its state and if invoking a possible delegate specified in cfg also has a strong exception guarantee; basic exception guarantee otherwise.