SeqAn3 3.3.0-rc.1
The Modern C++ library for sequence analysis.
Search

Data structures and approximate string search algorithms for large collection of text (e.g. DNA). More...

+ Collaboration diagram for Search:

Modules

 Configuration
 Data structures and utility functions for configuring search algorithm.
 
 DREAM Index
 Provides seqan3::interleaved_bloom_filter.
 
 FM Index
 Provides seqan3::fm_index and seqan3::bi_fm_index as well as respective cursors.
 
 k-mer Index
 Implementation of shapes for a k-mer Index.
 
 Views
 Search related views.
 

Classes

class  seqan3::search_result< query_id_type, cursor_type, reference_id_type, reference_begin_position_type >
 The result class generated by the seqan3::seach algorithm. More...
 

Functions

template<typename index_t , std::ranges::forward_range queries_t, typename configuration_t = decltype(search_cfg::default_configuration)>
requires std::ranges::forward_range<std::ranges::range_reference_t<queries_t>> && std::same_as<range_innermost_value_t<queries_t>, typename index_t::alphabet_type>
auto seqan3::search (queries_t &&queries, index_t const &index, configuration_t const &cfg=search_cfg::default_configuration)
 Search a query or a range of queries in an index. More...
 

Detailed Description

Data structures and approximate string search algorithms for large collection of text (e.g. DNA).

Meta-header for the Search module .

Searching is a key component in many sequence analysis tools. The search module is a powerful and easy way to search sequences in a large text or an arbitrary nested collection of texts. When it comes to searching, indices are a core component for searching large amounts of data and are used for tools such as read mappers, assemblers or protein search tools.

SeqAn currently implements only the FM index and a k-mer index is planned. The FM index works with arbitrary pattern lengths and error numbers.

Search algorithm

The Search module offers a unified search interface seqan3::search. The function chooses the best search method based on the provided index and an optional configuration.

auto results = seqan3::search(queries, index);
auto search(queries_t &&queries, index_t const &index, configuration_t const &cfg=search_cfg::default_configuration)
Search a query or a range of queries in an index.
Definition: search.hpp:103

Available Indices

(bidirectional) FM index

Two FM indices are available: seqan3::fm_index and seqan3::bi_fm_index. The search algorithms for these FM indices implement either a trivial backtracking approach or an optimum search scheme. The latter are currently only available for searches with up to three errors using bidirectional indices. In the future we plan to improve the optimum search schemes to handle higher error counts.

Reference:

Kianfar, K., Pockrandt, C., Torkamandi, B., Luo, H., & Reinert, K. (2018).

Optimum Search Schemes for Approximate String Matching Using Bidirectional FM-Index. bioRxiv, 301085. https://doi.org/10.1101/301085

Configuration

The approximate string search algorithm can be configured in multiple ways. See Configuration for details.

Function Documentation

◆ search()

template<typename index_t , std::ranges::forward_range queries_t, typename configuration_t = decltype(search_cfg::default_configuration)>
requires std::ranges::forward_range<std::ranges::range_reference_t<queries_t>> && std::same_as<range_innermost_value_t<queries_t>, typename index_t::alphabet_type>
auto seqan3::search ( queries_t &&  queries,
index_t const &  index,
configuration_t const &  cfg = search_cfg::default_configuration 
)
inline

Search a query or a range of queries in an index.

Template Parameters
index_tType of the index.
queries_tMust model std::ranges::random_access_range over the index's alphabet and std::ranges::sized_range. A range of queries must additionally model std::ranges::forward_range and std::ranges::sized_range.
Parameters
[in]queriesA single query or a range of queries.
[in]indexString index to be searched.
[in]cfgA configuration object specifying the search parameters (e.g. number of errors, error types, output format, etc.).
Returns
A seqan3::algorithm_result_generator_range with value type of seqan3::search_result.

Header File

#include <seqan3/search/search.hpp>

Complexity

Each query with $e$ errors takes $O(|query|^e)$ where $e$ is the maximum number of errors.

Exceptions

Strong exception guarantee if iterating the query does not change its state and if invoking a possible delegate specified in cfg also has a strong exception guarantee; basic exception guarantee otherwise.

Example

#include <vector>
int main()
{
using namespace seqan3::literals;
std::vector<seqan3::dna4_vector> genomes{"CGCTGTCTGAAGGATGAGTGTCAGCCAGTGTA"_dna4,
"ACCCGATGAGCTACCCAGTAGTCGAACTG"_dna4,
"GGCCAGACAACCCGGCGCTAATGCACTCA"_dna4};
std::vector<seqan3::dna4_vector> queries{"GCT"_dna4, "ACCC"_dna4};
// build an FM index
seqan3::fm_index index{genomes};
auto results = seqan3::search(queries, index);
// search for the queries "GCT" and "ACCC"
for (auto && result : results)
seqan3::debug_stream << result << '\n';
// This should result in:
// <query_id:0, reference_id:0, reference_pos:1>
// <query_id:0, reference_id:1, reference_pos:9>
// <query_id:0, reference_id:2, reference_pos:16>
// <query_id:1, reference_id:1, reference_pos:0>
// <query_id:1, reference_id:1, reference_pos:12>
// <query_id:1, reference_id:2, reference_pos:9>
}
The SeqAn FM Index.
Definition: fm_index.hpp:189
Provides seqan3::debug_stream and related types.
Provides seqan3::dna4, container aliases and string literals.
debug_stream_type debug_stream
A global instance of seqan3::debug_stream_type.
Definition: debug_stream.hpp:37
The SeqAn namespace for literals.
Meta-header for the Search / FM Index submodule .
Provides the public interface for search algorithms.