Provides seqan3:fm_index and seqan3:bi_fm_index as well as respective cursors. More...
Classes | |
class | seqan3::bi_fm_index< is_collection, sdsl_index_type_ > |
The SeqAn Bidirectional FM Index. More... | |
class | seqan3::bi_fm_index_cursor< index_t > |
The SeqAn Bidirectional FM Index Cursor. More... | |
interface | seqan3::BiFmIndex |
Concept for bidirectional FM indices. More... | |
interface | seqan3::BiFmIndexCursor |
Concept for bidirectional FM index cursors. More... | |
class | seqan3::fm_index< is_collection, sdsl_index_type_ > |
The SeqAn FM Index. More... | |
class | seqan3::fm_index_cursor< index_t > |
The SeqAn FM Index Cursor. More... | |
interface | seqan3::FmIndex |
Concept for unidirectional FM indices. More... | |
interface | seqan3::FmIndexCursor |
Concept for unidirectional FM index cursors. More... | |
Typedefs | |
using | seqan3::default_sdsl_index_type = sdsl_wt_index_type |
The default FM Index Configuration. More... | |
using | seqan3::sdsl_wt_index_type = sdsl::csa_wt< sdsl::wt_blcd< sdsl::bit_vector, sdsl::rank_support_v<>, sdsl::select_support_scan<>, sdsl::select_support_scan< 0 > >, 16, 10000000, sdsl::sa_order_sa_sampling<>, sdsl::isa_sampling<>, sdsl::plain_byte_alphabet > |
The FM Index Configuration using a Wavelet Tree. More... | |
Template argument type deduction guides | |
template<std::ranges::Range text_t> | |
seqan3::bi_fm_index (text_t &&) -> bi_fm_index< dimension_v< text_t > !=1 > | |
Deduces the dimensions of the text. | |
template<std::ranges::Range text_t> | |
seqan3::fm_index (text_t &&) -> fm_index< dimension_v< text_t > !=1 > | |
Deduces the dimensions of the text. | |
Requirements for seqan3::FmIndex | |
You can expect these member types and member functions on all types that satisfy seqan3::FmIndex. | |
typename t::text_type | text_type |
Type of the indexed text. | |
typename t::char_type | char_type |
Type of the underlying character of text_type. | |
typename t::size_type | size_type |
Type for representing the size of the indexed text. | |
typename t::cursor_type | cursor_type |
Type of the unidirectional FM index cursor. | |
Requirements for seqan3::FmIndexCursor | |
You can expect these member types and member functions on all types that satisfy seqan3::FmIndexCursor. | |
typename t::index_type | index_type |
Type of the underlying SeqAn FM index (not the underlying SDSL index). | |
typename t::size_type | size_type |
Type for representing the size of the indexed text. | |
Requirements for seqan3::BiFmIndex | |
You can expect these member types and member functions on all types that satisfy seqan3::BiFmIndex. | |
typename t::cursor_type | cursor_type |
Type of the bidirectional FM index cursor. | |
typename t::fwd_cursor_type | fwd_cursor_type |
Type of the unidirectional FM index cursor based on the unidirectional FM index on the original text. | |
typename t::rev_cursor_type | rev_cursor_type |
Type of the unidirectional FM index cursor based on the unidirectional FM index on the reversed text. | |
Requirements for seqan3::BiFmIndexCursor | |
You can expect these member types and member functions on all types that satisfy seqan3::FmIndexCursor. | |
typename t::index_type | index_type |
Type of the underlying SeqAn FM index (not the underlying SDSL index). | |
typename t::size_type | size_type |
Type for representing the size of the indexed text. | |
Provides seqan3:fm_index and seqan3:bi_fm_index as well as respective cursors.
FM indices are text indices similar to suffix trees or suffix arrays which are based on the Burrow Wheeler transform and a sampled suffix array. FM indices are significantly smaller in space without sacrificing speed. They also allow for adjusting the speed respectively space by choosing the underlying data structures accordingly or modyfing the sampling rate of the sampled suffix array.
The FM indices are based on the SDSL 3 (succinct data structure library). You are able to specify the underlying implementation of the SDSL to adjust it to your needs as well as choose one of the preconfigured indices that are suitable for common applications in sequence analysis.
For technical reasons you can currently only build indices over a seqan3::Alphabet if its seqan3::alphabet_size is smaller or equal 256.
You can choose between unidirectional and bidirectional FM indices (which can be thought of suffix trees and affix trees, i.e. a combination of suffix and prefix trees being able to search a pattern from left to right, right to left and character by character in any arbitrary order). Roughly speaking bidirectional FM indices are more powerful for approximate string matching at the cost of a higher space consumption (between a factor of 5 and 9 of the input size depending on the configuration).
Index Cursors are lightweight objects, i.e. they are cheap to copy.
Note that although the SeqAn3 index cursor, although having similar behaviour, don't model any of the standard library iterator concepts, not even std::Iterator.
using seqan3::default_sdsl_index_type = typedef sdsl_wt_index_type |
The default FM Index Configuration.
using seqan3::sdsl_wt_index_type = typedef sdsl::csa_wt<sdsl::wt_blcd<sdsl::bit_vector, sdsl::rank_support_v<>, sdsl::select_support_scan<>, sdsl::select_support_scan<0> >, 16, 10000000, sdsl::sa_order_sa_sampling<>, sdsl::isa_sampling<>, sdsl::plain_byte_alphabet> |
The FM Index Configuration using a Wavelet Tree.
: alphabet_size<char_type> where char_type is the seqan3 alphabet type (e.g. dna4 has an alphabet size of 4).
For an index over a text collection a delimiter is added inbetween the texts. This causes sigma to increase by 1.