SeqAn3 3.4.0-rc.1
The Modern C++ library for sequence analysis.
Loading...
Searching...
No Matches
seqan3::bi_fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ > Class Template Reference

The SeqAn Bidirectional FM Index. More...

#include <seqan3/search/fm_index/bi_fm_index.hpp>

+ Inheritance diagram for seqan3::bi_fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ >:

Public Types

Text types
using alphabet_type = typename fm_index_type::alphabet_type
 The type of the underlying character of the indexed text.
 
using size_type = typename sdsl_index_type::size_type
 Type for representing positions in the indexed text.
 
Cursor types
using cursor_type = bi_fm_index_cursor< bi_fm_index >
 The type of the bidirectional cursor.
 
using fwd_cursor_type = fm_index_cursor< fm_index_type >
 The type of the unidirectional cursor on the original text.
 

Public Member Functions

cursor_type cursor () const noexcept
 Returns a seqan3::bi_fm_index_cursor on the index that can be used for searching. Cursor is pointing to the root node of the implicit affix tree. .
 
bool empty () const noexcept
 Checks whether the index is empty.
 
fwd_cursor_type fwd_cursor () const noexcept
 Returns a unidirectional seqan3::fm_index_cursor on the original text of the bidirectional index that can be used for searching.
 
bool operator!= (bi_fm_index const &rhs) const noexcept
 Compares two indices.
 
bool operator== (bi_fm_index const &rhs) const noexcept
 Compares two indices.
 
template<cereal_archive archive_t>
void serialize (archive_t &archive)
 Serialisation support function.
 
size_type size () const noexcept
 Returns the length of the indexed text including sentinel characters.
 
Constructors, destructor and assignment
 bi_fm_index ()=default
 Defaulted.
 
 bi_fm_index (bi_fm_index const &)=default
 Defaulted.
 
bi_fm_indexoperator= (bi_fm_index const &)=default
 Defaulted.
 
 bi_fm_index (bi_fm_index &&)=default
 Defaulted.
 
bi_fm_indexoperator= (bi_fm_index &&)=default
 Defaulted.
 
 ~bi_fm_index ()=default
 Defaulted.
 
template<std::ranges::range text_t>
 bi_fm_index (text_t &&text)
 Constructor that immediately constructs the index given a range. The range cannot be empty.
 

Static Public Attributes

static constexpr text_layout text_layout_mode = text_layout_mode_
 Indicates whether index is built over a collection.
 

Private Types

Index types
using sdsl_index_type = sdsl_index_type_
 The type of the underlying SDSL index for the original text.
 
using rev_sdsl_index_type = sdsl::csa_wt< sdsl_wt_index_type::wavelet_tree_type, 10 '000 '000, 10 '000 '000, sdsl::sa_order_sa_sampling<>, sdsl::isa_sampling<>, sdsl_wt_index_type::alphabet_type >
 The type of the underlying SDSL index for the reversed text.
 
using sdsl_char_type = typename sdsl_index_type::alphabet_type::char_type
 The type of the reduced alphabet type. (The reduced alphabet might be smaller than the original alphabet in case not all possible characters occur in the indexed text.)
 
using sdsl_sigma_type = typename sdsl_index_type::alphabet_type::sigma_type
 The type of the alphabet size of the underlying SDSL index.
 
using fm_index_type = fm_index< alphabet_t, text_layout_mode_, sdsl_index_type >
 The type of the underlying FM index for the original text.
 
using rev_fm_index_type = detail::reverse_fm_index< alphabet_t, text_layout_mode_, rev_sdsl_index_type >
 The type of the underlying FM index for the reversed text.
 

Private Member Functions

template<std::ranges::range text_t>
void construct (text_t &&text)
 Constructs the index given a range. The range cannot be an rvalue (i.e. a temporary object) and has to be non-empty.
 

Private Attributes

fm_index_type fwd_fm
 Underlying FM index for the original text.
 
rev_fm_index_type rev_fm
 Underlying FM index for the reversed text.
 

Friends

template<typename bi_fm_index_t >
class bi_fm_index_cursor
 

Detailed Description

template<semialphabet alphabet_t, text_layout text_layout_mode_, detail::sdsl_index sdsl_index_type_ = default_sdsl_index_type>
class seqan3::bi_fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ >

The SeqAn Bidirectional FM Index.

Template Parameters
alphabet_tThe alphabet type; must model seqan3::semialphabet.
text_layout_mode_Indicates whether this index works on a text collection or a single text. See seqan3::text_layout.
sdsl_index_type_The type of the underlying SDSL index, must model seqan3::detail::sdsl_index.

The seqan3::bi_fm_index is a fast and space-efficient bidirectional string index to search strings and collections of strings. In general, we recommend to favour the seqan3::bi_fm_index over the unidirectional seqan3::fm_index if you want to allow multiple errors when searching.

General information

Here is a short example on how to build an index and search a pattern using an cursor. Please note that there is a very powerful search module with a high-level interface seqan3::search that encapsulates the use of cursors.

// SPDX-FileCopyrightText: 2006-2024 Knut Reinert & Freie Universität Berlin
// SPDX-FileCopyrightText: 2016-2024 Knut Reinert & MPI für molekulare Genetik
// SPDX-License-Identifier: CC0-1.0
#include <vector>
int main()
{
using namespace seqan3::literals;
std::vector<seqan3::dna4> genome{"ATCGATCGAAGGCTAGCTAGCTAAGGGA"_dna4};
seqan3::bi_fm_index index{genome}; // build the index
auto cur = index.cursor(); // create a cursor
cur.extend_right("GG"_dna4); // search the pattern "GG"
cur.extend_left("AA"_dna4); // search the pattern "AAGG"
seqan3::debug_stream << "Number of hits: " << cur.count() << '\n'; // outputs: 2
seqan3::debug_stream << "Positions in the genome: ";
for (auto && pos : cur.locate()) // outputs: (0, 8), (0, 22)
seqan3::debug_stream << pos << ' ';
return 0;
}
bool extend_right() noexcept
Tries to extend the query by the smallest possible character to the right such that the query is foun...
Definition bi_fm_index_cursor.hpp:328
The SeqAn Bidirectional FM Index.
Definition bi_fm_index.hpp:58
cursor_type cursor() const noexcept
Returns a seqan3::bi_fm_index_cursor on the index that can be used for searching. Cursor is pointing...
Definition bi_fm_index.hpp:252
Provides seqan3::debug_stream and related types.
Provides seqan3::dna4, container aliases and string literals.
debug_stream_type debug_stream
A global instance of seqan3::debug_stream_type.
Definition debug_stream.hpp:37
The SeqAn namespace for literals.
The main SeqAn3 namespace.
Definition aligned_sequence_concept.hpp:26
Meta-header for the Search / FM Index submodule .
Attention
When building an index for a single text over any alphabet, the symbol with rank 255 is reserved and may not occur in the text.

Here is an example using a collection of strings (e.g. a genome with multiple chromosomes or a protein database):

// SPDX-FileCopyrightText: 2006-2024 Knut Reinert & Freie Universität Berlin
// SPDX-FileCopyrightText: 2016-2024 Knut Reinert & MPI für molekulare Genetik
// SPDX-License-Identifier: CC0-1.0
#include <vector>
int main()
{
using namespace seqan3::literals;
std::vector<std::vector<seqan3::dna4>> genomes{"ATCTGACGAAGGCTAGCTAGCTAAGGGA"_dna4,
"TAGCTGAAGCCATTGGCATCTGATCGGACT"_dna4,
"ACTGAGCTCGTC"_dna4,
"TGCATGCACCCATCGACTGACTG"_dna4,
"GTACGTACGTTACG"_dna4};
seqan3::bi_fm_index index{genomes}; // build the index
auto cur = index.cursor(); // create a cursor
cur.extend_right("GA"_dna4); // search the pattern "GA"
cur.extend_left("CT"_dna4); // search the pattern "CTGA"
seqan3::debug_stream << "Number of hits: " << cur.count() << '\n'; // outputs: 5
seqan3::debug_stream << "Positions in the genomes: ";
for (auto && pos : cur.locate()) // outputs: (3,16) (2,1) (1,3) (0,2) (1,19)
seqan3::debug_stream << pos << ' ';
return 0;
}
Attention
When building an index for a text collection over any alphabet, the symbols with rank 254 and 255 are reserved and may not be used in the text.

Constructor & Destructor Documentation

◆ bi_fm_index()

template<semialphabet alphabet_t, text_layout text_layout_mode_, detail::sdsl_index sdsl_index_type_ = default_sdsl_index_type>
template<std::ranges::range text_t>
seqan3::bi_fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ >::bi_fm_index ( text_t &&  text)
inline

Constructor that immediately constructs the index given a range. The range cannot be empty.

Template Parameters
text_tThe type of range to construct from; must model std::ranges::bidirectional_range.
Parameters
[in]textThe text to construct from.

Complexity

Todo:
At least linear.

Member Function Documentation

◆ construct()

template<semialphabet alphabet_t, text_layout text_layout_mode_, detail::sdsl_index sdsl_index_type_ = default_sdsl_index_type>
template<std::ranges::range text_t>
void seqan3::bi_fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ >::construct ( text_t &&  text)
inlineprivate

Constructs the index given a range. The range cannot be an rvalue (i.e. a temporary object) and has to be non-empty.

Template Parameters
text_tThe type of range to construct from; must model std::ranges::bidirectional_range.
Parameters
[in]textThe text to construct from.
Todo:
This has to be better implemented with regard to the memory peak due to not matching interfaces with the SDSL.

Complexity

Todo:
At least linear.

Exceptions

No guarantee.

Todo:
Ensure strong exception guarantee.

◆ cursor()

template<semialphabet alphabet_t, text_layout text_layout_mode_, detail::sdsl_index sdsl_index_type_ = default_sdsl_index_type>
cursor_type seqan3::bi_fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ >::cursor ( ) const
inlinenoexcept

Returns a seqan3::bi_fm_index_cursor on the index that can be used for searching. Cursor is pointing to the root node of the implicit affix tree. .

Returns
Returns a bidirectional seqan3::bi_fm_index_cursor on the index.

Complexity

Constant.

Exceptions

No-throw guarantee.

◆ empty()

template<semialphabet alphabet_t, text_layout text_layout_mode_, detail::sdsl_index sdsl_index_type_ = default_sdsl_index_type>
bool seqan3::bi_fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ >::empty ( ) const
inlinenoexcept

Checks whether the index is empty.

Returns
true if the index is empty, false otherwise.

Complexity

Constant.

Exceptions

No-throw guarantee.

◆ fwd_cursor()

template<semialphabet alphabet_t, text_layout text_layout_mode_, detail::sdsl_index sdsl_index_type_ = default_sdsl_index_type>
fwd_cursor_type seqan3::bi_fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ >::fwd_cursor ( ) const
inlinenoexcept

Returns a unidirectional seqan3::fm_index_cursor on the original text of the bidirectional index that can be used for searching.

Returns
Returns a unidirectional seqan3::fm_index_cursor on the index of the original text.

Complexity

Constant.

Exceptions

No-throw guarantee.

◆ operator!=()

template<semialphabet alphabet_t, text_layout text_layout_mode_, detail::sdsl_index sdsl_index_type_ = default_sdsl_index_type>
bool seqan3::bi_fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ >::operator!= ( bi_fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ > const &  rhs) const
inlinenoexcept

Compares two indices.

Returns
true if the indices are unequal, false otherwise.

Complexity

Linear.

Exceptions

No-throw guarantee.

◆ operator==()

template<semialphabet alphabet_t, text_layout text_layout_mode_, detail::sdsl_index sdsl_index_type_ = default_sdsl_index_type>
bool seqan3::bi_fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ >::operator== ( bi_fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ > const &  rhs) const
inlinenoexcept

Compares two indices.

Returns
true if the indices are equal, false otherwise.

Complexity

Linear.

Exceptions

No-throw guarantee.

◆ serialize()

template<semialphabet alphabet_t, text_layout text_layout_mode_, detail::sdsl_index sdsl_index_type_ = default_sdsl_index_type>
template<cereal_archive archive_t>
void seqan3::bi_fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ >::serialize ( archive_t &  archive)
inline

Serialisation support function.

Template Parameters
archive_tType of archive; must satisfy seqan3::cereal_archive.
Parameters
archiveThe archive being serialised from/to.
Attention
These functions are never called directly, see Serialisation for more details.

◆ size()

template<semialphabet alphabet_t, text_layout text_layout_mode_, detail::sdsl_index sdsl_index_type_ = default_sdsl_index_type>
size_type seqan3::bi_fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ >::size ( ) const
inlinenoexcept

Returns the length of the indexed text including sentinel characters.

Returns
Returns the length of the indexed text including sentinel characters.

Complexity

Constant.

Exceptions

No-throw guarantee.


The documentation for this class was generated from the following file:
Hide me