SeqAn3  3.0.1
The Modern C++ library for sequence analysis.
seqan3::fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ > Class Template Reference

The SeqAn FM Index. More...

#include <seqan3/search/fm_index/fm_index.hpp>

+ Inheritance diagram for seqan3::fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ >:

Public Member Functions

cursor_type begin () const noexcept
 Returns a seqan3::fm_index_cursor on the index that can be used for searching. More...
 
bool empty () const noexcept
 Checks whether the index is empty. More...
 
bool operator!= (fm_index const &rhs) const noexcept
 Compares two indices. More...
 
bool operator== (fm_index const &rhs) const noexcept
 Compares two indices. More...
 
size_type size () const noexcept
 Returns the length of the indexed text including sentinel characters. More...
 
Constructors, destructor and assignment
 fm_index ()=default
 Defaulted.
 
 fm_index (fm_index const &rhs)
 When copy constructing, also update internal data structures.
 
 fm_index (fm_index &&rhs)
 When move constructing, also update internal data structures.
 
fm_indexoperator= (fm_index rhs)
 When copy/move assigning, also update internal data structures.
 
 ~fm_index ()=default
 Defaulted.
 
template<std::ranges::range text_t>
 fm_index (text_t &&text)
 Constructor that immediately constructs the index given a range. The range cannot be empty. More...
 

Static Public Attributes

static constexpr text_layout text_layout_mode = text_layout_mode_
 Indicates whether index is built over a collection.
 

Friends

template<typename bi_fm_index_t >
class bi_fm_index_cursor
 
template<typename fm_index_t >
class fm_index_cursor
 

Related Functions

(Note that these are not member functions.)

Requirements for seqan3::fm_index_specialisation

You can expect these member types and member functions on all types that satisfy seqan3::fm_index_specialisation.

typename t::text_type text_type
 Type of the indexed text.
 

Member types

using alphabet_type = alphabet_t
 The type of the underlying character of the indexed text.
 
using size_type = typename sdsl_index_type::size_type
 Type for representing positions in the indexed text.
 
using cursor_type = fm_index_cursor< fm_index >
 The type of the (unidirectional) cursor.
 

Detailed Description

template<semialphabet alphabet_t, text_layout text_layout_mode_, detail::sdsl_index sdsl_index_type_ = default_sdsl_index_type>
class seqan3::fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ >

The SeqAn FM Index.

Template Parameters
alphabet_tThe alphabet type; must model seqan3::semialphabet.
text_layout_mode_Indicates whether this index works on a text collection or a single text. See seqan3::text_layout.
sdsl_index_type_The type of the underlying SDSL index, must model seqan3::sdsl_index.

The seqan3::fm_index is a fast and space-efficient string index to search strings and collections of strings.

General information

Here is a short example on how to build an index and search a pattern using an cursor. Please note that there is a very powerful search module with a high-level interface seqan3::search that encapsulates the use of cursors.

#include <vector>
int main()
{
using seqan3::operator""_dna4;
std::vector<seqan3::dna4> genome{"ATCGATCGAAGGCTAGCTAGCTAAGGGA"_dna4};
seqan3::fm_index index{genome}; // build the index
auto cur = index.begin(); // create a cursor
cur.extend_right("AAGG"_dna4); // search the pattern "AAGG"
seqan3::debug_stream << "Number of hits: " << cur.count() << '\n'; // outputs: 2
seqan3::debug_stream << "Positions in the genome: ";
for (auto const & pos : cur.locate()) // outputs: 8, 22
seqan3::debug_stream << pos << ' ';
return 0;
}
Attention
When building an index for a single text over any alphabet, the symbol with rank 255 is reserved and may not occur in the text.

Here is an example using a collection of strings (e.g. a genome with multiple chromosomes or a protein database):

#include <vector>
int main()
{
using seqan3::operator""_dna4;
std::vector<std::vector<seqan3::dna4>> genomes{"ATCTGACGAAGGCTAGCTAGCTAAGGGA"_dna4,
"TAGCTGAAGCCATTGGCATCTGATCGGACT"_dna4,
"ACTGAGCTCGTC"_dna4,
"TGCATGCACCCATCGACTGACTG"_dna4,
"GTACGTACGTTACG"_dna4};
seqan3::fm_index index{genomes}; // build the index
auto cur = index.begin(); // create a cursor
cur.extend_right("CTGA"_dna4); // search the pattern "CTGA"
seqan3::debug_stream << "Number of hits: " << cur.count() << '\n'; // outputs: 5
seqan3::debug_stream << "Positions in the genomes: ";
for (auto const & pos : cur.locate()) // outputs: (3,16) (2,1) (1,3) (0,2) (1,19)
seqan3::debug_stream << pos << ' ';
return 0;
}
Attention
When building an index for a text collection over any alphabet, the symbols with rank 254 and 255 are reserved and may not be used in the text.

Constructor & Destructor Documentation

◆ fm_index()

template<semialphabet alphabet_t, text_layout text_layout_mode_, detail::sdsl_index sdsl_index_type_ = default_sdsl_index_type>
template<std::ranges::range text_t>
seqan3::fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ >::fm_index ( text_t &&  text)
inline

Constructor that immediately constructs the index given a range. The range cannot be empty.

Template Parameters
text_tThe type of range to construct from; must model std::ranges::bidirectional_range.
Parameters
[in]textThe text to construct from.

Complexity

At least linear.

Member Function Documentation

◆ begin()

template<semialphabet alphabet_t, text_layout text_layout_mode_, detail::sdsl_index sdsl_index_type_ = default_sdsl_index_type>
cursor_type seqan3::fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ >::begin ( ) const
inlinenoexcept

Returns a seqan3::fm_index_cursor on the index that can be used for searching.

Returns
Returns a (unidirectional) seqan3::fm_index_cursor on the index.

Complexity

Constant.

Exceptions

No-throw guarantee.

◆ empty()

template<semialphabet alphabet_t, text_layout text_layout_mode_, detail::sdsl_index sdsl_index_type_ = default_sdsl_index_type>
bool seqan3::fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ >::empty ( ) const
inlinenoexcept

Checks whether the index is empty.

Returns
true if the index is empty, false otherwise.

Complexity

Constant.

Exceptions

No-throw guarantee.

◆ operator!=()

template<semialphabet alphabet_t, text_layout text_layout_mode_, detail::sdsl_index sdsl_index_type_ = default_sdsl_index_type>
bool seqan3::fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ >::operator!= ( fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ > const &  rhs) const
inlinenoexcept

Compares two indices.

Returns
true if the indices are unequal, false otherwise.

Complexity

Linear.

Exceptions

No-throw guarantee.

◆ operator==()

template<semialphabet alphabet_t, text_layout text_layout_mode_, detail::sdsl_index sdsl_index_type_ = default_sdsl_index_type>
bool seqan3::fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ >::operator== ( fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ > const &  rhs) const
inlinenoexcept

Compares two indices.

Returns
true if the indices are equal, false otherwise.

Complexity

Linear.

Exceptions

No-throw guarantee.

◆ size()

template<semialphabet alphabet_t, text_layout text_layout_mode_, detail::sdsl_index sdsl_index_type_ = default_sdsl_index_type>
size_type seqan3::fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ >::size ( ) const
inlinenoexcept

Returns the length of the indexed text including sentinel characters.

Returns
Returns the length of the indexed text including sentinel characters.

Complexity

Constant.

Exceptions

No-throw guarantee.


The documentation for this class was generated from the following file:
debug_stream.hpp
Provides seqan3::debug_stream and related types.
dna4.hpp
Provides seqan3::dna4, container aliases and string literals.
all.hpp
Meta-header for the FM index module.
seqan3::fm_index
The SeqAn FM Index.
Definition: fm_index.hpp:134
vector
seqan3::debug_stream
debug_stream_type debug_stream
A global instance of seqan3::debug_stream_type.
Definition: debug_stream.hpp:39