SeqAn3  3.0.0
The Modern C++ library for sequence analysis.
seqan3::fm_index< is_collection, sdsl_index_type_ > Class Template Reference

The SeqAn FM Index. More...

#include <seqan3/search/fm_index/fm_index.hpp>

+ Inheritance diagram for seqan3::fm_index< is_collection, sdsl_index_type_ >:

Public Member Functions

cursor_type begin () const noexcept
 Returns a seqan3::fm_index_cursor on the index that can be used for searching. More...
 
template<std::ranges::Range text_t>
void construct (text_t &&text)
 Constructs the index given a range. The range cannot be an rvalue (i.e. a temporary object) and has to be non-empty. More...
 
template<std::ranges::Range text_t>
void construct (text_t &&text)
 
bool empty () const noexcept
 Checks whether the index is empty. More...
 
bool operator!= (fm_index const &rhs) const noexcept
 Compares two indices. More...
 
bool operator== (fm_index const &rhs) const noexcept
 Compares two indices. More...
 
size_type size () const noexcept
 Returns the length of the indexed text including sentinel characters. More...
 
Constructors, destructor and assignment
 fm_index ()=default
 Default constructor.
 
 fm_index (fm_index const &)=default
 Copy constructor.
 
fm_indexoperator= (fm_index const &)=default
 Copy assignment.
 
 fm_index (fm_index &&)=default
 Move constructor.
 
fm_indexoperator= (fm_index &&)=default
 Move assignment.
 
 ~fm_index ()=default
 Destructor.
 
template<std::ranges::Range text_t>
 fm_index (text_t &&text)
 Constructor that immediately constructs the index given a range. The range cannot be empty. More...
 

Protected Attributes

sdsl_index_type index
 Underlying index from the SDSL.
 
size_t sigma {0}
 The alphabet size of the text.
 
sdsl::sd_vector text_begin
 Bitvector storing begin positions for collections.
 
sdsl::rank_support_sd< 1 > text_begin_rs
 Rank support for text_begin.
 
sdsl::select_support_sd< 1 > text_begin_ss
 Select support for text_begin.
 

Static Protected Attributes

static constexpr bool is_collection_ {is_collection}
 Indicates whether index is built over a collection.
 

Friends

template<typename bi_fm_index_t >
class bi_fm_index_cursor
 
template<typename fm_index_t >
class fm_index_cursor
 

Related Functions

(Note that these are not member functions.)

Requirements for seqan3::FmIndex

You can expect these member types and member functions on all types that satisfy seqan3::FmIndex.

typename t::text_type text_type
 Type of the indexed text.
 
typename t::char_type char_type
 Type of the underlying character of text_type.
 

Member types

using sdsl_index_type = sdsl_index_type_
 The type of the underlying SDSL index.
 
using sdsl_char_type = typename sdsl_index_type::alphabet_type::char_type
 The type of the reduced alphabet type. (The reduced alphabet might be smaller than the original alphabet in case not all possible characters occur in the indexed text.)
 
using sdsl_sigma_type = typename sdsl_index_type::alphabet_type::sigma_type
 The type of the alphabet size of the underlying SDSL index.
 
using size_type = typename sdsl_index_type::size_type
 Type for representing positions in the indexed text.
 
using cursor_type = fm_index_cursor< fm_index< is_collection_, sdsl_index_type > >
 The type of the (unidirectional) cursor.
 

Detailed Description

template<bool is_collection = false, detail::SdslIndex sdsl_index_type_ = default_sdsl_index_type>
class seqan3::fm_index< is_collection, sdsl_index_type_ >

The SeqAn FM Index.

Template Parameters
is_collectionIndicates whether this index works on a text collection (true) or a single text (false).
sdsl_index_type_The type of the underlying SDSL index, must model seqan3::SdslIndex.

The seqan3::fm_index is a fast and space-efficient string index to search strings and collections of strings.

General information

Here is a short example on how to build an index and search a pattern using an cursor. Please note that there is a very powerful search module with a high-level interface seqan3::search that encapsulates the use of cursors.

#include <vector>
using namespace seqan3;
int main()
{
std::vector<dna4> genome{"ATCGATCGAAGGCTAGCTAGCTAAGGGA"_dna4};
fm_index index{genome}; // build the index
auto cur = index.begin(); // create a cursor
cur.extend_right("AAGG"_dna4); // search the pattern "AAGG"
debug_stream << "Number of hits: " << cur.count() << '\n'; // outputs: 2
debug_stream << "Positions in the genome: ";
for (auto const & pos : cur.locate()) // outputs: 8, 22
debug_stream << pos << ' ';
debug_stream << '\n';
return 0;
}
Attention
When building an index for a single text over any alphabet, the symbol with rank 255 is reserved and may not occur in the text.

Here is an example using a collection of strings (e.g. a genome with multiple chromosomes or a protein database):

#include <vector>
using namespace seqan3;
int main()
{
std::vector<std::vector<dna4>> genomes{"ATCTGACGAAGGCTAGCTAGCTAAGGGA"_dna4,
"TAGCTGAAGCCATTGGCATCTGATCGGACT"_dna4,
"ACTGAGCTCGTC"_dna4,
"TGCATGCACCCATCGACTGACTG"_dna4,
"GTACGTACGTTACG"_dna4};
fm_index index{genomes}; // build the index
auto cur = index.begin(); // create a cursor
cur.extend_right("CTGA"_dna4); // search the pattern "CTGA"
debug_stream << "Number of hits: " << cur.count() << '\n'; // outputs: 5
debug_stream << "Positions in the genomes: ";
for (auto const & pos : cur.locate()) // outputs: (3,16) (2,1) (1,3) (0,2) (1,19)
debug_stream << pos << ' ';
debug_stream << '\n';
return 0;
}
Attention
When building an index for a text collection over any alphabet, the symbols with rank 254 and 255 are reserved and may not be used in the text.

Constructor & Destructor Documentation

◆ fm_index()

template<bool is_collection = false, detail::SdslIndex sdsl_index_type_ = default_sdsl_index_type>
template<std::ranges::Range text_t>
seqan3::fm_index< is_collection, sdsl_index_type_ >::fm_index ( text_t &&  text)
inline

Constructor that immediately constructs the index given a range. The range cannot be empty.

Template Parameters
text_tThe type of range to construct from; must model std::ranges::BidirectionalRange.
Parameters
[in]textThe text to construct from.

Complexity

At least linear.

Member Function Documentation

◆ begin()

template<bool is_collection = false, detail::SdslIndex sdsl_index_type_ = default_sdsl_index_type>
cursor_type seqan3::fm_index< is_collection, sdsl_index_type_ >::begin ( ) const
inlinenoexcept

Returns a seqan3::fm_index_cursor on the index that can be used for searching.

Returns
Returns a (unidirectional) seqan3::fm_index_cursor on the index.

Complexity

Constant.

Exceptions

No-throw guarantee.

◆ construct() [1/2]

template<bool is_collection = false, detail::SdslIndex sdsl_index_type_ = default_sdsl_index_type>
template<std::ranges::Range text_t>
void seqan3::fm_index< is_collection, sdsl_index_type_ >::construct ( text_t &&  text)
inline

Constructs the index given a range. The range cannot be an rvalue (i.e. a temporary object) and has to be non-empty.

Template Parameters
text_tThe type of range to construct from; must model std::ranges::BidirectionalRange.
Parameters
[in]textThe text to construct from.

Complexity

At least linear.

Exceptions

No guarantee.

◆ construct() [2/2]

template<bool is_collection = false, detail::SdslIndex sdsl_index_type_ = default_sdsl_index_type>
template<std::ranges::Range text_t>
void seqan3::fm_index< is_collection, sdsl_index_type_ >::construct ( text_t &&  text)
inline

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

◆ empty()

template<bool is_collection = false, detail::SdslIndex sdsl_index_type_ = default_sdsl_index_type>
bool seqan3::fm_index< is_collection, sdsl_index_type_ >::empty ( ) const
inlinenoexcept

Checks whether the index is empty.

Returns
true if the index is empty, false otherwise.

Complexity

Constant.

Exceptions

No-throw guarantee.

◆ operator!=()

template<bool is_collection = false, detail::SdslIndex sdsl_index_type_ = default_sdsl_index_type>
bool seqan3::fm_index< is_collection, sdsl_index_type_ >::operator!= ( fm_index< is_collection, sdsl_index_type_ > const &  rhs) const
inlinenoexcept

Compares two indices.

Returns
true if the indices are unequal, false otherwise.

Complexity

Linear.

Exceptions

No-throw guarantee.

◆ operator==()

template<bool is_collection = false, detail::SdslIndex sdsl_index_type_ = default_sdsl_index_type>
bool seqan3::fm_index< is_collection, sdsl_index_type_ >::operator== ( fm_index< is_collection, sdsl_index_type_ > const &  rhs) const
inlinenoexcept

Compares two indices.

Returns
true if the indices are equal, false otherwise.

Complexity

Linear.

Exceptions

No-throw guarantee.

◆ size()

template<bool is_collection = false, detail::SdslIndex sdsl_index_type_ = default_sdsl_index_type>
size_type seqan3::fm_index< is_collection, sdsl_index_type_ >::size ( ) const
inlinenoexcept

Returns the length of the indexed text including sentinel characters.

Returns
Returns the length of the indexed text including sentinel characters.

Complexity

Constant.

Exceptions

No-throw guarantee.


The documentation for this class was generated from the following file: