SeqAn3  3.0.2
The Modern C++ library for sequence analysis.
Configuration

Data structures and utility functions for configuring the search algorithm.


Classes

struct  seqan3::search_cfg::error_count
 A strong type of underlying type uint8_t that represents the number of errors.
 
struct  seqan3::search_cfg::error_rate
 A strong type of underlying type double that represents the rate of errors.
 
class  seqan3::search_cfg::hit
 A dynamic configuration element to configure the hit strategy at runtime.
 
struct  seqan3::search_cfg::hit_strata
 Configuration element to receive all hits with the best number of errors plus the given stratum. All hits are found with the fewest number of errors plus 'stratum'.
 
class  seqan3::search_cfg::max_error_deletion
 Configuration element that represents the number or rate of deletion errors.
 
class  seqan3::search_cfg::max_error_insertion
 Configuration element that represents the number or rate of insertion errors.
 
class  seqan3::search_cfg::max_error_substitution
 Configuration element that represents the number or rate of substitution errors.
 
class  seqan3::search_cfg::max_error_total
 Configuration element that represents the number or rate of total errors.
 
struct  seqan3::search_cfg::on_result< callback_t >
 Configuration element to provide a user defined callback function for the search.
 

Typedefs

using seqan3::search_cfg::parallel = seqan3::detail::parallel_mode< std::integral_constant< detail::search_config_id, detail::search_config_id::parallel > >
 Enables the parallel execution of the search algorithm if possible for the given configuration.
 

Variables

const configuration seqan3::search_cfg::default_configuration
 The default configuration: Compute all exact matches.
 
constexpr detail::hit_all_tag seqan3::search_cfg::hit_all
 Configuration element to receive all hits within the error bounds.
 
constexpr detail::hit_all_best_tag seqan3::search_cfg::hit_all_best
 Configuration element to receive all hits with the lowest number of errors within the error bounds.
 
constexpr detail::hit_single_best_tag seqan3::search_cfg::hit_single_best
 Configuration element to receive a single best hit with the lowest number of errors within the error bounds.
 
constexpr detail::output_index_cursor_tag seqan3::search_cfg::output_index_cursor
 Include the index_cursor in the seqan3::search_result returned by a call to seqan3::search.
 
constexpr detail::output_query_id_tag seqan3::search_cfg::output_query_id
 Include the query_id in the seqan3::search_result returned by a call to seqan3::search.
 
constexpr detail::output_reference_begin_position_tag seqan3::search_cfg::output_reference_begin_position
 Include the reference_begin_position in the seqan3::search_result returned by a call to seqan3::search.
 
constexpr detail::output_reference_id_tag seqan3::search_cfg::output_reference_id
 Include the reference_id in the seqan3::search_result returned by a call to seqan3::search.
 

Detailed Description

Data structures and utility functions for configuring the search algorithm.

See also
Search

Introduction

In SeqAn, the search algorithm uses a configuration object to determine the allowed number of total errors, substitution errors, insertion errors, and deletion errors; each can be given as an absolute number or as an error rate. Furthermore, you can configure which hits are reported, based on a hit strategy, and which information the result should contain. These configurations exist in their own namespace, namely seqan3::search_cfg, to disambiguate them from the configuration of other algorithms.

If no configuration is provided upon invoking the seqan3::search algorithm, the default configuration seqan3::search_cfg::default_configuration is used, which computes all exact matches.

Overview on search configurations

Configurations can be combined using the |-operator. If a combination is invalid, a static assertion is triggered during compilation and informs the user that the last config cannot be combined with any of the configs on the left-hand side of the configuration specification. Unfortunately, the names of the invalid types cannot be printed within the static assertion, but the following table shows which combinations are possible. In general, the same configuration element cannot occur more than once inside a configuration specification.

Configuration group 0 1 2 3 4 5 6
0: Max error total
1: Max error substitution
2: Max error insertion
3: Max error deletion
4: Output
5: Hit
6: Parallel
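
The rule that the same element cannot occur twice can be checked during compilation. The following is a minimal sketch of that idea only, not SeqAn's implementation: the names config, max_total, hit_all_best and threads are hypothetical, and the sketch does not model the compatibility table above.

```cpp
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for configuration elements (not SeqAn types).
struct max_total { int value; };
struct hit_all_best {};
struct threads { int count; };

// A configuration modelled as a tuple of element types.
template <typename... elements_t>
struct config
{
    std::tuple<elements_t...> elements;
};

// True if new_t already occurs among elements_t.
template <typename new_t, typename... elements_t>
inline constexpr bool contains_v = (std::is_same_v<new_t, elements_t> || ...);

// Appends an element; a repeated element type is rejected during compilation.
template <typename... elements_t, typename new_t>
constexpr auto operator|(config<elements_t...> cfg, new_t elem)
{
    static_assert(!contains_v<new_t, elements_t...>,
                  "The last element cannot be combined with the elements on the left-hand side.");
    return config<elements_t..., new_t>{
        std::tuple_cat(std::move(cfg.elements), std::make_tuple(std::move(elem)))};
}
```

With these definitions, config<>{} | max_total{1} | hit_all_best{} compiles, while appending a second max_total triggers the static assertion.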

0 - 3: Max Error Configuration

This configuration can be used to specify the number or rate of errors per error type. It restricts the number of substitutions, insertions, deletions and total errors within the search to the given values. A substitution (mismatch) corresponds to diverging bases between text and query at a certain position. An insertion corresponds to a base inserted into the query that does not occur in the text at that position. A deletion corresponds to a base deleted from the query sequence that does occur in the indexed text. Deletions at the beginning and at the end of the sequence are not considered during a search.

The error types interact as follows: If seqan3::search_cfg::max_error_total and any other error type are specified, all types are set to the respective values. If one or more other error types are configured, but no total, then the total is set to the sum of the configured error types.
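
The interaction between the total and the individual error types can be written down as a small function. This is a sketch under stated assumptions, not the library's code: requested_errors, resolved_errors and resolve are hypothetical names, and treating an unspecified error type as 0 once any error type is configured is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>

// Hypothetical containers for this sketch; not part of SeqAn.
struct requested_errors
{
    std::optional<std::uint8_t> total, substitution, insertion, deletion;
};

struct resolved_errors
{
    std::uint8_t total, substitution, insertion, deletion;
};

resolved_errors resolve(requested_errors const & req)
{
    bool const any_specific = req.substitution || req.insertion || req.deletion;

    // Only the total is given: every error type may use the whole budget.
    if (req.total && !any_specific)
        return {*req.total, *req.total, *req.total, *req.total};

    // Assumption of this sketch: an error type that was not specified counts as 0.
    std::uint8_t const s = req.substitution.value_or(0);
    std::uint8_t const i = req.insertion.value_or(0);
    std::uint8_t const d = req.deletion.value_or(0);

    // Total given together with other error types: all values are used as given.
    if (req.total)
        return {*req.total, s, i, d};

    // No total: the total becomes the sum, clamped to the uint8_t range.
    unsigned const sum = unsigned{s} + i + d;
    return {static_cast<std::uint8_t>(std::min(sum, 255u)), s, i, d};
}
```

For instance, requesting 0 substitutions, 1 insertion and 1 deletion without a total resolves to a total of 2.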

These configuration elements can be given by a number or rate of errors:

seqan3::search_cfg::max_error_*¹ Behaviour
seqan3::search_cfg::error_rate Specify the error rate.
seqan3::search_cfg::error_count Specify a discrete number of allowed errors.

¹: max_error_total, max_error_substitution, max_error_insertion, max_error_deletion
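
To make the difference between a count and a rate concrete, the sketch below turns either one into an absolute error budget for a query of a given length. The types count_t and rate_t and the function to_count are hypothetical, and rounding the product down is an assumption of this sketch; the page does not state how a rate is converted.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <variant>

// Hypothetical strong types for this sketch (not SeqAn's error_count / error_rate).
struct count_t { std::uint8_t value; };
struct rate_t { double value; };

using max_error_t = std::variant<count_t, rate_t>;

// Returns the absolute number of errors allowed for a query of the given length.
std::uint8_t to_count(max_error_t const & max_error, std::size_t query_length)
{
    // A count is independent of the query length.
    if (auto const * count = std::get_if<count_t>(&max_error))
        return count->value;

    // Assumption: a rate is scaled by the query length and rounded down.
    double const scaled = std::floor(std::get<rate_t>(max_error).value *
                                     static_cast<double>(query_length));
    return static_cast<std::uint8_t>(std::min(scaled, 255.0));
}
```

Under this assumption, a rate of 0.25 allows 2 errors for a query of length 10, whereas a count of 3 allows 3 errors for any query.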

Example

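#include <seqan3/search/configuration/all.hpp>
int main()
{
    using namespace seqan3::search_cfg;
    // Allow 1 error of any type.
    seqan3::configuration const cfg1 = max_error_total{error_count{1}};
    // Do not allow substitutions. Allow at most 1 error.
    seqan3::configuration const cfg2 = max_error_total{error_count{1}} | max_error_substitution{error_count{0}} |
                                       max_error_insertion{error_count{1}} | max_error_deletion{error_count{1}};
    // Sets total errors to 2.
    seqan3::configuration const cfg3 = max_error_substitution{error_count{0}} | max_error_insertion{error_count{1}} |
                                       max_error_deletion{error_count{1}};
    // Allow 10% errors of any type.
    seqan3::configuration const cfg4 = max_error_total{error_rate{0.1}};
    // Do not allow substitutions. Allow at most 10% errors.
    seqan3::configuration const cfg5 = max_error_total{error_rate{0.1}} | max_error_substitution{error_rate{0.0}} |
                                       max_error_insertion{error_rate{0.1}} | max_error_deletion{error_rate{0.1}};
    // Sets total errors to 20%.
    seqan3::configuration const cfg6 = max_error_substitution{error_rate{0.0}} | max_error_insertion{error_rate{0.1}} |
                                       max_error_deletion{error_rate{0.1}};
    // Mixed error rate & count: Allow 2 insertions and/or 2 deletions and 20% errors in total.
    seqan3::configuration const cfg7 = max_error_total{error_rate{0.2}} | max_error_insertion{error_count{2}} |
                                       max_error_deletion{error_count{2}};
    return 0;
}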

4: Output Configuration

The output configuration is closely tied to the seqan3::search_result:

Template Parameters
query_id_type The type of the query_id; must model std::integral.
cursor_type The type of the cursor; must model seqan3::fm_index_cursor_specialisation.
reference_id_type The type of the reference_id; must model std::integral.
reference_begin_position_type The type of the reference_begin_position; must model std::integral.

The seqan3::search algorithm returns a range of hits. A single hit is stored in a seqan3::search_result. By default, the search result contains the query id, the reference id where the query matched and the begin position in the reference where the query sequence starts to match the reference sequence. This information can be accessed via the respective member functions.

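The following member functions exist:

seqan3::search_result::query_id()
seqan3::search_result::index_cursor()
seqan3::search_result::reference_id()
seqan3::search_result::reference_begin_position()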

Note that the index cursor is not included in a hit by default. If you are trying to use the respective member function, a static_assert will prevent you from doing so. You can configure the result of the search with the output configuration.

Configuring the result type

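As mentioned above, we can configure which information is accessible in the seqan3::search_result. For each member function there is a respective configuration element:

seqan3::search_cfg::output_query_id
seqan3::search_cfg::output_index_cursor
seqan3::search_cfg::output_reference_id
seqan3::search_cfg::output_reference_begin_position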

If you specify any of the above mentioned output configuration elements, then nothing else but the selected output information is included.

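#include <seqan3/search/configuration/all.hpp>
int main()
{
    // Only return the reference id where a query matched the reference:
    seqan3::configuration const cfg1 = seqan3::search_cfg::output_reference_id;
    // Same as the default:
    seqan3::configuration const cfg2 = seqan3::search_cfg::output_query_id |
                                       seqan3::search_cfg::output_reference_id |
                                       seqan3::search_cfg::output_reference_begin_position;
    // Only return cursors of the index.
    seqan3::configuration const cfg3 = seqan3::search_cfg::output_index_cursor;
    return 0;
}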

The index cursor is an advanced data structure that lets you navigate within the index. See seqan3::fm_index_cursor and seqan3::bi_fm_index_cursor for more information. If you need neither the reference id nor the position, returning only the cursor is faster. This is because the operation to get the id and position of a hit can be computationally intensive depending on the underlying index structure.

Note
A single index cursor points to a range of text positions. Although the normal use case is to return either the cursor or the positions, both can be returned simultaneously. In this case, the same cursor will be copied into the seqan3::search_result for each of its associated positions.

5: Hit Configuration

This configuration can be used to determine which hits are reported. Currently these strategies are available:

Hit Configurations Behaviour
seqan3::search_cfg::hit_all Report all hits within error bounds.
seqan3::search_cfg::hit_all_best Report all hits with the lowest number of errors within the bounds.
seqan3::search_cfg::hit_single_best Report one best hit (hit with lowest error) within bounds.
seqan3::search_cfg::hit_strata Report all hits within best + stratum errors.

The individual configuration elements to select a search strategy cannot be combined with each other (mutual exclusivity).

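#include <seqan3/search/configuration/all.hpp>
int main()
{
    // Report all hits with 0 errors (maximum number of errors defaults to 0).
    seqan3::configuration const cfg1 = seqan3::search_cfg::hit_all;
    // Report all hits with 0 and 1 errors.
    seqan3::configuration const cfg2 = seqan3::search_cfg::max_error_total{seqan3::search_cfg::error_count{1}} |
                                       seqan3::search_cfg::hit_all;
    // Report the single best hit with the least number of errors (up to 1 error is allowed).
    seqan3::configuration const cfg3 = seqan3::search_cfg::max_error_total{seqan3::search_cfg::error_count{1}} |
                                       seqan3::search_cfg::hit_single_best;
    // Report all hits with the least number of errors (either 0 or 1 errors).
    seqan3::configuration const cfg4 = seqan3::search_cfg::max_error_total{seqan3::search_cfg::error_count{1}} |
                                       seqan3::search_cfg::hit_all_best;
    // Report all hits with best + 1 error but no more than 2 (errors).
    // E.g., if the best hit has 1 error, all hits with 1 and 2 errors are reported.
    // E.g., if the best hit has 2 errors, only hits with 2 errors are reported since 3 exceeds total.
    seqan3::configuration const cfg5 = seqan3::search_cfg::max_error_total{seqan3::search_cfg::error_count{2}} |
                                       seqan3::search_cfg::hit_strata{1};
    // You must choose only one mode.
    // auto fail = seqan3::search_cfg::hit_single_best | seqan3::search_cfg::hit_all; // doesn't compile
    return 0;
}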
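
The four strategies differ only in which of the hits within the error bounds they keep. The sketch below spells this out as filters over a list of candidate hits. It illustrates the reported sets only and is not how the search is implemented; hit_t and the function names are hypothetical, and picking the first best hit for the single-best strategy is an arbitrary choice of this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical hit record for this sketch: a text position and its error count.
struct hit_t
{
    std::size_t position;
    unsigned errors;
};

// hit_all: every hit within the error bound.
std::vector<hit_t> all_hits(std::vector<hit_t> const & hits, unsigned max_errors)
{
    std::vector<hit_t> kept;
    std::copy_if(hits.begin(), hits.end(), std::back_inserter(kept),
                 [max_errors](hit_t const & h) { return h.errors <= max_errors; });
    return kept;
}

// hit_strata: every hit with at most best + stratum errors, still capped by max_errors.
std::vector<hit_t> strata_hits(std::vector<hit_t> const & hits, unsigned max_errors, unsigned stratum)
{
    std::vector<hit_t> const within = all_hits(hits, max_errors);
    if (within.empty())
        return within;

    unsigned const best = std::min_element(within.begin(), within.end(),
                                           [](hit_t const & a, hit_t const & b)
                                           { return a.errors < b.errors; })->errors;
    return all_hits(within, std::min(max_errors, best + stratum));
}

// hit_all_best: the same as strata with a stratum of 0.
std::vector<hit_t> all_best_hits(std::vector<hit_t> const & hits, unsigned max_errors)
{
    return strata_hits(hits, max_errors, 0);
}

// hit_single_best: one of the best hits; this sketch arbitrarily keeps the first.
std::vector<hit_t> single_best_hit(std::vector<hit_t> const & hits, unsigned max_errors)
{
    std::vector<hit_t> best = all_best_hits(hits, max_errors);
    if (best.size() > 1)
        best.resize(1);
    return best;
}
```

Given candidate hits with 1, 2, 1 and 3 errors and a bound of 2 errors, all_hits keeps three hits, all_best_hits keeps the two hits with 1 error, and strata_hits with a stratum of 1 again keeps three.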

Dynamic hit configuration

Sometimes a program needs to support different hit strategies based on some user input. Since these are mostly runtime decisions, handling them with the static hit configurations can become quite cumbersome. Instead, one can use the dynamic hit configuration element seqan3::search_cfg::hit. This configuration element lets you set one of the above mentioned hit configurations at runtime. Later, during the configuration phase of the search algorithm, the selected hit configuration is used for the final search algorithm. If the dynamic hit configuration is default constructed, it does not hold any hit configuration. If you call search with the dynamic configuration in this state, an exception will be thrown. Also note that using the dynamic configuration might have implications on the compile time, so we recommend using the static configurations if only a single hit strategy is supported. The following example demonstrates the usage of the dynamic configuration:

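#include <seqan3/search/configuration/all.hpp>
int main()
{
    // Default constructed: Has no hit strategy selected.
    seqan3::search_cfg::hit dynamic_hit{};
    // Select hit_all
    dynamic_hit = seqan3::search_cfg::hit_all;
    // If condition is true choose strata strategy, otherwise find the single best hit.
    if (true)
        dynamic_hit = seqan3::search_cfg::hit_strata{2};
    else
        dynamic_hit = seqan3::search_cfg::hit_single_best;
    // Combine it with other configurations.
    seqan3::configuration const cfg = dynamic_hit |
                                      seqan3::search_cfg::max_error_total{seqan3::search_cfg::error_count{1}};
    // Directly initialised.
    seqan3::search_cfg::hit dynamic_hit2{seqan3::search_cfg::hit_all_best};
    // You cannot combine the dynamic hit configuration with the static ones.
    // auto fail = seqan3::search_cfg::hit_single_best | seqan3::search_cfg::hit; // doesn't compile
    return 0;
}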
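
The behaviour of a runtime-selected strategy, including the empty default state, can be modelled with a std::variant. This is only a sketch of the concept: the tag types, dynamic_hit and dispatch are hypothetical names, and std::invalid_argument is an assumed exception type, since the text above does not say which exception the library throws.

```cpp
#include <stdexcept>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical strategy tags for this sketch (not SeqAn types).
struct all_tag { static constexpr char const * name = "all"; };
struct all_best_tag { static constexpr char const * name = "all_best"; };
struct single_best_tag { static constexpr char const * name = "single_best"; };
struct strata_tag { unsigned stratum; };

struct dynamic_hit
{
    // std::monostate models the default-constructed state without a strategy.
    std::variant<std::monostate, all_tag, all_best_tag, single_best_tag, strata_tag> selected{};
};

// Stand-in for the configuration phase: maps the runtime choice to a static code path.
std::string dispatch(dynamic_hit const & cfg)
{
    return std::visit([](auto const & strategy) -> std::string
    {
        using strategy_t = std::decay_t<decltype(strategy)>;

        if constexpr (std::is_same_v<strategy_t, std::monostate>)
            throw std::invalid_argument{"No hit strategy selected."}; // assumed exception type
        else if constexpr (std::is_same_v<strategy_t, strata_tag>)
            return "strata+" + std::to_string(strategy.stratum);
        else
            return strategy_t::name;
    }, cfg.selected);
}
```

Because std::visit instantiates the lambda for every alternative, each possible strategy is compiled even if only one is chosen at runtime, which is one way to see why a dynamic configuration can cost compile time.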

6: Parallel Configuration

This configuration determines the maximal number of threads the search algorithm can use.

The seqan3::search_cfg::parallel configuration element can be combined with any other search configuration.

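#include <seqan3/search/configuration/all.hpp>
int main()
{
    // Enable parallel execution of the search algorithm with 8 threads (and allow 1 error of any type).
    seqan3::configuration const cfg1 = seqan3::search_cfg::parallel{8} |
                                       seqan3::search_cfg::max_error_total{seqan3::search_cfg::error_count{1}};
    // Alternative solution: assign to the member variable of the parallel configuration
    seqan3::search_cfg::parallel par_cfg{};
    par_cfg.thread_count = 8;
    seqan3::configuration const cfg2 = par_cfg | seqan3::search_cfg::max_error_total{seqan3::search_cfg::error_count{1}};
    return 0;
}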

User callback

In the default case, a call to seqan3::search returns a lazy range over the results of the search. This lazy range has the advantage that the results are always in a deterministic order even if the search is executed in parallel. Sometimes, however, it might be desirable to provide a user defined callback. To do so, one can use the configuration element seqan3::search_cfg::on_result. This configuration element is initialised with a user defined callback, e.g. a lambda function, which will be invoked with a generated seqan3::search_result whenever a hit was found. This has two implications. First, the return type of the seqan3::search function changes to void, i.e. it returns nothing. Second, in a parallel execution of the search, the order of the hits is not deterministic and the user has to make sure that concurrent invocations of the given callback are safe.

The following snippet demonstrates the basic use case for this configuration element:

#include <vector>
#include <seqan3/alphabet/nucleotide/dna4.hpp>
#include <seqan3/core/debug_stream.hpp>
#include <seqan3/search/configuration/all.hpp>
#include <seqan3/search/fm_index/all.hpp>
#include <seqan3/search/search.hpp>
int main()
{
    using seqan3::operator""_dna4;
    std::vector<seqan3::dna4_vector> genomes{"CGCTGTCTGAAGGATGAGTGTCAGCCAGTGTA"_dna4,
                                             "ACCCGATGAGCTACCCAGTAGTCGAACTG"_dna4,
                                             "GGCCAGACAACCCGGCGCTAATGCACTCA"_dna4};
    std::vector<seqan3::dna4_vector> queries{"GCT"_dna4, "ACCC"_dna4};
    // build an FM index
    seqan3::fm_index index{genomes};
    seqan3::configuration const config = seqan3::search_cfg::on_result{[] (auto && result)
    {
        seqan3::debug_stream << result << '\n';
    }};
    seqan3::search(queries, index, config); // Does not return anything but calls the lambda from above instead.
    // This results in:
    // <query_id:0, reference_id:0, reference_pos:1>
    // <query_id:0, reference_id:1, reference_pos:9>
    // <query_id:0, reference_id:2, reference_pos:16>
    // <query_id:1, reference_id:1, reference_pos:0>
    // <query_id:1, reference_id:1, reference_pos:12>
    // <query_id:1, reference_id:2, reference_pos:9>
}
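
When the search runs in parallel, a callback that writes to shared state has to be protected by the user. The sketch below shows one common way to do this with a mutex. It does not use SeqAn: fake_parallel_search is a hypothetical stand-in that invokes the callback from several threads, and result_t is a simplified result type.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Simplified, hypothetical result type for this sketch.
struct result_t
{
    std::size_t query_id;
    std::size_t position;
};

// Stand-in for a parallel search: one worker per query, each reporting one result.
template <typename callback_t>
void fake_parallel_search(std::size_t query_count, callback_t && on_result)
{
    std::vector<std::thread> workers;
    for (std::size_t id = 0; id < query_count; ++id)
        workers.emplace_back([id, &on_result] { on_result(result_t{id, id * 10}); });

    for (std::thread & worker : workers)
        worker.join();
}

// Collects all results into one vector; the order of the entries is not deterministic.
std::vector<result_t> collect(std::size_t query_count)
{
    std::vector<result_t> results;
    std::mutex guard;

    fake_parallel_search(query_count, [&results, &guard](result_t result)
    {
        std::lock_guard lock{guard}; // serialises concurrent invocations of the callback
        results.push_back(result);
    });

    return results;
}
```

The lock makes each push_back safe, but the results still arrive in an arbitrary order; sort them afterwards if a deterministic order is needed.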

Typedef Documentation

◆ parallel

using seqan3::search_cfg::parallel = seqan3::detail::parallel_mode<std::integral_constant<detail::search_config_id, detail::search_config_id::parallel> >

Enables the parallel execution of the search algorithm if possible for the given configuration.

With this configuration you can enable the parallel execution of the search algorithm.

The config element takes the number of threads as a parameter, which must be greater than 0.

Example

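#include <seqan3/search/configuration/all.hpp>
int main()
{
    // Enable parallel execution of the search algorithm with 8 threads (and allow 1 error of any type).
    seqan3::configuration const cfg1 = seqan3::search_cfg::parallel{8} |
                                       seqan3::search_cfg::max_error_total{seqan3::search_cfg::error_count{1}};
    // Alternative solution: assign to the member variable of the parallel configuration
    seqan3::search_cfg::parallel par_cfg{};
    par_cfg.thread_count = 8;
    seqan3::configuration const cfg2 = par_cfg | seqan3::search_cfg::max_error_total{seqan3::search_cfg::error_count{1}};
    return 0;
}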

Variable Documentation

◆ default_configuration

const configuration seqan3::search_cfg::default_configuration
inline
Initial value:
= max_error_total{error_count{0}} |
max_error_substitution{error_count{0}} |
max_error_insertion{error_count{0}} |
max_error_deletion{error_count{0}} |
output_query_id |
output_reference_id |
output_reference_begin_position |
hit_all

The default configuration: Compute all exact matches.

◆ hit_all

constexpr detail::hit_all_tag seqan3::search_cfg::hit_all
inline constexpr

Configuration element to receive all hits within the error bounds.

See also
Section on Hit Strategy

◆ hit_all_best

constexpr detail::hit_all_best_tag seqan3::search_cfg::hit_all_best
inline constexpr

Configuration element to receive all hits with the lowest number of errors within the error bounds.

See also
Section on Hit Strategy

◆ hit_single_best

constexpr detail::hit_single_best_tag seqan3::search_cfg::hit_single_best
inline constexpr

Configuration element to receive a single best hit with the lowest number of errors within the error bounds.

See also
Section on Hit Strategy

◆ output_index_cursor

constexpr detail::output_index_cursor_tag seqan3::search_cfg::output_index_cursor
inline constexpr

Include the index_cursor in the seqan3::search_result returned by a call to seqan3::search.

See also
Section on Output

◆ output_query_id

constexpr detail::output_query_id_tag seqan3::search_cfg::output_query_id
inline constexpr

Include the query_id in the seqan3::search_result returned by a call to seqan3::search.

See also
Section on Output

◆ output_reference_begin_position

constexpr detail::output_reference_begin_position_tag seqan3::search_cfg::output_reference_begin_position
inline constexpr

Include the reference_begin_position in the seqan3::search_result returned by a call to seqan3::search.

See also
Section on Output

◆ output_reference_id

constexpr detail::output_reference_id_tag seqan3::search_cfg::output_reference_id
inline constexpr

Include the reference_id in the seqan3::search_result returned by a call to seqan3::search.

See also
Section on Output