SeqAn3 3.3.0-rc.1
The Modern C++ library for sequence analysis.
Configuration

Data structures and utility functions for configuring the search algorithm.


Classes

struct  seqan3::search_cfg::error_count
 A strong type of underlying type uint8_t that represents the number of errors.

struct  seqan3::search_cfg::error_rate
 A strong type of underlying type double that represents the rate of errors.

class  seqan3::search_cfg::hit
 A dynamic configuration element to configure the hit strategy at runtime.

class  seqan3::search_cfg::hit_all
 Configuration element to receive all hits within the error bounds.

class  seqan3::search_cfg::hit_all_best
 Configuration element to receive all hits with the lowest number of errors within the error bounds.

class  seqan3::search_cfg::hit_single_best
 Configuration element to receive a single best hit with the lowest number of errors within the error bounds.

class  seqan3::search_cfg::hit_strata
 Configuration element to receive all hits with the best number of errors plus the given stratum, i.e. all hits with at most the lowest number of errors plus 'stratum'.

class  seqan3::search_cfg::max_error_deletion
 Configuration element that represents the number or rate of deletion errors.

class  seqan3::search_cfg::max_error_insertion
 Configuration element that represents the number or rate of insertion errors.

class  seqan3::search_cfg::max_error_substitution
 Configuration element that represents the number or rate of substitution errors.

class  seqan3::search_cfg::max_error_total
 Configuration element that represents the number or rate of total errors.

class  seqan3::search_cfg::on_result< callback_t >
 Configuration element to provide a user-defined callback function for the search.

class  seqan3::search_cfg::output_index_cursor
 Include the index_cursor in the seqan3::search_result returned by a call to seqan3::search.

class  seqan3::search_cfg::output_query_id
 Include the query_id in the seqan3::search_result returned by a call to seqan3::search.

class  seqan3::search_cfg::output_reference_begin_position
 Include the reference_begin_position in the seqan3::search_result returned by a call to seqan3::search.

class  seqan3::search_cfg::output_reference_id
 Include the reference_id in the seqan3::search_result returned by a call to seqan3::search.
 

Typedefs

using seqan3::search_cfg::parallel = seqan3::detail::parallel_mode< std::integral_constant< detail::search_config_id, detail::search_config_id::parallel > >
 Enables the parallel execution of the search algorithm if possible for the given configuration.
 

Variables

constexpr configuration seqan3::search_cfg::default_configuration
 The default configuration: Compute all exact matches.
 

Detailed Description

Data structures and utility functions for configuring the search algorithm.

See also
Search

Introduction

In SeqAn, the search algorithm uses a configuration object to determine the maximum number of total errors, substitution errors, insertion errors and deletion errors, each of which can be given as an absolute number or as an error rate. Furthermore, the configuration determines which hits are reported (based on a hit strategy) and which information the results contain. These configurations live in their own namespace, seqan3::search_cfg, to disambiguate them from the configurations of other algorithms.

If no configuration is provided upon invoking the seqan3::search algorithm, the following default configuration is used:

#include <seqan3/search/configuration/hit.hpp>
#include <seqan3/search/configuration/max_error.hpp>
#include <seqan3/search/configuration/output.hpp>
int main()
{
    auto const zero_errors = seqan3::search_cfg::error_count{0};
    // No errors, all hits as text position
    seqan3::configuration const default_cfg =
        seqan3::search_cfg::max_error_total{zero_errors} | seqan3::search_cfg::max_error_substitution{zero_errors}
        | seqan3::search_cfg::max_error_insertion{zero_errors} | seqan3::search_cfg::max_error_deletion{zero_errors}
        | seqan3::search_cfg::output_query_id{} | seqan3::search_cfg::output_reference_id{}
        | seqan3::search_cfg::output_reference_begin_position{} | seqan3::search_cfg::hit_all{};
    return 0;
}

Overview on search configurations

Configurations can be combined using the |-operator. If a combination is invalid, a static assertion is raised during the compilation of the program, informing the user that the given configuration elements cannot be combined into one search configuration. In general, the same configuration element cannot occur more than once in a configuration. The following table shows which combinations are possible.

Configuration group 0 1 2 3 4 5 6
0: Max error total
1: Max error substitution
2: Max error insertion
3: Max error deletion
4: Output
5: Hit
6: Parallel
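The compile-time rejection of duplicate elements described above can be illustrated in plain C++17. The following is a minimal sketch, not SeqAn3's implementation: the types `config`, `total_errors`, `hit_mode` and `thread_count` and the helper `make_config` are hypothetical, and the real library checks the full compatibility table rather than only duplicates.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>

// Hypothetical stand-ins for configuration elements (not SeqAn3 types).
struct total_errors { int value{}; };
struct hit_mode { int strategy{}; };
struct thread_count { unsigned value{1}; };

// Hypothetical configuration: an ordered tuple of distinct element types.
template <typename... elements_t>
struct config
{
    std::tuple<elements_t...> elements{};

    // True if element_t already occurs in this configuration.
    template <typename element_t>
    static constexpr bool contains = (std::is_same_v<element_t, elements_t> || ...);

    // Access an element by type; fails to compile if it was never added.
    template <typename element_t>
    constexpr element_t const & get() const
    {
        static_assert(contains<element_t>, "This element was not configured.");
        return std::get<element_t>(elements);
    }
};

// Start a configuration from a single element.
template <typename element_t>
constexpr config<element_t> make_config(element_t element)
{
    return config<element_t>{std::tuple<element_t>{element}};
}

// Appending an element type that is already present triggers a static assertion,
// mirroring the rule that an element may occur at most once.
template <typename new_t, typename... elements_t>
constexpr auto operator|(config<elements_t...> const & cfg, new_t element)
{
    static_assert(!config<elements_t...>::template contains<new_t>,
                  "This configuration element is already part of the configuration.");
    return config<elements_t..., new_t>{std::tuple_cat(cfg.elements, std::tuple<new_t>{element})};
}
```

With this sketch, `make_config(total_errors{1}) | total_errors{2}` stops compilation at the `static_assert` in `operator|`, and calling `get<T>()` for a type that was never added fails in the same way.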

0 - 3: Max Error Configuration

This configuration can be used to specify the maximum number or rate of errors for each error type. It restricts the number of substitutions, insertions, deletions and total errors within the search to the given values. A substitution (mismatch) corresponds to diverging bases between text and query at a certain position. An insertion corresponds to a base inserted into the query that does not occur in the text at the respective position. A deletion corresponds to a base deleted from the query sequence that does occur in the indexed text. Deletions at the beginning and at the end of the sequence are not considered during a search.

The following rules apply when selecting the max error configuration. If seqan3::search_cfg::max_error_total is specified, all error types default to the value of the total error configuration. Any other error configuration that is specified explicitly uses its given value, which will not exceed the total error if one is given. For example, if a configuration sets the total max error to 3 and the insertion error to 1, then the search considers at most one insertion but allows up to 3 deletions and 3 substitutions, while allowing at most 3 errors in total. If, on the other hand, the total error is not specified in the search configuration, it is set to the sum of the other error configurations. Consequently, in the default case all errors are set to 0 and an exact search is conducted.
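To make the rules above concrete, here is a minimal C++17 sketch that derives the effective limits from the limits a user requested. It illustrates the rules under stated assumptions and is not SeqAn3's code: `requested_errors`, `effective_errors` and `resolve_errors` are hypothetical, only absolute counts are modelled, and when no total is given, error types that were not specified are assumed to default to 0.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Hypothetical model of the limits a user requested (absolute counts only).
struct requested_errors
{
    std::optional<unsigned> total{};
    std::optional<unsigned> substitution{};
    std::optional<unsigned> insertion{};
    std::optional<unsigned> deletion{};
};

// Effective limits after applying the rules.
struct effective_errors
{
    unsigned total{};
    unsigned substitution{};
    unsigned insertion{};
    unsigned deletion{};
};

effective_errors resolve_errors(requested_errors const & req)
{
    effective_errors eff{};

    if (req.total)
    {
        // Total given: unspecified types default to the total, specified ones are capped by it.
        unsigned const total = *req.total;
        auto cap = [total](std::optional<unsigned> const & value) { return std::min(value.value_or(total), total); };
        eff.total = total;
        eff.substitution = cap(req.substitution);
        eff.insertion = cap(req.insertion);
        eff.deletion = cap(req.deletion);
    }
    else
    {
        // Total not given (assumption): unspecified types default to 0, the total is the sum.
        eff.substitution = req.substitution.value_or(0u);
        eff.insertion = req.insertion.value_or(0u);
        eff.deletion = req.deletion.value_or(0u);
        eff.total = eff.substitution + eff.insertion + eff.deletion;
    }
    return eff;
}
```

The fields are ordered total, substitution, insertion, deletion, so `resolve_errors({3u, std::nullopt, 1u, std::nullopt})` reproduces the example in the text: 3 substitutions, 1 insertion, 3 deletions and at most 3 errors in total.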

The configuration elements can be initialised by an absolute error count or an error rate:

Argument of seqan3::search_cfg::max_error_*¹ Behaviour
seqan3::search_cfg::error_rate Specify the error rate ($\in [0,1]$).
seqan3::search_cfg::error_count Specify a discrete number of allowed errors ($\mathbb{W}$).

¹: max_error_total, max_error_substitution, max_error_insertion, max_error_deletion
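Assuming an error rate is interpreted relative to the query length, it has to be translated into an absolute count for each query. The sketch below further assumes that the count is the rate multiplied by the query length, rounded down and capped at the largest `uint8_t` value (the underlying type of seqan3::search_cfg::error_count); SeqAn3's exact rounding may differ, and `error_count_from_rate` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: translate an error rate into an absolute error count for one query.
// Assumption: count = floor(rate * query_length), capped at the uint8_t maximum (255).
std::uint8_t error_count_from_rate(double rate, std::size_t query_length)
{
    if (rate < 0.0 || rate > 1.0)
        throw std::invalid_argument{"The error rate must lie in [0, 1]."};

    double const count = std::floor(rate * static_cast<double>(query_length));
    return static_cast<std::uint8_t>(std::min(count, 255.0));
}
```

Under these assumptions, a rate of 0.1 allows 10 errors for a query of length 100 but only 2 errors for a query of length 25.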

Example

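#include <seqan3/core/configuration/configuration.hpp>
#include <seqan3/search/configuration/max_error.hpp>
int main()
{
    using namespace seqan3::search_cfg; // shorter names for the configuration elements below
    // Allow 1 error of any type.
    seqan3::configuration const cfg1 = max_error_total{error_count{1}};
    // Do not allow substitutions. Allow at most 1 error.
    seqan3::configuration const cfg2 = max_error_total{error_count{1}} | max_error_substitution{error_count{0}};
    // Sets total errors to 2.
    seqan3::configuration const cfg3 = max_error_insertion{error_count{1}} | max_error_deletion{error_count{1}};
    // Allow 10% errors of any type.
    seqan3::configuration const cfg4 = max_error_total{error_rate{0.1}};
    // Do not allow substitutions. Allow at most 10% errors.
    seqan3::configuration const cfg5 = max_error_total{error_rate{0.1}} | max_error_substitution{error_count{0}};
    // Sets total errors to 20%.
    seqan3::configuration const cfg6 = max_error_insertion{error_rate{0.1}} | max_error_deletion{error_rate{0.1}};
    // Mixed error rate & count: Allow 2 insertions and/or 2 deletions and 20% errors in total.
    seqan3::configuration const cfg7 =
        max_error_total{error_rate{0.2}} | max_error_insertion{error_count{2}} | max_error_deletion{error_count{2}};
    return 0;
}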

4: Output Configuration

The seqan3::search interface returns a seqan3::algorithm_result_generator_range. This range is a lazy, single-pass input range over the computed hits, and its elements are seqan3::search_result objects. Even if only a single query is searched, a range is returned, because one search can produce multiple hits, e.g. when all best hits are requested. The following output configurations exist:

Output Configurations Behaviour
seqan3::search_cfg::output_query_id Include the query_id in the seqan3::search_result.
seqan3::search_cfg::output_reference_id Include the reference_id in the seqan3::search_result.
seqan3::search_cfg::output_reference_begin_position Include the reference_begin_position in the seqan3::search_result.
seqan3::search_cfg::output_index_cursor Include the index_cursor in the seqan3::search_result.

Each of the configuration elements corresponds to a member access function of the returned seqan3::search_result object. If you do not specify any output configuration in the search configuration, the default output contains the query id, the reference id and the reference begin position. If you customise the output configuration, only the specified entities are available in the final seqan3::search_result.

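Note
If you try to call a function for an entity that was not configured, a static assertion will be raised during the compilation of the program.

#include <seqan3/core/configuration/configuration.hpp>
#include <seqan3/search/configuration/output.hpp>
int main()
{
    // Only return the reference id where a query matched the reference:
    seqan3::configuration const cfg1 = seqan3::search_cfg::output_reference_id{};
    // Same as the default:
    seqan3::configuration const cfg2 = seqan3::search_cfg::output_query_id{} | seqan3::search_cfg::output_reference_id{}
                                     | seqan3::search_cfg::output_reference_begin_position{};
    // Only return cursors of the index.
    seqan3::configuration const cfg3 = seqan3::search_cfg::output_index_cursor{};
    return 0;
}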
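The compile-time check mentioned in the note can be sketched with a class template whose accessors contain a dependent `static_assert`: a member function of a class template is only instantiated when it is called, so an unconfigured accessor only fails when it is actually used. This is a minimal illustration; `toy_search_result` and its boolean flags are hypothetical and do not reflect how seqan3::search_result is implemented.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type. The flags stand in for output_query_id,
// output_reference_id and output_reference_begin_position.
template <bool has_query_id, bool has_reference_id, bool has_begin_position>
class toy_search_result
{
public:
    toy_search_result(std::size_t query_id, std::size_t reference_id, std::size_t begin_position) :
        query_id_{query_id}, reference_id_{reference_id}, begin_position_{begin_position}
    {}

    // Each accessor only compiles if the matching output was configured.
    std::size_t query_id() const
    {
        static_assert(has_query_id, "query_id was not configured.");
        return query_id_;
    }

    std::size_t reference_id() const
    {
        static_assert(has_reference_id, "reference_id was not configured.");
        return reference_id_;
    }

    std::size_t reference_begin_position() const
    {
        static_assert(has_begin_position, "reference_begin_position was not configured.");
        return begin_position_;
    }

private:
    // Kept simple: unconfigured fields are still stored here.
    std::size_t query_id_;
    std::size_t reference_id_;
    std::size_t begin_position_;
};

// Corresponds to the default output (all three entities).
using default_result = toy_search_result<true, true, true>;
// Corresponds to requesting only the reference id.
using reference_id_only_result = toy_search_result<false, true, false>;
```

Calling `query_id()` on a `reference_id_only_result` fails to compile, whereas merely declaring such an object compiles fine.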

The index cursor is an advanced data structure that lets you navigate within the index. See seqan3::fm_index_cursor and seqan3::bi_fm_index_cursor for more information. If you need neither the reference id nor the position, returning only the cursor is faster, because retrieving the id and position of a hit can be computationally intensive depending on the underlying index structure.

Note
A single index cursor points to a range of text positions. Although the normal use case is to return either the cursor or the positions, both can be returned simultaneously. In this case, the same cursor will be copied into the seqan3::search_result for each of its associated positions.
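The relationship between one cursor and its many text positions can be sketched as follows. This is a simplification that assumes a cursor is nothing more than a suffix-array interval; the real seqan3::fm_index_cursor holds more state, and `toy_cursor`, `toy_result` and `expand_cursor` are hypothetical. In an FM index the suffix array is typically sampled, so resolving each position usually costs extra work, which is consistent with requesting only the cursor being cheaper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cursor: the suffix-array interval [sa_begin, sa_end) holding all
// text positions of one match.
struct toy_cursor
{
    std::size_t sa_begin{};
    std::size_t sa_end{};
    std::size_t count() const { return sa_end - sa_begin; }
};

// Hypothetical result carrying both the cursor and one resolved text position.
struct toy_result
{
    toy_cursor cursor;
    std::size_t position;
};

// Resolve every position of the cursor; the same cursor is copied into each result.
std::vector<toy_result> expand_cursor(toy_cursor const & cursor, std::vector<std::size_t> const & suffix_array)
{
    std::vector<toy_result> results;
    results.reserve(cursor.count());
    for (std::size_t i = cursor.sa_begin; i < cursor.sa_end; ++i)
        results.push_back(toy_result{cursor, suffix_array[i]}); // locating a position is the costly step
    return results;
}
```

For the text "banana", the suffix array is {5, 3, 1, 0, 4, 2} and the pattern "ana" occupies the interval [1, 3), so one cursor expands into two results at positions 3 and 1.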

5: Hit Configuration

This configuration can be used to determine which hits are reported. Currently these strategies are available:

Hit Configurations Behaviour
seqan3::search_cfg::hit_all Report all hits within error bounds.
seqan3::search_cfg::hit_all_best Report all hits with the lowest number of errors within the bounds.
seqan3::search_cfg::hit_single_best Report one best hit (hit with lowest error) within bounds.
seqan3::search_cfg::hit_strata Report all hits within best + stratum errors.

The individual configuration elements to select a search strategy cannot be combined with each other (mutual exclusivity).

#include <seqan3/core/configuration/configuration.hpp>
#include <seqan3/search/configuration/hit.hpp>
#include <seqan3/search/configuration/max_error.hpp>
int main()
{
    // Report all hits with 0 errors (maximum number of errors defaults to 0).
    seqan3::configuration const cfg1 = seqan3::search_cfg::hit_all{};
    // Report all hits with 0 and 1 errors.
    seqan3::configuration const cfg2 =
        seqan3::search_cfg::hit_all{} | seqan3::search_cfg::max_error_total{seqan3::search_cfg::error_count{1}};
    // Report the single best hit with the least number of errors (up to 1 error is allowed).
    seqan3::configuration const cfg3 =
        seqan3::search_cfg::hit_single_best{} | seqan3::search_cfg::max_error_total{seqan3::search_cfg::error_count{1}};
    // Report all hits with the least number of errors (either 0 or 1 errors).
    seqan3::configuration const cfg4 =
        seqan3::search_cfg::hit_all_best{} | seqan3::search_cfg::max_error_total{seqan3::search_cfg::error_count{1}};
    // Report all hits with best + 1 error but no more than 2 (errors).
    // E.g., if the best hit has 1 error, all hits with 1 and 2 errors are reported.
    // E.g., if the best hit has 2 errors, only hits with 2 errors are reported since 3 exceeds total.
    seqan3::configuration const cfg5 =
        seqan3::search_cfg::hit_strata{1} | seqan3::search_cfg::max_error_total{seqan3::search_cfg::error_count{2}};
    // You must choose only one mode.
    // auto fail = seqan3::search_cfg::hit_single_best{} | seqan3::search_cfg::hit_all{}; // doesn't compile
    return 0;
}
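The four strategies can be compared side by side by filtering a list of candidate hits. The sketch below is a simplified model, not the search algorithm itself: `toy_hit`, `strategy` and `select_hits` are hypothetical, every candidate is assumed to already satisfy the total error bound, and "single best" simply takes the first hit with the fewest errors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical hit: a text position and the number of errors of the match.
struct toy_hit
{
    std::size_t position;
    unsigned errors;
};

enum class strategy { all, all_best, single_best, strata };

// Assumes all hits already respect the total error bound.
std::vector<toy_hit> select_hits(std::vector<toy_hit> const & hits, strategy mode, unsigned stratum = 0)
{
    if (hits.empty() || mode == strategy::all)
        return hits;

    // First hit with the fewest errors.
    auto const best = std::min_element(hits.begin(), hits.end(),
                                       [](toy_hit const & a, toy_hit const & b) { return a.errors < b.errors; });

    if (mode == strategy::single_best)
        return {*best};

    // all_best behaves like strata with a stratum of 0.
    unsigned const limit = best->errors + (mode == strategy::strata ? stratum : 0u);
    std::vector<toy_hit> selected;
    std::copy_if(hits.begin(), hits.end(), std::back_inserter(selected),
                 [limit](toy_hit const & hit) { return hit.errors <= limit; });
    return selected;
}
```

For candidates with 1, 0, 2, 1 and 0 errors, `all` keeps all five, `all_best` keeps the two exact hits, `single_best` keeps one of them, and `strata` with a stratum of 1 keeps the four hits with at most one error.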

Dynamic hit configuration

Sometimes a program needs to support different hit strategies based on user input. Since these are runtime decisions, handling them with the static hit configurations can make the code quite cumbersome. Instead, one can use the dynamic hit configuration element seqan3::search_cfg::hit, which allows setting one of the above-mentioned hit configurations at runtime. Later, during the configuration phase of the search algorithm, the selected hit configuration is used to build the final search algorithm. A default-constructed dynamic hit configuration does not hold any hit configuration; calling seqan3::search with the dynamic configuration in this state throws an exception. Also note that using the dynamic configuration might increase compile time, so we recommend using the static configurations if only a single hit strategy is supported. The following example demonstrates the usage of the dynamic configuration:

#include <seqan3/search/configuration/all.hpp>
int main()
{
    // Default constructed: Has no hit strategy selected.
    seqan3::search_cfg::hit dynamic_hit{};
    // Select hit_all
    dynamic_hit = seqan3::search_cfg::hit_all{};
    // If condition is true choose strata strategy, otherwise find the single best hit.
    if (true)
        dynamic_hit = seqan3::search_cfg::hit_strata{1};
    else
        dynamic_hit = seqan3::search_cfg::hit_single_best{};
    // Combine it with other configurations.
    seqan3::configuration const cfg =
        dynamic_hit | seqan3::search_cfg::max_error_total{seqan3::search_cfg::error_count{1}};
    // Directly initialised.
    seqan3::configuration const cfg2 = seqan3::search_cfg::hit{seqan3::search_cfg::hit_all_best{}}
                                     | seqan3::search_cfg::max_error_total{seqan3::search_cfg::error_count{1}};
    // You cannot combine the dynamic hit configuration with the static ones.
    // auto fail = seqan3::search_cfg::hit_single_best{} | seqan3::search_cfg::hit{}; // doesn't compile
    return 0;
}
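The runtime selection described above can be modelled with `std::variant`. In this minimal sketch (hypothetical tag types and `resolve_strategy`, not SeqAn3's implementation), `std::monostate` plays the role of the empty, default-constructed state, and resolving it throws, mirroring the exception described above. Because `std::visit` compiles a branch for every alternative, the sketch also hints at why a dynamic element can cost extra compile time.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical static strategy tags.
struct all_tag {};
struct all_best_tag {};
struct single_best_tag {};
struct strata_tag { unsigned stratum{}; };

// Hypothetical runtime selector; std::monostate means "nothing selected yet".
using dynamic_hit = std::variant<std::monostate, all_tag, all_best_tag, single_best_tag, strata_tag>;

// Resolve the runtime choice once, before a search would start.
std::string resolve_strategy(dynamic_hit const & hit)
{
    return std::visit(
        [](auto const & tag) -> std::string {
            using tag_t = std::decay_t<decltype(tag)>;
            if constexpr (std::is_same_v<tag_t, std::monostate>)
                throw std::invalid_argument{"No hit strategy was selected."};
            else if constexpr (std::is_same_v<tag_t, all_tag>)
                return "all";
            else if constexpr (std::is_same_v<tag_t, all_best_tag>)
                return "all_best";
            else if constexpr (std::is_same_v<tag_t, single_best_tag>)
                return "single_best";
            else
                return "strata+" + std::to_string(tag.stratum);
        },
        hit);
}
```

A default-constructed `dynamic_hit` holds `std::monostate`, so resolving it throws until a strategy has been assigned.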

6: Parallel Configuration

This configuration determines the maximum number of threads the search algorithm can use.

The seqan3::search_cfg::parallel configuration element can be combined with any other search configuration.

#include <seqan3/core/configuration/configuration.hpp>
#include <seqan3/search/configuration/max_error.hpp>
#include <seqan3/search/configuration/parallel.hpp>
int main()
{
    // Enable parallel execution of the search algorithm with 8 threads (and allow 1 error of any type).
    seqan3::configuration const cfg =
        seqan3::search_cfg::max_error_total{seqan3::search_cfg::error_count{1}} | seqan3::search_cfg::parallel{8};
    // Alternative solution: assign to the member variable of the parallel configuration
    seqan3::search_cfg::parallel par_cfg{};
    par_cfg.thread_count = 8;
    return 0;
}
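One way to honour an upper bound on the number of threads while still producing results in query order is to give every query its own output slot. The sketch below illustrates this idea with `std::thread`; it is not SeqAn3's scheduler, and both the naive `count_occurrences` (a stand-in for an index search) and `parallel_count` are hypothetical. The precondition `thread_count > 0` mirrors the requirement for seqan3::search_cfg::parallel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Naive stand-in for an index search: count (possibly overlapping) exact occurrences.
std::size_t count_occurrences(std::string const & text, std::string const & query)
{
    if (query.empty() || query.size() > text.size())
        return 0;
    std::size_t count = 0;
    for (std::size_t pos = text.find(query); pos != std::string::npos; pos = text.find(query, pos + 1))
        ++count;
    return count;
}

// Process all queries with at most thread_count threads. Each query writes into its own
// slot, so the output order matches the query order regardless of thread scheduling.
std::vector<std::size_t> parallel_count(std::string const & text,
                                        std::vector<std::string> const & queries,
                                        unsigned thread_count)
{
    assert(thread_count > 0);
    std::vector<std::size_t> results(queries.size());
    std::size_t const workers = std::min<std::size_t>(thread_count, queries.size());

    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < workers; ++w)
        pool.emplace_back([&, w] {
            for (std::size_t q = w; q < queries.size(); q += workers) // strided split of query ids
                results[q] = count_occurrences(text, queries[q]);
        });

    for (std::thread & t : pool)
        t.join();
    return results;
}
```

The result vector is identical for 1, 2 or 8 threads, which is the same property the lazy result range provides for the library's parallel search.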

User callback

In the default case, a call to seqan3::search returns a lazy range over the results of the search. This lazy range has the advantage that the results are always in a deterministic order, even if the search is executed in parallel. Sometimes, however, it might be desirable to provide a user-defined callback instead. To do so, one can use the configuration element seqan3::search_cfg::on_result. This configuration element is initialised with a user-defined callback, e.g. a lambda function, which is invoked with a generated seqan3::search_result whenever a hit is found. This has two implications. First, the return type of the seqan3::search function changes to void, i.e. it returns nothing. Second, in a parallel execution of the search, the order of the hits is not deterministic and the user has to make sure that concurrent invocations of the given callback are safe.

The following snippet demonstrates the basic use case for this configuration element:

#include <vector>

#include <seqan3/alphabet/nucleotide/dna4.hpp>
#include <seqan3/core/debug_stream.hpp>
#include <seqan3/search/configuration/on_result.hpp>
#include <seqan3/search/fm_index/fm_index.hpp>
#include <seqan3/search/search.hpp>

int main()
{
    using namespace seqan3::literals;
    std::vector<seqan3::dna4_vector> genomes{"CGCTGTCTGAAGGATGAGTGTCAGCCAGTGTA"_dna4,
                                             "ACCCGATGAGCTACCCAGTAGTCGAACTG"_dna4,
                                             "GGCCAGACAACCCGGCGCTAATGCACTCA"_dna4};
    std::vector<seqan3::dna4_vector> queries{"GCT"_dna4, "ACCC"_dna4};
    // build an FM index
    seqan3::fm_index index{genomes};
    seqan3::configuration const config = seqan3::search_cfg::on_result{[](auto && result)
                                                                       {
                                                                           seqan3::debug_stream << result << '\n';
                                                                       }};
    seqan3::search(queries, index, config); // Does not return anything but calls the lambda from above instead.
    // This results in:
    // <query_id:0, reference_id:0, reference_pos:1>
    // <query_id:0, reference_id:1, reference_pos:9>
    // <query_id:0, reference_id:2, reference_pos:16>
    // <query_id:1, reference_id:1, reference_pos:0>
    // <query_id:1, reference_id:1, reference_pos:12>
    // <query_id:1, reference_id:2, reference_pos:9>
}
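When the search runs in parallel, the callback may be invoked from several threads at once. The following minimal sketch shows one way to make a collecting callback safe by serialising access with a `std::mutex`. It simulates the concurrent invocations with plain `std::thread`s instead of calling seqan3::search; `hit_record`, `collecting_callback` and `simulate_parallel_reporting` are hypothetical. Since on_result presumably stores the callback by value, in real code a lambda would more likely capture a reference to such a shared sink than own the mutex itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical hit record delivered to the callback.
struct hit_record
{
    std::size_t query_id;
    std::size_t reference_pos;
};

// Thread-safe sink: concurrent invocations are serialised by a mutex.
class collecting_callback
{
public:
    void operator()(hit_record const & hit)
    {
        std::lock_guard<std::mutex> lock{mutex_};
        hits_.push_back(hit);
    }

    // Arrival order is not deterministic, so sort before reporting.
    std::vector<hit_record> sorted_hits() const
    {
        std::lock_guard<std::mutex> lock{mutex_};
        std::vector<hit_record> copy = hits_;
        std::sort(copy.begin(), copy.end(), [](hit_record const & a, hit_record const & b) {
            return a.query_id != b.query_id ? a.query_id < b.query_id : a.reference_pos < b.reference_pos;
        });
        return copy;
    }

private:
    mutable std::mutex mutex_;
    std::vector<hit_record> hits_;
};

// Simulate a parallel search: one thread per query reports its hits through the callback.
void simulate_parallel_reporting(collecting_callback & callback,
                                 std::vector<std::vector<hit_record>> const & per_query)
{
    std::vector<std::thread> workers;
    for (auto const & hits : per_query)
        workers.emplace_back([&callback, &hits] {
            for (hit_record const & hit : hits)
                callback(hit);
        });
    for (std::thread & worker : workers)
        worker.join();
}
```

Sorting afterwards restores a reproducible order, at the cost of buffering all hits; this is the trade-off the lazy result range avoids.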

Typedef Documentation

◆ parallel

using seqan3::search_cfg::parallel = seqan3::detail::parallel_mode<std::integral_constant<detail::search_config_id, detail::search_config_id::parallel> >

Enables the parallel execution of the search algorithm if possible for the given configuration.

See also
Configuration

With this configuration you can enable the parallel execution of the search algorithm.

The config element takes the number of threads as a parameter, which must be greater than 0.

Example

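#include <seqan3/core/configuration/configuration.hpp>
#include <seqan3/search/configuration/max_error.hpp>
#include <seqan3/search/configuration/parallel.hpp>
int main()
{
    // Enable parallel execution of the search algorithm with 8 threads (and allow 1 error of any type).
    seqan3::configuration const cfg =
        seqan3::search_cfg::max_error_total{seqan3::search_cfg::error_count{1}} | seqan3::search_cfg::parallel{8};
    // Alternative solution: assign to the member variable of the parallel configuration
    seqan3::search_cfg::parallel par_cfg{};
    par_cfg.thread_count = 8;
    return 0;
}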

Variable Documentation

◆ default_configuration

constexpr configuration seqan3::search_cfg::default_configuration
Initial value:
= max_error_total{error_count{0}} | max_error_substitution{error_count{0}}
| max_error_insertion{error_count{0}} | max_error_deletion{error_count{0}}
| output_query_id{} | output_reference_id{}
| output_reference_begin_position{} | hit_all{}

The default configuration: Compute all exact matches.

See also
Configuration