SeqAn3 3.4.0-rc.1
The Modern C++ library for sequence analysis.
Data structures and utility functions for configuring the search algorithm.
Classes
Kind | Name | Description |
---|---|---|
struct | seqan3::search_cfg::error_count | A strong type of underlying type uint8_t that represents the number of errors. |
struct | seqan3::search_cfg::error_rate | A strong type of underlying type double that represents the rate of errors. |
class | seqan3::search_cfg::hit | A dynamic configuration element to configure the hit strategy at runtime. |
class | seqan3::search_cfg::hit_all | Configuration element to receive all hits within the error bounds. |
class | seqan3::search_cfg::hit_all_best | Configuration element to receive all hits with the lowest number of errors within the error bounds. |
class | seqan3::search_cfg::hit_single_best | Configuration element to receive a single best hit with the lowest number of errors within the error bounds. |
class | seqan3::search_cfg::hit_strata | Configuration element to receive all hits with the best number of errors plus the given stratum. All hits are found with the fewest number of errors plus 'stratum'. |
class | seqan3::search_cfg::max_error_deletion | Configuration element that represents the number or rate of deletion errors. |
class | seqan3::search_cfg::max_error_insertion | Configuration element that represents the number or rate of insertion errors. |
class | seqan3::search_cfg::max_error_substitution | Configuration element that represents the number or rate of substitution errors. |
class | seqan3::search_cfg::max_error_total | Configuration element that represents the number or rate of total errors. |
class | seqan3::search_cfg::on_result< callback_t > | Configuration element to provide a user defined callback function for the search. |
class | seqan3::search_cfg::output_index_cursor | Include the index_cursor in the seqan3::search_result returned by a call to seqan3::search. |
class | seqan3::search_cfg::output_query_id | Include the query_id in the seqan3::search_result returned by a call to seqan3::search. |
class | seqan3::search_cfg::output_reference_begin_position | Include the reference_begin_position in the seqan3::search_result returned by a call to seqan3::search. |
class | seqan3::search_cfg::output_reference_id | Include the reference_id in the seqan3::search_result returned by a call to seqan3::search. |
class | seqan3::search_cfg::detail::result_type< search_result_t > | Configuration element storing the configured seqan3::search_result for the search algorithm. |
Typedefs
Kind | Name | Description |
---|---|---|
using | seqan3::search_cfg::parallel = seqan3::detail::parallel_mode< std::integral_constant< seqan3::detail::search_config_id, seqan3::detail::search_config_id::parallel > > | Enables the parallel execution of the search algorithm if possible for the given configuration. |
Variables
Type | Name | Description |
---|---|---|
constexpr configuration | seqan3::search_cfg::default_configuration | The default configuration: Compute all exact matches. |
In SeqAn, the search algorithm uses a configuration object to determine the permitted number of total errors, substitution errors, insertion errors, and deletion errors, each of which can be given as an absolute number or as an error rate. Furthermore, it can be configured which hits are reported, based on a strategy, and which information the result should contain. These configurations exist in their own namespace, namely seqan3::search_cfg, to disambiguate them from the configuration of other algorithms.
If no configuration is provided upon invoking the seqan3::search algorithm, the default configuration seqan3::search_cfg::default_configuration is used, which computes all exact matches.
Configurations can be combined using the |-operator. If a combination is invalid, a static assertion is raised during the compilation of the program. It informs the user that some configurations cannot be combined into one search configuration. In general, the same configuration element cannot occur more than once inside a configuration specification. The following table shows which combinations are possible.
Configuration group | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|
0: Max error total | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
1: Max error substitution | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
2: Max error insertion | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ |
3: Max error deletion | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
4: Output | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
5: Hit | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
6: Parallel | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
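The table reduces to a single rule: only the diagonal is forbidden, so a group may appear at most once in a combination. As a minimal sketch in plain C++, the check could look as follows. The names `config_group` and `is_valid_combination` are hypothetical and not part of SeqAn3, and the sketch models only the table as printed, at group granularity.

```cpp
#include <array>
#include <cstddef>

// Hypothetical model of the seven configuration groups listed in the table.
enum class config_group : std::size_t
{
    max_error_total,
    max_error_substitution,
    max_error_insertion,
    max_error_deletion,
    output,
    hit,
    parallel
};

// Returns true if no group occurs more than once in the combination.
template <std::size_t n>
constexpr bool is_valid_combination(std::array<config_group, n> const & groups)
{
    std::array<bool, 7> seen{};
    for (config_group const group : groups)
    {
        auto const idx = static_cast<std::size_t>(group);
        if (seen[idx])
            return false; // diagonal of the table: a group cannot be combined with itself
        seen[idx] = true;
    }
    return true;
}

// Evaluated at compile time, in the spirit of the static assertion described above.
static_assert(is_valid_combination(std::array{config_group::max_error_total,
                                              config_group::hit,
                                              config_group::parallel}));
static_assert(!is_valid_combination(std::array{config_group::hit, config_group::hit}));
```

Because the function is `constexpr`, an invalid combination in this model is rejected before the program runs, which is the same point in time at which the text says the real static assertion fires.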
The max error configuration can be used to specify the number or rate of errors per error type. It restricts the number of substitutions, insertions, deletions and total errors within the search to the given values. A substitution (mismatch) corresponds to diverging bases between text and query at a certain position. An insertion corresponds to a base inserted into the query that does not occur in the text at the respective position. A deletion corresponds to a base deleted from the query sequence that does occur in the indexed text. Deletions at the beginning and at the end of the sequence are not considered during a search.
The following rules apply when selecting the max error configuration: First, if seqan3::search_cfg::max_error_total is specified, then all error types are set to the value of the total error configuration. For any other specified error configuration the value is set accordingly, but will not exceed the total error if given. For example, if a configuration sets the total max error to 3 and the insertion error to 1, then the search will consider at most one insertion but allow up to 3 deletions and 3 substitutions, while allowing at most 3 errors in total. On the other hand, if the total error is not specified in the search configuration, it is set to the sum of the other configurations. This means that in the default case all errors are set to 0 and therefore an exact search is conducted.
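The rules above can be sketched in plain C++ without any SeqAn3 types. The structs `max_error_settings` and `resolved_errors` and the function `resolve` are hypothetical names introduced for illustration. Saturating the sum at 255 is an assumption based on the uint8_t underlying type of seqan3::search_cfg::error_count.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the four max error settings; std::nullopt means "not specified".
struct max_error_settings
{
    std::optional<std::uint8_t> total{};
    std::optional<std::uint8_t> substitution{};
    std::optional<std::uint8_t> insertion{};
    std::optional<std::uint8_t> deletion{};
};

// The effective bounds after applying the rules.
struct resolved_errors
{
    std::uint8_t total;
    std::uint8_t substitution;
    std::uint8_t insertion;
    std::uint8_t deletion;
};

resolved_errors resolve(max_error_settings const & cfg)
{
    if (cfg.total)
    {
        std::uint8_t const total = *cfg.total;
        // Unspecified error types inherit the total; specified ones are capped by it.
        auto bound = [total](std::optional<std::uint8_t> const & value) -> std::uint8_t
        {
            return value ? std::min(*value, total) : total;
        };
        return {total, bound(cfg.substitution), bound(cfg.insertion), bound(cfg.deletion)};
    }

    // No total given: unspecified types default to 0 and the total is their sum.
    std::uint8_t const substitution = cfg.substitution.value_or(0);
    std::uint8_t const insertion = cfg.insertion.value_or(0);
    std::uint8_t const deletion = cfg.deletion.value_or(0);
    unsigned const sum = static_cast<unsigned>(substitution) + insertion + deletion;
    // Assumption of this sketch: the sum saturates at the largest uint8_t value.
    return {static_cast<std::uint8_t>(std::min(sum, 255u)), substitution, insertion, deletion};
}
```

With a total of 3 and an insertion bound of 1, `resolve` yields 3 substitutions, 1 insertion, 3 deletions and 3 errors in total, matching the example in the text. An empty `max_error_settings` resolves to all zeros, i.e. an exact search.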
The configuration elements can be initialised by an absolute error count or an error rate:
seqan3::search_cfg::max_error_*¹ | Behaviour |
---|---|
seqan3::search_cfg::error_rate | Specify the error rate (\(\in [0,1]\)). |
seqan3::search_cfg::error_count | Specify a discrete number of allowed errors (\(\mathbb{W}\)). |
¹: max_error_total, max_error_substitution, max_error_insertion, max_error_deletion
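To illustrate the difference between the two ways of initialising a bound, the following sketch turns either one into an absolute error budget for a query of a given length. It uses local stand-in types rather than SeqAn3's strong types. Multiplying the rate by the query length and rounding down, as well as rejecting rates outside [0, 1] with std::invalid_argument, are assumptions of this sketch and not behaviour confirmed by the text above.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <variant>

// Hypothetical stand-ins for the two strong types.
struct error_count { std::uint8_t value; };
struct error_rate { double value; };

// A bound is given either as an absolute count or as a rate.
using max_error = std::variant<error_count, error_rate>;

// Returns the number of errors allowed for a query of the given length.
std::uint8_t allowed_errors(max_error const & bound, std::size_t const query_length)
{
    // An absolute count does not depend on the query length.
    if (auto const * count = std::get_if<error_count>(&bound))
        return count->value;

    double const rate = std::get<error_rate>(bound).value;
    if (!(rate >= 0.0 && rate <= 1.0))
        throw std::invalid_argument{"The error rate must lie in [0, 1]."};

    // Assumption: scale by the query length, round down, saturate at 255.
    double const scaled = rate * static_cast<double>(query_length);
    return static_cast<std::uint8_t>(std::min(scaled, 255.0));
}
```

Under these assumptions a rate of 0.1 allows 3 errors for a query of length 35, whereas an error count of 3 allows 3 errors regardless of the query length.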
The seqan3::search interface returns a seqan3::algorithm_result_generator_range. This range is a lazy single-pass input range over the computed hits, and its elements are seqan3::search_result objects. Even if only a single query is searched, a range is returned, since one search can produce multiple hits, e.g. when all best hits are requested. The following output configurations exist: seqan3::search_cfg::output_query_id, seqan3::search_cfg::output_reference_id, seqan3::search_cfg::output_reference_begin_position, and seqan3::search_cfg::output_index_cursor.
Each of these configuration elements corresponds to a member access function of the returned seqan3::search_result object. If you do not specify any output configuration in the search configuration, the default output contains the query id, the reference id, and the reference begin position. If you customise the output configuration, only the specified elements are available in the final seqan3::search_result.
The index cursor is an advanced data structure that lets you navigate within the index. See seqan3::fm_index_cursor and seqan3::bi_fm_index_cursor for more information. If you need neither the reference id nor the position, returning only the cursor is faster. This is because the operation to get the id and position of a hit can be computationally intensive, depending on the underlying index structure.
The hit configuration determines which hits are reported. Currently these strategies are available:
Hit Configurations | Behaviour |
---|---|
seqan3::search_cfg::hit_all | Report all hits within error bounds. |
seqan3::search_cfg::hit_all_best | Report all hits with the lowest number of errors within the bounds. |
seqan3::search_cfg::hit_single_best | Report one best hit (hit with lowest error) within bounds. |
seqan3::search_cfg::hit_strata | Report all hits within best + stratum errors. |
The individual configuration elements to select a search strategy cannot be combined with each other (mutual exclusivity).
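The effect of the four strategies can be illustrated on a list of candidate hits that is reduced to the error count of each hit. This is a plain C++ sketch of the reporting rules only. The enum `hit_strategy` and the function `select_hits` are hypothetical names, and the sketch says nothing about how SeqAn3 implements the strategies internally.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical model of the four hit strategies from the table.
enum class hit_strategy { all, all_best, single_best, strata };

// Filters the error counts of candidate hits, all assumed to lie within the error bounds.
std::vector<std::uint8_t> select_hits(std::vector<std::uint8_t> const & hit_errors,
                                      hit_strategy const strategy,
                                      std::uint8_t const stratum = 0)
{
    if (hit_errors.empty() || strategy == hit_strategy::all)
        return hit_errors;

    std::uint8_t const best = *std::min_element(hit_errors.begin(), hit_errors.end());

    if (strategy == hit_strategy::single_best)
        return {best}; // one hit with the lowest number of errors

    // all_best keeps hits with exactly `best` errors; strata admits up to `stratum` more.
    unsigned const limit = best + (strategy == hit_strategy::strata ? stratum : 0u);
    std::vector<std::uint8_t> selected;
    std::copy_if(hit_errors.begin(),
                 hit_errors.end(),
                 std::back_inserter(selected),
                 [limit](std::uint8_t const errors) { return errors <= limit; });
    return selected;
}
```

For candidate hits with 2, 0, 1, 0 and 3 errors, `all` reports five hits, `all_best` the two hits with 0 errors, `single_best` one hit with 0 errors, and `strata` with a stratum of 1 the three hits with at most 1 error.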
Sometimes a program needs to support different hit strategies based on some user input. Since these are mostly runtime decisions, handling the static hit configurations in code can become quite cumbersome. Instead, one can use the dynamic hit configuration element seqan3::search_cfg::hit. This configuration element allows setting one of the above-mentioned hit configurations at runtime. Later, during the configuration phase of the search algorithm, the selected hit configuration is used for the final search algorithm. If the dynamic hit configuration is default constructed, it does not hold any hit configuration. If you call seqan3::search with the dynamic configuration in this state, an exception is thrown. Also note that using the dynamic configuration might increase the compile time, so we recommend using the static configurations if only a single hit strategy is supported.
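The idea behind a runtime-selected strategy can be modelled with a std::variant whose empty state corresponds to the default-constructed element. This is a sketch of the concept only, not of seqan3::search_cfg::hit itself. The type `dynamic_hit`, the functions `parse_hit_option` and `selected_strategy`, and the option strings are hypothetical. The exception type std::invalid_argument is likewise an assumption, since the text only states that an exception is thrown.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical stand-ins for the four static hit configuration elements.
struct hit_all {};
struct hit_all_best {};
struct hit_single_best {};
struct hit_strata { std::uint8_t stratum{}; };

// Models the dynamic element: std::monostate is the default-constructed, empty state.
using dynamic_hit = std::variant<std::monostate, hit_all, hit_all_best, hit_single_best, hit_strata>;

// Picks a strategy from user input at runtime (option names are made up for this sketch).
dynamic_hit parse_hit_option(std::string const & option, std::uint8_t const stratum = 0)
{
    if (option == "all")
        return hit_all{};
    if (option == "all-best")
        return hit_all_best{};
    if (option == "single-best")
        return hit_single_best{};
    if (option == "strata")
        return hit_strata{stratum};
    return {}; // unknown input leaves the element empty
}

// Stands in for the configuration phase of the search: an empty element is rejected.
std::string selected_strategy(dynamic_hit const & hit)
{
    return std::visit(
        [](auto const & element) -> std::string
        {
            using element_t = std::decay_t<decltype(element)>;
            if constexpr (std::is_same_v<element_t, std::monostate>)
                throw std::invalid_argument{"The dynamic hit configuration holds no hit strategy."};
            else if constexpr (std::is_same_v<element_t, hit_all>)
                return "hit_all";
            else if constexpr (std::is_same_v<element_t, hit_all_best>)
                return "hit_all_best";
            else if constexpr (std::is_same_v<element_t, hit_single_best>)
                return "hit_single_best";
            else
                return "hit_strata(" + std::to_string(element.stratum) + ")";
        },
        hit);
}
```

The dispatch over all alternatives has to be compiled even if only one strategy is ever selected, which gives an intuition for why a dynamic element can cost more compile time than a single static one.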
The seqan3::search_cfg::parallel configuration determines the maximal number of threads the search algorithm can use.
The seqan3::search_cfg::parallel configuration element can be combined with any other search configuration.
In the default case, a call to seqan3::search returns a lazy range over the results of the search. This lazy range has the advantage that the results are always in a deterministic order, even if the search is executed in parallel. Sometimes, however, it might be desirable to provide a user defined callback. To do so, one can use the configuration element seqan3::search_cfg::on_result. This configuration element is initialised with a user defined callback, e.g. a lambda function, which is invoked with a generated seqan3::search_result whenever a hit is found. This has two implications. First, the return type of the seqan3::search function changes to void, i.e. it returns nothing. Second, in a parallel execution of the search, the order of the hits is not deterministic and the user has to make sure that concurrent invocations of the given callback are safe.
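The two implications can be made concrete with a small model built on the standard library only. The toy function `search_with_callback` below is hypothetical and merely imitates the contract described above: it returns nothing, reports each hit through the callback, and may invoke the callback from several threads at once. The `hit_model` struct and `collect_hits` helper are likewise made up for this sketch, and the exact-substring matching stands in for a real index search.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <stdexcept>
#include <string>
#include <thread>
#include <vector>

// Hypothetical result type that mimics a search result with a query id and a position.
struct hit_model
{
    std::size_t query_id;
    std::size_t reference_begin_position;
};

// Toy "search": reports every exact occurrence of each query to the callback and returns void.
// Queries are distributed over `thread_count` threads, so the callback may run concurrently.
void search_with_callback(std::string const & text,
                          std::vector<std::string> const & queries,
                          std::size_t const thread_count,
                          std::function<void(hit_model)> const & on_result)
{
    if (thread_count == 0)
        throw std::invalid_argument{"The thread count must be greater than 0."};

    auto worker = [&](std::size_t const first)
    {
        for (std::size_t id = first; id < queries.size(); id += thread_count)
        {
            if (queries[id].empty())
                continue;
            for (std::size_t pos = text.find(queries[id]); pos != std::string::npos;
                 pos = text.find(queries[id], pos + 1))
                on_result(hit_model{id, pos});
        }
    };

    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < thread_count; ++i)
        threads.emplace_back(worker, i);
    for (std::thread & thread : threads)
        thread.join();
}

// Usage: the callback guards shared state with a mutex because it can be invoked concurrently.
std::vector<hit_model> collect_hits(std::string const & text,
                                    std::vector<std::string> const & queries,
                                    std::size_t const thread_count)
{
    std::vector<hit_model> hits;
    std::mutex hits_mutex;
    search_with_callback(text, queries, thread_count, [&](hit_model const hit)
    {
        std::lock_guard<std::mutex> lock{hits_mutex};
        hits.push_back(hit);
    });
    return hits; // the order of the hits depends on thread scheduling
}
```

Without the lock, two threads could append to `hits` at the same time, which is a data race. The set of collected hits is the same on every run, but their order is not.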
using seqan3::search_cfg::parallel = seqan3::detail::parallel_mode< std::integral_constant< seqan3::detail::search_config_id, seqan3::detail::search_config_id::parallel > >
Enables the parallel execution of the search algorithm if possible for the given configuration.
With this configuration you can enable the parallel execution of the search algorithm.
The config element takes the number of threads as a parameter, which must be greater than 0.
constexpr configuration seqan3::search_cfg::default_configuration
The default configuration: Compute all exact matches.