HIBF 1.0.0-rc.1
Distributes x Technical Bins across y User Bins while minimizing the maximal Technical Bin size.
#include <hibf/layout/simple_binning.hpp>
Public Member Functions
size_t | execute () |
Executes the simple binning algorithm and lays out user bins into technical bins. | |
size_t | get_num_technical_bins () const |
simple_binning & | operator= (simple_binning &&)=default |
Defaulted. | |
simple_binning & | operator= (simple_binning const &)=delete |
Deleted. Would modify same data. | |
simple_binning ()=default | |
Defaulted. | |
simple_binning (data_store &data_, size_t const num_bins=0) | |
The constructor from user bin names, their kmer counts and a configuration. | |
simple_binning (simple_binning &&)=default | |
Defaulted. | |
simple_binning (simple_binning const &)=delete | |
Deleted. Would modify same data. | |
~simple_binning ()=default | |
Defaulted. | |
Distributes x Technical Bins across y User Bins while minimizing the maximal Technical Bin size.
A Technical Bin represents an actual bin in the binning directory. In the IBF, it stores its kmers in a single Bloom Filter (which is interleaved with all the other BFs).
The user may impose a structure on their sequence data in the form of logical groups (e.g. species). When querying the IBF, the user is interested in an answer that differentiates between these groups.
Name | Description |
---|---|
x | Number of Technical Bins (TB) |
y | Number of User Bins (UB) |
b_i | The bin size (kmer content) of Technical Bin i |
c_j | The kmer content of User Bin j |
M | A DP matrix that tracks the maximum technical bin size |
Since the size of the IBF depends on the maximal Technical Bin size, we want to minimize $\max_i (b_i)$.
Let
Assume we filled a trace matrix T during the computation of M.
We now want to recover the number of bins n_j for each User Bin j.
Backtracking pseudo code:
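The following is a minimal C++ sketch of how M and T could be filled and then backtracked. It is not the library's implementation. It assumes that M[i][0] = c_0 / (i + 1), and that for i >= j, M[i][j] is the minimum over i' in [j-1, i-1] of max(M[i'][j-1], c_j / (i - i')). It also assumes that T[i][j] stores the minimizing i'. The function name `split_user_bins` and the use of `double` for bin sizes are choices made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper, not part of the HIBF API.
// counts[j] is c_j, the kmer content of User Bin j; x is the number of Technical Bins.
// Returns n, where n[j] is the number of Technical Bins assigned to User Bin j.
inline std::vector<std::size_t> split_user_bins(std::vector<std::size_t> const & counts, std::size_t const x)
{
    std::size_t const y = counts.size();
    assert(y > 0 && x >= y); // every User Bin needs at least one Technical Bin

    double const inf = std::numeric_limits<double>::infinity();
    // M[i][j]: smallest maximal Technical Bin size when User Bins 0..j occupy Technical Bins 0..i.
    std::vector<std::vector<double>> M(x, std::vector<double>(y, inf));
    // T[i][j]: the i' chosen for M[i][j], i.e. the last Technical Bin used by User Bin j-1.
    std::vector<std::vector<std::size_t>> T(x, std::vector<std::size_t>(y, 0));

    // Initialization (assumed): User Bin 0 is split evenly across Technical Bins 0..i.
    for (std::size_t i = 0; i < x; ++i)
        M[i][0] = static_cast<double>(counts[0]) / static_cast<double>(i + 1);

    // Recursion (assumed): User Bin j takes the i - prev Technical Bins prev+1..i.
    for (std::size_t j = 1; j < y; ++j)
    {
        for (std::size_t i = j; i < x; ++i)
        {
            for (std::size_t prev = j - 1; prev < i; ++prev)
            {
                double const split = static_cast<double>(counts[j]) / static_cast<double>(i - prev);
                double const candidate = std::max(M[prev][j - 1], split);
                if (candidate < M[i][j])
                {
                    M[i][j] = candidate;
                    T[i][j] = prev;
                }
            }
        }
    }

    // Backtracking: start at the bottom-right cell and follow T towards column 0.
    std::vector<std::size_t> n(y, 0);
    std::size_t i = x - 1;
    for (std::size_t j = y - 1; j > 0; --j)
    {
        std::size_t const next_i = T[i][j];
        n[j] = i - next_i; // User Bin j owns Technical Bins next_i+1..i
        i = next_i;
    }
    n[0] = i + 1; // User Bin 0 owns the remaining Technical Bins 0..i
    return n;
}
```

With kmer counts {100, 10, 10} and x = 4, this sketch assigns two Technical Bins to the large User Bin and one to each small one, giving a maximal Technical Bin size of 50.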
simple_binning (data_store &data_, size_t const num_bins=0) | inline |
The constructor from user bin names, their kmer counts and a configuration.
[in] | data_ | Stores all data that is needed to compute the layout. |
[in] | num_bins | (optional) The number of technical bins. |
If the num_bins parameter is omitted or set to 0, then the number of technical bins used in this algorithm is automatically set to the next multiple of 64 given the number of user bins (e.g. #UB = 88 -> #TB = 128).
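The default described above amounts to rounding the number of user bins up to a multiple of 64. The sketch below illustrates that arithmetic only. Both function names are hypothetical and not taken from the HIBF sources. It assumes that a count which is already a multiple of 64 is left unchanged.

```cpp
#include <cstddef>

// Hypothetical helper: rounds value up to a multiple of 64.
// Assumption: exact multiples of 64 map to themselves.
constexpr std::size_t round_up_to_multiple_of_64(std::size_t const value) noexcept
{
    return ((value + 63) / 64) * 64;
}

// Hypothetical helper mirroring the documented default:
// num_bins == 0 means "derive the number of technical bins from the number of user bins".
constexpr std::size_t resolve_num_technical_bins(std::size_t const num_bins,
                                                 std::size_t const num_user_bins) noexcept
{
    return num_bins != 0 ? num_bins : round_up_to_multiple_of_64(num_user_bins);
}

// 88 user bins and no explicit num_bins yield 128 technical bins.
static_assert(resolve_num_technical_bins(0, 88) == 128, "88 rounds up to 128");
```

An explicit non-zero `num_bins` is passed through unchanged, so `resolve_num_technical_bins(192, 88)` evaluates to 192 in this sketch.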