HIBF 1.0.0-rc.1
All Classes Namespaces Files Functions Variables Typedefs Friends Macros Modules Pages Concepts
seqan::hibf::layout::simple_binning Class Reference

Distributes x Technical Bins across y User Bins while minimizing the maximal Technical Bin size. More...

#include <hibf/layout/simple_binning.hpp>

Public Member Functions

size_t execute ()
 Executes the simple binning algorithm and layouts user bins into technical bins.
 
size_t get_num_technical_bins () const
 
simple_binningoperator= (simple_binning &&)=default
 Defaulted.
 
simple_binningoperator= (simple_binning const &)=delete
 Deleted. Would modify same data.
 
 simple_binning ()=default
 Defaulted.
 
 simple_binning (data_store &data_, size_t const num_bins=0)
 The constructor from user bin names, their kmer counts and a configuration.
 
 simple_binning (simple_binning &&)=default
 Defaulted.
 
 simple_binning (simple_binning const &)=delete
 Deleted. Would modify same data.
 
 ~simple_binning ()=default
 Defaulted.
 

Detailed Description

Distributes x Technical Bins across y User Bins while minimizing the maximal Technical Bin size.

Terminology

Technical Bin

A Technical Bin represents an actual bin in the binning directory. In the IBF, it stores its kmers in a single Bloom Filter (which is interleaved with all the other BFs).

User Bin

The user may impose a structure on his sequence data in the form of logical groups (e.g. species). When querying the IBF, the user is interested in an answer that differentiates between these groups.

Notation

Name Description
x Number of Technical Bins (TB)
y Number of User Bins (UB)
b_i The bin size (kmer content) of Technical Bin i
c_j The kmer content of User Bin j
M A DP matrix that tracks the maximum technical bin size maxi(bi).
Attention
The number of technical bins x must be greater that the number of user bins y for this algorithm. If you want to use less technical bins than user bins, see the hierarchical_binning algorithm.

Algorithm

Since the size of the IBF depends on the maximal Technical Bin size, we want to minimize maxi(bi).

Let r=xy be the surplus of TBs

Initialization



i[0,r]Mi,0=c0i+1









Recursion



i,jMi,j=mini[ir1,i1]max(Mi,j1,cjii)









Backtracking

Assume we filled a trace matrix T during the computation of M.

We now want to recover the number of bins n_j for each User Bin j.

Backtracking pseudo code:

// Start at the bottom-right cell.
j = y - 1;
i = x - 1;
n = array(y); // array of length y
while (j > 0)
{
next_i = T[i][j];
n[j] = i - next_i;
i = next_i;
j = j - 1;
}

Constructor & Destructor Documentation

◆ simple_binning()

seqan::hibf::layout::simple_binning::simple_binning ( data_store & data_,
size_t const num_bins = 0 )
inline

The constructor from user bin names, their kmer counts and a configuration.

Parameters
[in]data_Stores all data that is needed to compute the layout.
[in]num_bins(optional) The number of technical bins.

If the num_bins parameter is omitted or set to 0, then number of technical bins used in this algorithm is automatically set to the next multiple of 64 given the number of user bins (e.g. #UB = 88 -> #TB = 124).

Attention
The number of technical bins must be greater or equal to the number of user bins! If you want to use less technical bins than user bins, see the hierarchical_binning algorithm.

The documentation for this class was generated from the following file: