HIBF 1.0.0-rc.1
All Classes Namespaces Files Functions Variables Typedefs Friends Macros Modules Pages Concepts
Toolbox

Algorithms for union estimation and similarity rearrangement executed before computing the layout. More...

Classes

struct  seqan::hibf::sketch::toolbox::clustering_node
 type for a node in the clustering tree when for the rearrangement More...
 
struct  seqan::hibf::sketch::toolbox::entry
 entry of the distance matrix that has the id of a cluster with its neighbors in a prio queue More...
 
struct  seqan::hibf::sketch::toolbox::neighbor
 element of the second priority queue layer of the distance matrix More...
 

Typedefs

using seqan::hibf::sketch::toolbox::distance_matrix = std::vector<entry>
 type of the distance matrix for the clustering for the rearrangement
 
using seqan::hibf::sketch::toolbox::prio_queue = std::priority_queue<neighbor, std::vector<neighbor>, std::greater<neighbor>>
 type of a min heap based priority queue
 

Functions

void seqan::hibf::sketch::toolbox::cluster_bins (std::vector< hyperloglog > const &sketches, std::vector< size_t > &positions, std::vector< size_t > &permutation, size_t const first, size_t const last, size_t const num_threads)
 Perform an agglomerative clustering variant on the index range [first:last)
 
void seqan::hibf::sketch::toolbox::precompute_initial_union_estimates (std::vector< uint64_t > &estimates, std::vector< hyperloglog > const &sketches, std::vector< size_t > const &counts, std::vector< size_t > const &positions)
 Estimate the cardinality of the union for each interval [0, j] for all user bins j.
 
void seqan::hibf::sketch::toolbox::precompute_union_estimates_for (std::vector< uint64_t > &estimates, std::vector< hyperloglog > const &sketches, std::vector< size_t > const &counts, std::vector< size_t > const &positions, int64_t const j)
 Estimate the cardinality of the union for a single user bin j with all prior ones j' < j.
 
void seqan::hibf::sketch::toolbox::prune (distance_matrix &dist, robin_hood::unordered_flat_map< size_t, size_t > &remaining_ids)
 Delete inactive entries out of dist and shrink to fit its size while keeping track of the changes of indices.
 
void seqan::hibf::sketch::toolbox::random_shuffle (distance_matrix &dist, robin_hood::unordered_flat_map< size_t, size_t > &remaining_ids)
 Randomly swap entries in dist while keeping track of the changes of indices.
 
void seqan::hibf::sketch::toolbox::rearrange_bins (std::vector< hyperloglog > const &sketches, std::vector< size_t > const &counts, std::vector< size_t > &positions, double const max_ratio, size_t const num_threads)
 Rearrange filenames, sketches and counts such that similar bins are close to each other.
 
bool seqan::hibf::sketch::toolbox::rotate (std::vector< clustering_node > &clustering, size_t const previous_rightmost, size_t const first, size_t const id)
 Rotate the previous rightmost bin to the left of the clustering tree.
 
void seqan::hibf::sketch::toolbox::sort_by_cardinalities (std::vector< size_t > const &counts, std::vector< size_t > &positions)
 Sorts filenames and cardinalities by looking only at the cardinalities.
 
void seqan::hibf::sketch::toolbox::trace (std::vector< clustering_node > const &clustering, std::vector< size_t > &permutation, size_t const previous_rightmost, size_t const first, size_t const id)
 Do a recursive traceback to find the order of leaves in the clustering tree.
 

Detailed Description

Algorithms for union estimation and similarity rearrangement executed before computing the layout.

Function Documentation

◆ precompute_union_estimates_for()

void seqan::hibf::sketch::toolbox::precompute_union_estimates_for ( std::vector< uint64_t > & estimates,
std::vector< hyperloglog > const & sketches,
std::vector< size_t > const & counts,
std::vector< size_t > const & positions,
int64_t const j )

Estimate the cardinality of the union for a single user bin j with all prior ones j' < j.

Parameters
[out]estimatesoutput row
[in]sketchesThe hyperloglog sketches of the respective user bins.
[in]countsThe counts/sketch.estimates() of the respective user bins.
[in]positionsThe realtive positions of the input information to correctly access sketches and counts.
[in]jThe current user bin (column in the DP matrix)

estimates[j_prime] will be the union cardinality estimate of the interval {j_prime, ..., j}.

◆ precompute_initial_union_estimates()

void seqan::hibf::sketch::toolbox::precompute_initial_union_estimates ( std::vector< uint64_t > & estimates,
std::vector< hyperloglog > const & sketches,
std::vector< size_t > const & counts,
std::vector< size_t > const & positions )

Estimate the cardinality of the union for each interval [0, j] for all user bins j.

Parameters
[out]estimatesoutput row
[in]sketchesThe hyperloglog sketches of the respective user bins.
[in]countsThe counts/sketch.estimates() of the respective user bins.
[in]positionsThe realtive positions of the input information to correctly access sketches and counts.

estimates[j] will be the union cardinality estimate of the interval {0, ..., j}.

◆ rearrange_bins()

void seqan::hibf::sketch::toolbox::rearrange_bins ( std::vector< hyperloglog > const & sketches,
std::vector< size_t > const & counts,
std::vector< size_t > & positions,
double const max_ratio,
size_t const num_threads )

Rearrange filenames, sketches and counts such that similar bins are close to each other.

Parameters
[in]sketchesThe hyperloglog sketches of the respective user bins.
[in]countsThe counts/sketch.estimates() of the respective user bins.
[in,out]positionsThe realtive positions of the input information to correctly access sketches and counts.
[in]max_ratiothe maximal cardinality ratio in the clustering intervals (must be <= 1 and >= 0)
[in]num_threadsthe number of threads to use

◆ cluster_bins()

void seqan::hibf::sketch::toolbox::cluster_bins ( std::vector< hyperloglog > const & sketches,
std::vector< size_t > & positions,
std::vector< size_t > & permutation,
size_t const first,
size_t const last,
size_t const num_threads )

Perform an agglomerative clustering variant on the index range [first:last)

Parameters
[in]sketchesThe hyperloglog sketches of the respective user bins.
[in,out]positionsThe realtive positions of the input information to correctly access sketches and counts.
[in,out]permutationappend the new order to this
[in]firstid of the first cluster of the interval
[in]lastid of the last cluster of the interval plus one
[in]num_threadsthe number of threads to use

◆ random_shuffle()

void seqan::hibf::sketch::toolbox::random_shuffle ( distance_matrix & dist,
robin_hood::unordered_flat_map< size_t, size_t > & remaining_ids )

Randomly swap entries in dist while keeping track of the changes of indices.

Parameters
[in]distthe distance matrix (vector of priority queues) to shuffle
[in]remaining_idsthe map with information about which ids remain at which index

◆ prune()

void seqan::hibf::sketch::toolbox::prune ( distance_matrix & dist,
robin_hood::unordered_flat_map< size_t, size_t > & remaining_ids )

Delete inactive entries out of dist and shrink to fit its size while keeping track of the changes of indices.

Parameters
[in]distthe distance matrix (vector of priority queues) to prune
[in]remaining_idsthe map with information about which ids remain at which index

◆ rotate()

bool seqan::hibf::sketch::toolbox::rotate ( std::vector< clustering_node > & clustering,
size_t const previous_rightmost,
size_t const first,
size_t const id )

Rotate the previous rightmost bin to the left of the clustering tree.

Parameters
[in,out]clusteringthe tree to do the rotation on
[in]previous_rightmostthe id of the node to be rotated to the left
[in]firstthe id of the first node in the interval to shift the index
[in]idthe id of the current node

If called with the root of the tree, this function recursively calls itself while rotating several subtrees until previous_rightmost is at the very left end of the whole clustering tree.

Returns
whether previous rightmost was in the subtree rooted at id

◆ trace()

void seqan::hibf::sketch::toolbox::trace ( std::vector< clustering_node > const & clustering,
std::vector< size_t > & permutation,
size_t const previous_rightmost,
size_t const first,
size_t const id )

Do a recursive traceback to find the order of leaves in the clustering tree.

Parameters
[in]clusteringthe tree to do the traceback on
[out]permutationappend the new order to this
[in]previous_rightmostthe id of the node on the left which should be ignored
[in]firstthe id of the first node in the interval to shift the index
[in]idthe id of the current node

This function traverses the tree in depth-first-search accessing the leaves from left to right. 'Left to right' refers to the order of nodes in clustering.