HIBF 1.0.0-rc.1
|
Algorithms for union estimation and similarity rearrangement executed before computing the layout. More...
Classes | |
struct | seqan::hibf::sketch::toolbox::clustering_node |
type for a node in the clustering tree when for the rearrangement More... | |
struct | seqan::hibf::sketch::toolbox::entry |
entry of the distance matrix that has the id of a cluster with its neighbors in a prio queue More... | |
struct | seqan::hibf::sketch::toolbox::neighbor |
element of the second priority queue layer of the distance matrix More... | |
Typedefs | |
using | seqan::hibf::sketch::toolbox::distance_matrix = std::vector<entry> |
type of the distance matrix for the clustering for the rearrangement | |
using | seqan::hibf::sketch::toolbox::prio_queue = std::priority_queue<neighbor, std::vector<neighbor>, std::greater<neighbor>> |
type of a min heap based priority queue | |
Functions | |
void | seqan::hibf::sketch::toolbox::cluster_bins (std::vector< hyperloglog > const &sketches, std::vector< size_t > &positions, std::vector< size_t > &permutation, size_t const first, size_t const last, size_t const num_threads) |
Perform an agglomerative clustering variant on the index range [first:last) | |
void | seqan::hibf::sketch::toolbox::precompute_initial_union_estimates (std::vector< uint64_t > &estimates, std::vector< hyperloglog > const &sketches, std::vector< size_t > const &counts, std::vector< size_t > const &positions) |
Estimate the cardinality of the union for each interval [0, j] for all user bins j. | |
void | seqan::hibf::sketch::toolbox::precompute_union_estimates_for (std::vector< uint64_t > &estimates, std::vector< hyperloglog > const &sketches, std::vector< size_t > const &counts, std::vector< size_t > const &positions, int64_t const j) |
Estimate the cardinality of the union for a single user bin j with all prior ones j' < j. | |
void | seqan::hibf::sketch::toolbox::prune (distance_matrix &dist, robin_hood::unordered_flat_map< size_t, size_t > &remaining_ids) |
Delete inactive entries out of dist and shrink to fit its size while keeping track of the changes of indices. | |
void | seqan::hibf::sketch::toolbox::random_shuffle (distance_matrix &dist, robin_hood::unordered_flat_map< size_t, size_t > &remaining_ids) |
Randomly swap entries in dist while keeping track of the changes of indices. | |
void | seqan::hibf::sketch::toolbox::rearrange_bins (std::vector< hyperloglog > const &sketches, std::vector< size_t > const &counts, std::vector< size_t > &positions, double const max_ratio, size_t const num_threads) |
Rearrange filenames, sketches and counts such that similar bins are close to each other. | |
bool | seqan::hibf::sketch::toolbox::rotate (std::vector< clustering_node > &clustering, size_t const previous_rightmost, size_t const first, size_t const id) |
Rotate the previous rightmost bin to the left of the clustering tree. | |
void | seqan::hibf::sketch::toolbox::sort_by_cardinalities (std::vector< size_t > const &counts, std::vector< size_t > &positions) |
Sorts filenames and cardinalities by looking only at the cardinalities. | |
void | seqan::hibf::sketch::toolbox::trace (std::vector< clustering_node > const &clustering, std::vector< size_t > &permutation, size_t const previous_rightmost, size_t const first, size_t const id) |
Do a recursive traceback to find the order of leaves in the clustering tree. | |
Algorithms for union estimation and similarity rearrangement executed before computing the layout.
void seqan::hibf::sketch::toolbox::precompute_union_estimates_for | ( | std::vector< uint64_t > & | estimates, |
std::vector< hyperloglog > const & | sketches, | ||
std::vector< size_t > const & | counts, | ||
std::vector< size_t > const & | positions, | ||
int64_t const | j ) |
Estimate the cardinality of the union for a single user bin j with all prior ones j' < j.
[out] | estimates | output row |
[in] | sketches | The hyperloglog sketches of the respective user bins. |
[in] | counts | The counts/sketch.estimates() of the respective user bins. |
[in] | positions | The realtive positions of the input information to correctly access sketches and counts. |
[in] | j | The current user bin (column in the DP matrix) |
estimates[j_prime] will be the union cardinality estimate of the interval {j_prime, ..., j}.
void seqan::hibf::sketch::toolbox::precompute_initial_union_estimates | ( | std::vector< uint64_t > & | estimates, |
std::vector< hyperloglog > const & | sketches, | ||
std::vector< size_t > const & | counts, | ||
std::vector< size_t > const & | positions ) |
Estimate the cardinality of the union for each interval [0, j] for all user bins j.
[out] | estimates | output row |
[in] | sketches | The hyperloglog sketches of the respective user bins. |
[in] | counts | The counts/sketch.estimates() of the respective user bins. |
[in] | positions | The realtive positions of the input information to correctly access sketches and counts. |
estimates[j] will be the union cardinality estimate of the interval {0, ..., j}.
void seqan::hibf::sketch::toolbox::rearrange_bins | ( | std::vector< hyperloglog > const & | sketches, |
std::vector< size_t > const & | counts, | ||
std::vector< size_t > & | positions, | ||
double const | max_ratio, | ||
size_t const | num_threads ) |
Rearrange filenames, sketches and counts such that similar bins are close to each other.
[in] | sketches | The hyperloglog sketches of the respective user bins. |
[in] | counts | The counts/sketch.estimates() of the respective user bins. |
[in,out] | positions | The realtive positions of the input information to correctly access sketches and counts. |
[in] | max_ratio | the maximal cardinality ratio in the clustering intervals (must be <= 1 and >= 0) |
[in] | num_threads | the number of threads to use |
void seqan::hibf::sketch::toolbox::cluster_bins | ( | std::vector< hyperloglog > const & | sketches, |
std::vector< size_t > & | positions, | ||
std::vector< size_t > & | permutation, | ||
size_t const | first, | ||
size_t const | last, | ||
size_t const | num_threads ) |
Perform an agglomerative clustering variant on the index range [first:last)
[in] | sketches | The hyperloglog sketches of the respective user bins. |
[in,out] | positions | The realtive positions of the input information to correctly access sketches and counts. |
[in,out] | permutation | append the new order to this |
[in] | first | id of the first cluster of the interval |
[in] | last | id of the last cluster of the interval plus one |
[in] | num_threads | the number of threads to use |
void seqan::hibf::sketch::toolbox::random_shuffle | ( | distance_matrix & | dist, |
robin_hood::unordered_flat_map< size_t, size_t > & | remaining_ids ) |
Randomly swap entries in dist while keeping track of the changes of indices.
[in] | dist | the distance matrix (vector of priority queues) to shuffle |
[in] | remaining_ids | the map with information about which ids remain at which index |
void seqan::hibf::sketch::toolbox::prune | ( | distance_matrix & | dist, |
robin_hood::unordered_flat_map< size_t, size_t > & | remaining_ids ) |
Delete inactive entries out of dist and shrink to fit its size while keeping track of the changes of indices.
[in] | dist | the distance matrix (vector of priority queues) to prune |
[in] | remaining_ids | the map with information about which ids remain at which index |
bool seqan::hibf::sketch::toolbox::rotate | ( | std::vector< clustering_node > & | clustering, |
size_t const | previous_rightmost, | ||
size_t const | first, | ||
size_t const | id ) |
Rotate the previous rightmost bin to the left of the clustering tree.
[in,out] | clustering | the tree to do the rotation on |
[in] | previous_rightmost | the id of the node to be rotated to the left |
[in] | first | the id of the first node in the interval to shift the index |
[in] | id | the id of the current node |
If called with the root of the tree, this function recursively calls itself while rotating several subtrees until previous_rightmost is at the very left end of the whole clustering tree.
void seqan::hibf::sketch::toolbox::trace | ( | std::vector< clustering_node > const & | clustering, |
std::vector< size_t > & | permutation, | ||
size_t const | previous_rightmost, | ||
size_t const | first, | ||
size_t const | id ) |
Do a recursive traceback to find the order of leaves in the clustering tree.
[in] | clustering | the tree to do the traceback on |
[out] | permutation | append the new order to this |
[in] | previous_rightmost | the id of the node on the left which should be ignored |
[in] | first | the id of the first node in the interval to shift the index |
[in] | id | the id of the current node |
This function traverses the tree in depth-first-search accessing the leaves from left to right. 'Left to right' refers to the order of nodes in clustering
.