HIBF 1.0.0-rc.1
All Classes Namespaces Files Functions Variables Typedefs Friends Macros Modules Pages Concepts
HIBF

The HIBF: seqan::hibf::hierarchical_interleaved_bloom_filter. More...

Topics

 Sketching
 Includes API to compute HyperLoglog sketches for the cardinality/size estimates for the layout algorithm.
 
 Layout
 Includes all API relevant to computing a hierarchical layout for building an HIBF.
 
 Build
 Includes all API relevant to build an HIBF from a given layout.
 

Classes

class  seqan::hibf::concurrent_timer
 A timer with a thread-safe operator+=(). More...
 
struct  seqan::hibf::config
 The configuration used to build an (H)IBF. More...
 
class  seqan::hibf::hierarchical_interleaved_bloom_filter
 The Hierarchical Interleaved Bloom Filter (HIBF) - Fast answers to set-membership queries for multiple bins. More...
 
class  seqan::hibf::serial_timer
 A timer. More...
 

Functions

template<std::unsigned_integral t1, std::unsigned_integral t2>
constexpr size_t seqan::hibf::divide_and_ceil (t1 const dividend, t2 const divisor) noexcept
 Returns, for unsigned integral operands, dividend / divisor ceiled to the next integer value.
 
constexpr size_t seqan::hibf::next_multiple_of_64 (size_t const value) noexcept
 Returns the smallest integer that is greater or equal to value and a multiple of 64. Returns 0 for value 0.
 
constexpr size_t seqan::hibf::subtract_empty_bins (size_t const tmax, double const fraction) noexcept
 Returns the number of technical bins available for use.
 

Detailed Description

The HIBF: seqan::hibf::hierarchical_interleaved_bloom_filter.

The Hierarchical Interleaved Bloom Filter is a data structure that provides fast answers to set-membership queries for multiple samples/set/user-bins.

The data structure is implemented in the seqan::hibf::hierarchical_interleaved_bloom_filter and improves its predecessor the seqan::hibf::interleaved_bloom_filter data structure by more efficient storage of unevenly sized samples/sets/user-bins and fast queries of thousands of samples.

Cite

Mehringer, Svenja, et al. "Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queries." Genome Biology 24.1 (2023): 1-25.

Example

// SPDX-FileCopyrightText: 2006-2025, Knut Reinert & Freie Universität Berlin
// SPDX-FileCopyrightText: 2016-2025, Knut Reinert & MPI für molekulare Genetik
// SPDX-License-Identifier: CC0-1.0
#include <cstddef> // for size_t
#include <functional> // for function
#include <vector> // for vector
#include <hibf/config.hpp> // for config, insert_iterator
#include <hibf/hierarchical_interleaved_bloom_filter.hpp> // for hierarchical_interleaved_bloom_filter
#include <hibf/misc/print.hpp> // for print, print_t
int main()
{
// 2 user bins:
std::vector<std::vector<size_t>> hashes{{1u, 2u, 3u, 4u, 5u, 6u, 7u, 8u, 9u, 10u}, {1u, 2u, 3u, 4u, 5u}};
// input just passes hashes:
auto my_input = [&](size_t const user_bin_id, seqan::hibf::insert_iterator it)
{
for (auto const hash : hashes[user_bin_id])
it = hash;
};
seqan::hibf::config config{.input_fn = my_input, .number_of_user_bins = 2};
// construct the HIBF
// query the HIBF
std::vector<size_t> query{1u, 2u, 3u};
std::vector<size_t> query2{8u, 9u, 10u};
auto agent = hibf.membership_agent(); // you need an agent for efficient queries
auto & result = agent.membership_for(query, 2u); // both user bins have hashes 1,2,3
seqan::hibf::print(result); // [1,0]
agent.sort_results(); // Results can also be sorted
seqan::hibf::print(result); // [0,1]
auto & result2 = agent.membership_for(query2, 2u); // only user bin 0 has hashes 8,9,10
seqan::hibf::print(result2); // [0]
}
The Hierarchical Interleaved Bloom Filter (HIBF) - Fast answers to set-membership queries for multipl...
Definition hierarchical_interleaved_bloom_filter.hpp:140
Definition insert_iterator.hpp:25
The configuration used to build an (H)IBF.
Definition config.hpp:75
std::function< void(size_t const, insert_iterator &&) input_fn)
A function for how to hash your input [REQUIRED].
Definition config.hpp:110
See also
seqan::hibf::hierarchical_interleaved_bloom_filter
seqan::hibf::interleaved_bloom_filter
Official paper: https://doi.org/10.1186/s13059-023-02971-4

Function Documentation

◆ next_multiple_of_64()

constexpr size_t seqan::hibf::next_multiple_of_64 ( size_t const value)
constexprnoexcept

Returns the smallest integer that is greater or equal to value and a multiple of 64. Returns 0 for value 0.

Parameters
[in]valueThe Input value that is smaller or equal to the return value.

◆ subtract_empty_bins()

constexpr size_t seqan::hibf::subtract_empty_bins ( size_t const tmax,
double const fraction )
constexprnoexcept

Returns the number of technical bins available for use.

Parameters
[in]tmaxThe total number of bins.
[in]fractionThe fraction of the total number of bins that should be empty.
See also
https://godbolt.org/z/cMjbM39vj