SeqAn3 3.1.0
The Modern C++ library for sequence analysis.
Views

IO related views.

Variables

constexpr auto seqan3::views::async_input_buffer
 A view adapter that returns a concurrent-queue-like view over the underlying range.

Detailed Description

IO related views.

See also
IO

Variable Documentation

◆ async_input_buffer

inline constexpr auto seqan3::views::async_input_buffer

A view adapter that returns a concurrent-queue-like view over the underlying range.

Template Parameters
urng_t  The type of the range being processed. See below for requirements.
Parameters
[in,out]  urange       The range being processed.
[in]      buffer_size  Size of the buffer. Choose the size (> 0) depending on the expected work per element.
Returns
A view that pre-fetches elements from the underlying range and provides a thread-safe interface. See below for the properties of the returned range.

Header File

#include <seqan3/io/views/async_input_buffer.hpp>

Summary

This view spawns a background thread that pre-fetches elements from the underlying range and stores them in a concurrent queue. Iterating over this view then pops elements out of the queue and returns them. This is primarily useful if dereferencing/incrementing the iterator of the underlying range is expensive, e.g. with SeqAn files which lazily perform I/O.

Another advantage of this view is that multiple iterators can be created that are safe to iterate individually, even from different threads, i.e. you can use multiple threads to iterate safely over a single-pass input view with the added benefit of background pre-fetching.

In technical terms: this view facilitates a single-producer, multi-consumer design; it's a range interface over a concurrent queue.
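The single-producer, multi-consumer design can be illustrated with a minimal sketch in plain standard C++. This is not SeqAn3's internal implementation; the names bounded_queue and consume_in_parallel are hypothetical and exist only for this illustration. One producer thread pre-fetches elements into a bounded queue while several consumer threads pop from it.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded queue for illustration; NOT SeqAn3's internal data structure.
template <typename T>
class bounded_queue
{
public:
    explicit bounded_queue(std::size_t capacity) : capacity_{capacity} { assert(capacity > 0); }

    // Producer side: blocks while the queue is full.
    void push(T value)
    {
        std::unique_lock<std::mutex> lock{mutex_};
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push(std::move(value));
        not_empty_.notify_one();
    }

    // Producer side: signals that no further elements will arrive.
    void close()
    {
        std::lock_guard<std::mutex> lock{mutex_};
        closed_ = true;
        not_empty_.notify_all();
    }

    // Consumer side: blocks until an element is available.
    // Returns std::nullopt once the queue is closed and drained.
    std::optional<T> pop()
    {
        std::unique_lock<std::mutex> lock{mutex_};
        not_empty_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty())
            return std::nullopt;
        T value = std::move(items_.front());
        items_.pop();
        not_full_.notify_one();
        return value;
    }

private:
    std::size_t capacity_;
    bool closed_{false};
    std::queue<T> items_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
};

// One producer moves the elements of `source` into the queue;
// `n_consumers` threads drain it. Returns how many elements each consumer got.
std::vector<std::size_t> consume_in_parallel(std::vector<int> source,
                                             std::size_t buffer_size,
                                             std::size_t n_consumers)
{
    assert(n_consumers > 0);
    bounded_queue<int> queue{buffer_size};
    std::vector<std::size_t> counts(n_consumers, 0);

    std::thread producer{[&source, &queue]
    {
        for (int & element : source)
            queue.push(std::move(element));
        queue.close();
    }};

    std::vector<std::thread> consumers;
    for (std::size_t i = 0; i < n_consumers; ++i)
        consumers.emplace_back([&queue, &counts, i]
        {
            while (auto item = queue.pop())
                ++counts[i]; // each thread only touches its own counter
        });

    producer.join();
    for (std::thread & consumer : consumers)
        consumer.join();
    return counts;
}
```

Calling consume_in_parallel(std::vector<int>(100, 1), 4, 3) returns three counts that sum to 100. How the elements are split between the consumers depends on scheduling and differs from run to run.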

Size of the buffer

The buffer_size parameter should be chosen depending on the expected work per element, e.g. if the underlying range is an input file over short reads, a buffer size of 100 or 1000 could be beneficial; if on the other hand the file contains genome-sized sequences, it would be better to buffer only a single sequence (buffering 100 sequences would result in the entire file being preloaded and likely consuming significant memory).
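One way to turn this guidance into a number is to derive the buffer size from a memory budget. The helper below is a hypothetical sketch and not part of SeqAn3; the name suggest_buffer_size, its parameters, and the default cap of 1000 records are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a SeqAn3 function): pick a buffer size such that
// the buffered records stay within `memory_budget_bytes`.
// The result is always >= 1, because the buffer size must be > 0.
constexpr std::size_t suggest_buffer_size(std::size_t avg_record_bytes,
                                          std::size_t memory_budget_bytes,
                                          std::size_t max_records = 1000)
{
    if (avg_record_bytes == 0) // unknown or empty records: fall back to the cap
        return std::max<std::size_t>(1, max_records);

    std::size_t const fits = memory_budget_bytes / avg_record_bytes;
    return std::max<std::size_t>(1, std::min(fits, max_records));
}
```

With a budget of 64 MiB, short reads of about 300 bytes hit the cap of 1000 records, whereas a genome-sized record of 3 GB yields a buffer size of 1. The result could then be passed as the buffer_size argument of the adaptor.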

Range consumption

This view always moves elements from the underlying range into its buffer which means that the elements in the underlying range will be invalidated! For underlying ranges that are single-pass, this makes no difference, but it might be unexpected for multi-pass ranges (std::ranges::forward_range).
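The effect on a multi-pass range can be sketched with a plain std::vector standing in for the underlying range. The function drain_by_move is hypothetical and only mimics what happens when elements are moved into a buffer; it does not use SeqAn3.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for filling the view's buffer:
// every element is moved out of `underlying`, not copied.
std::vector<std::string> drain_by_move(std::vector<std::string> & underlying)
{
    std::vector<std::string> buffer;
    buffer.reserve(underlying.size());
    for (std::string & element : underlying)
        buffer.push_back(std::move(element)); // `element` is now valid but unspecified
    return buffer;
}
```

After the call, the source vector still has the same number of elements, but their contents are in a moved-from state and should not be relied upon. This is the invalidation described above.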

Typically this adaptor is used when you want to consume the entire underlying range. Destructing this view before all elements have been read will also stop the thread that moves objects from the underlying range. In general, it is not safe to access the underlying range in other contexts once it has been passed to seqan3::views::async_input_buffer.

Note that, in addition to the buffer of the view, every iterator has its own one-element buffer. Dereferencing the iterator returns a reference to the element in that buffer; usually you will want to move this element out of the buffer with std::move or std::ranges::iter_move. Incrementing the iterator refills the buffer from the queue inside the view (which in turn is then refilled from the underlying range).
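The interplay of dereference, move and increment can be sketched with a simplified, single-threaded stand-in. The class buffered_cursor and the function collect are hypothetical names for this illustration; a std::queue replaces the concurrent queue of the view, and std::move is used where std::ranges::iter_move would serve the same purpose on a real iterator.

```cpp
#include <optional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one iterator of the view: it owns a one-element buffer.
class buffered_cursor
{
public:
    explicit buffered_cursor(std::queue<std::string> & queue) : queue_{&queue} { refill(); }

    bool at_end() const { return !slot_.has_value(); }

    // Dereferencing returns a reference into the cursor's own buffer.
    std::string & operator*() { return *slot_; }

    // Incrementing refills the buffer from the queue.
    buffered_cursor & operator++()
    {
        refill();
        return *this;
    }

private:
    void refill()
    {
        if (queue_->empty())
        {
            slot_.reset();
            return;
        }
        slot_ = std::move(queue_->front());
        queue_->pop();
    }

    std::queue<std::string> * queue_;
    std::optional<std::string> slot_;
};

// Moves every element out of the cursor's buffer instead of copying it.
std::vector<std::string> collect(std::queue<std::string> & queue)
{
    std::vector<std::string> out;
    for (buffered_cursor cursor{queue}; !cursor.at_end(); ++cursor)
        out.push_back(std::move(*cursor));
    return out;
}
```

Moving out of the buffer is safe here because the next increment overwrites the slot before it is read again.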

View properties

Concepts and reference type              urng_t (underlying range type)  rrng_t (returned range type)
std::ranges::input_range                 required                        preserved
std::ranges::forward_range                                               lost
std::ranges::bidirectional_range                                         lost
std::ranges::random_access_range                                         lost
std::ranges::contiguous_range                                            lost
std::ranges::viewable_range              required                        guaranteed
std::ranges::view                                                        guaranteed
std::ranges::sized_range                                                 lost
std::ranges::common_range                                                lost
std::ranges::output_range                                                lost
seqan3::const_iterable_range                                             lost
std::ranges::range_reference_t                                           std::ranges::range_value_t<urng_t> &
std::iterator_traits::iterator_category                                  none

See the views submodule documentation for detailed descriptions of the view properties.

Thread safety

The following operations are thread-safe:

  • calling .begin() and .end() on the view returned by this adaptor;
  • calling operators on the different iterator objects.

Calling operators on the same iterator object from different threads is not safe. You can pass the view to different threads by reference and have each thread call begin() on the view and then perform operations (dereference, increment, ...) on its own iterator. You cannot, however, call begin() in a parent thread, pass that one iterator to different threads, and operate on it concurrently.

Example

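#include <chrono>  // std::chrono::milliseconds
#include <cstdlib> // std::rand, std::srand
#include <ctime>   // std::time
#include <future>  // std::async
#include <sstream> // std::istringstream
#include <string>  // std::string
#include <thread>  // std::this_thread::sleep_for, std::this_thread::get_id

#include <seqan3/core/debug_stream.hpp> // seqan3::debug_stream
#include <seqan3/io/sequence_file/input.hpp> // seqan3::sequence_file_input
#include <seqan3/io/views/async_input_buffer.hpp> // seqan3::views::async_input_buffer
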
std::string fasta_file =
R"(> seq1
ACGACTACGACGATCATCGATCGATCGATCGATCGATCGATCGATCGTACTACGATCGATCG
> seq2
ACGACTACGACGATCATCGATCGATCGATCGATCGATCGATCGATCGTACTACGATCGATCG
> seq3
ACGACTACGACGATCATCGATCGATCGATCGATCGATCGATCGATCGTACTACGATCGATCG
> seq4
ACGACTACGACGATCATCGATCGATCGATCGATCGATCGATCGATCGTACTACGATCGATCG
> seq5
ACGACTACGACGATCATCGATCGATCGATCGATCGATCGATCGATCGTACTACGATCGATCG
> seq6
ACGACTACGACGATCATCGATCGATCGATCGATCGATCGATCGATCGTACTACGATCGATCG
> seq7
ACGACTACGACGATCATCGATCGATCGATCGATCGATCGATCGATCGTACTACGATCGATCG
> seq8
ACGACTACGACGATCATCGATCGATCGATCGATCGATCGATCGATCGTACTACGATCGATCG
> seq9
ACGACTACGACGATCATCGATCGATCGATCGATCGATCGATCGATCGTACTACGATCGATCG
> seq10
ACGACTACGACGATCATCGATCGATCGATCGATCGATCGATCGATCGTACTACGATCGATCG
> seq11
ACGACTACGACGATCATCGATCGATCGATCGATCGATCGATCGATCGTACTACGATCGATCG
> seq12
ACGACTACGACGATCATCGATCGATCGATCGATCGATCGATCGATCGTACTACGATCGATCG
)";
int main()
{
    // initialise random number generator, only needed for demonstration purposes
    std::srand(std::time(nullptr));

    // create an input file from the string above
    seqan3::sequence_file_input fin{std::istringstream{fasta_file}, seqan3::format_fasta{}};

    // create the async buffer around the input file
    // spawns a background thread that tries to keep four records in the buffer
    auto v = fin | seqan3::views::async_input_buffer(4);

    // create a lambda function that iterates over the async buffer when called
    // (the buffer gets dynamically refilled as soon as possible)
    auto worker = [&v] ()
    {
        for (auto & record : v)
        {
            // pretend we are doing some work
            std::this_thread::sleep_for(std::chrono::milliseconds(std::rand() % 1000));
            // print current thread and sequence ID
            seqan3::debug_stream << "Thread: " << std::this_thread::get_id() << '\t'
                                 << "Seq: " << record.id() << '\n';
        }
    };

    // launch two threads and pass the lambda function to both
    auto f0 = std::async(std::launch::async, worker);
    auto f1 = std::async(std::launch::async, worker);
}

Running the snippet could yield the following output:

Thread: 0x80116bf00 Seq: seq2
Thread: 0x80116bf00 Seq: seq3
Thread: 0x80116ba00 Seq: seq1
Thread: 0x80116bf00 Seq: seq4
Thread: 0x80116bf00 Seq: seq6
Thread: 0x80116ba00 Seq: seq5
Thread: 0x80116bf00 Seq: seq7
Thread: 0x80116ba00 Seq: seq8
Thread: 0x80116bf00 Seq: seq9
Thread: 0x80116bf00 Seq: seq11
Thread: 0x80116bf00 Seq: seq12
Thread: 0x80116ba00 Seq: seq10

This shows that elements from the underlying range are indeed processed non-sequentially, that there are two threads, and that work is "balanced" between them (one thread processed more elements than the other because its "work" per item happened to be smaller).

Note that you might encounter jumbled output if by chance two threads write to the stream at the exact same time.

If you remove the line starting with auto f1 = ... you will get sequential processing:

Thread: 0x80116aa00 Seq: seq1
Thread: 0x80116aa00 Seq: seq2
Thread: 0x80116aa00 Seq: seq3
Thread: 0x80116aa00 Seq: seq4
Thread: 0x80116aa00 Seq: seq5
Thread: 0x80116aa00 Seq: seq6
Thread: 0x80116aa00 Seq: seq7
Thread: 0x80116aa00 Seq: seq8
Thread: 0x80116aa00 Seq: seq9
Thread: 0x80116aa00 Seq: seq10
Thread: 0x80116aa00 Seq: seq11
Thread: 0x80116aa00 Seq: seq12

Note that even if you have a single processing thread, using this view can still improve performance measurably, because loading of the elements into the buffer (which reads input from disk) happens in a background thread.

This entity is experimental and subject to change in the future. Experimental since version 3.1.