A class for writing sequence files, e.g. FASTA, FASTQ ...
#include <seqan3/io/sequence_file/output.hpp>
Public Types | |
using | field_ids = fields< field::seq, field::id, field::qual, field::_seq_qual_deprecated > |
The subset of seqan3::field IDs that are valid for this file. | |
Template arguments | |
Exposed as member types for public access. | |
using | selected_field_ids = selected_field_ids_ |
A seqan3::fields list with the fields selected for the record. | |
using | valid_formats = valid_formats_ |
A seqan3::type_list with the possible formats. | |
using | stream_char_type = char |
Character type of the stream(s). | |
Range associated types | |
Most of the range associated types are void, because this is an output range. | |
using | value_type = void |
The value type (void). | |
using | reference = void |
The reference type (void). | |
using | const_reference = void |
The const reference type (void). | |
using | size_type = void |
The size type (void). | |
using | difference_type = std::ptrdiff_t |
A signed integer type, usually std::ptrdiff_t. | |
using | iterator = detail::out_file_iterator< sequence_file_output > |
The iterator type of this file (an output iterator). | |
using | const_iterator = void |
The const iterator type is void, because files are not const-iterable. | |
using | sentinel = std::default_sentinel_t |
The type returned by end(). | |
Public Member Functions | |
Constructors, destructor and assignment | |
sequence_file_output ()=delete | |
Default constructor is explicitly deleted; you need to give a stream or file name. | |
sequence_file_output (sequence_file_output const &)=delete | |
Copy construction is explicitly deleted, because multiple objects must not access the same file. | |
sequence_file_output & | operator= (sequence_file_output const &)=delete |
Copy assignment is explicitly deleted, because multiple objects must not access the same file. | |
sequence_file_output (sequence_file_output &&)=default | |
Move construction is defaulted. | |
sequence_file_output & | operator= (sequence_file_output &&)=default |
Move assignment is defaulted. | |
~sequence_file_output ()=default | |
Destructor is defaulted. | |
sequence_file_output (std::filesystem::path filename, selected_field_ids const &fields_tag=selected_field_ids{}) | |
Construct from filename.
template<output_stream stream_t, sequence_file_output_format file_format> | |
sequence_file_output (stream_t &stream, file_format const &format_tag, selected_field_ids const &fields_tag=selected_field_ids{}) | |
Construct from an existing stream and with specified format.
template<output_stream stream_t, sequence_file_output_format file_format> | |
sequence_file_output (stream_t &&stream, file_format const &format_tag, selected_field_ids const &fields_tag=selected_field_ids{}) | |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. | |
Public Attributes | |
sequence_file_output_options | options {} |
The options are public and their members can be set directly. | |
Related Functions | |
(Note that these are not member functions.) | |
Type deduction guides | |
template<output_stream stream_t, sequence_file_output_format file_format> | |
sequence_file_output (stream_t &, file_format const &) -> sequence_file_output< typename sequence_file_output<>::selected_field_ids, type_list< file_format >> | |
Deduction guide for given stream and file format. | |
template<output_stream stream_t, sequence_file_output_format file_format> | |
sequence_file_output (stream_t &&, file_format const &) -> sequence_file_output< typename sequence_file_output<>::selected_field_ids, type_list< file_format >> | |
This is an overloaded function, provided for convenience. It differs from the above function only in what argument(s) it accepts. | |
template<output_stream stream_t, sequence_file_output_format file_format, detail::fields_specialisation selected_field_ids> | |
sequence_file_output (stream_t &&, file_format const &, selected_field_ids const &) -> sequence_file_output< selected_field_ids, type_list< file_format >> | |
Deduction guide for given stream, file format and field ids. | |
template<output_stream stream_t, sequence_file_output_format file_format, detail::fields_specialisation selected_field_ids> | |
sequence_file_output (stream_t &, file_format const &, selected_field_ids const &) -> sequence_file_output< selected_field_ids, type_list< file_format >> | |
This is an overloaded function, provided for convenience. It differs from the above function only in what argument(s) it accepts. | |
Range interface | |
iterator | begin () noexcept |
Returns an iterator to the current position in the file.
sentinel | end () noexcept |
Returns a sentinel for comparison with iterator.
template<typename record_t > | |
void | push_back (record_t &&r) |
Write a seqan3::record to the file.
template<typename tuple_t > | |
void | push_back (tuple_t &&t) |
Write a record in form of a std::tuple to the file.
template<typename arg_t , typename ... arg_types> | |
void | emplace_back (arg_t &&arg, arg_types &&... args) |
Write a record to the file by passing individual fields.
template<std::ranges::input_range rng_t> | |
sequence_file_output & | operator= (rng_t &&range) |
Write a range of records (or tuples) to the file.
template<std::ranges::input_range rng_t> | |
sequence_file_output & | operator| (rng_t &&range, sequence_file_output &f) |
Write a range of records (or tuples) to the file.
template<std::ranges::input_range rng_t> | |
sequence_file_output | operator| (rng_t &&range, sequence_file_output &&f) |
This is an overloaded function, provided for convenience. It differs from the above function only in what argument(s) it accepts. | |
selected_field_ids | A seqan3::fields type with the list and order of fields IDs; only relevant if these can't be deduced. |
valid_formats | A seqan3::type_list of the selectable formats (each must meet seqan3::sequence_file_output_format). |
Sequence files are the most generic and common biological files. Well-known formats include FASTA and FASTQ, but SAM or BAM files can also be treated as sequence files, discarding the alignment information.
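The sequence file abstraction supports writing three different fields: seqan3::field::seq (the sequence), seqan3::field::id (the identifier) and seqan3::field::qual (the quality string).
The member functions accept any subset of these fields. If the field ID of an argument cannot be deduced, it is assumed to correspond to the field ID at the same position in the selected_field_ids template parameter.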
This class comes with two constructors: one for construction from a file name and one for construction from an existing stream and a known format. The first automatically picks the format based on the extension of the file name. The second can be used if you have a non-file stream, like std::cout or std::ostringstream, that you want to write to, and/or if you cannot use file-extension based detection but know that your output file has a certain format.
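The extension-based selection can be pictured with the following minimal sketch. It is a hypothetical model, not the seqan3 implementation: toy_format and detect_format() are illustrative names, the extension lists are an assumed subset, and throwing std::invalid_argument for an unknown extension is this sketch's own choice.

```cpp
#include <cassert>
#include <filesystem>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the selectable formats (not seqan3 types).
enum class toy_format { fasta, fastq };

// Choose a format from the file extension, mimicking what the file-name constructor does.
// The extension lists below are an assumed subset picked for illustration.
toy_format detect_format(std::filesystem::path const & filename)
{
    std::string const ext = filename.extension().string(); // e.g. ".fa" for "reads.fa"
    if (ext == ".fa" || ext == ".fasta" || ext == ".fna")
        return toy_format::fasta;
    if (ext == ".fq" || ext == ".fastq")
        return toy_format::fastq;
    // No format claims this extension: a stream plus an explicit format tag would be needed.
    throw std::invalid_argument{"unknown sequence file extension: " + ext};
}
```

In this model, a name such as "notes.txt" is exactly the case where the second constructor (stream plus known format) would be used instead.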
In most cases the template parameters are deduced completely automatically:
Writing to std::cout:
Note that this is not the same as writing sequence_file_output<> (with angle brackets). In the latter case the template parameters are explicitly set to their default values; in the former case automatic deduction happens, which chooses different parameters depending on the constructor arguments. Prefer deduction over explicit defaults.
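The difference between deduction and explicit defaults is ordinary C++ class template argument deduction. The sketch below reproduces it with a hypothetical toy_file class (not part of seqan3) whose template parameter has a default and which provides a deduction guide, similar in spirit to the guides listed for sequence_file_output:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <type_traits>

// Hypothetical format tag and a "no particular format" default (not seqan3 types).
struct fasta_tag {};
struct any_format {};

// A toy class template whose template argument has a default.
template <typename format_t = any_format>
class toy_file
{
public:
    template <typename tag_t>
    toy_file(std::ostream & out, tag_t) : out_{&out} {}

    std::ostream & stream() const { return *out_; }

private:
    std::ostream * out_;
};

// Deduction guide: the tag given to the constructor becomes the template argument.
template <typename tag_t>
toy_file(std::ostream &, tag_t) -> toy_file<tag_t>;
```

With the guide, `toy_file f{s, fasta_tag{}}` deduces toy_file<fasta_tag>, whereas `toy_file<> f{s, fasta_tag{}}` keeps the default any_format even though the same arguments are passed.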
You can iterate over this file record-wise:
The easiest way to write to a sequence file is to use the push_back() or emplace_back() member functions. These work similarly to how they work on a std::vector. If you pass a tuple to push_back() or give arguments to emplace_back(), the seqan3::field ID of the i-th tuple element/argument is assumed to be the i-th value of selected_field_ids, i.e. by default the first is assumed to be seqan3::field::seq, the second seqan3::field::id and the third seqan3::field::qual. You may give fewer fields than are selected if the actual format you are writing to can cope with fewer (e.g. for FASTA it is sufficient to write seqan3::field::seq and seqan3::field::id, even if selected_field_ids also contains seqan3::field::qual at the third position).
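As a rough model of the positional rule and of omitting trailing fields, here is a hypothetical toy_sequence_writer (not the seqan3 class). Its emplace_back() reads the arguments as seq, id and an optional qual; the FASTA/FASTQ text layout and the exception for missing qualities are assumptions of the sketch.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

enum class toy_format { fasta, fastq }; // hypothetical, not seqan3 types

// Hypothetical writer: positional arguments follow the default order seq, id, qual,
// and a trailing field may be omitted when the target format can do without it.
class toy_sequence_writer
{
public:
    toy_sequence_writer(std::ostream & out, toy_format fmt) : out_{out}, fmt_{fmt} {}

    void emplace_back(std::string const & seq, std::string const & id, std::string const & qual = {})
    {
        if (fmt_ == toy_format::fasta) // FASTA stores no qualities, so qual is never needed
        {
            out_ << '>' << id << '\n' << seq << '\n';
            return;
        }
        if (qual.size() != seq.size()) // FASTQ needs one quality character per base
            throw std::invalid_argument{"FASTQ output needs a quality string as long as the sequence"};
        out_ << '@' << id << '\n' << seq << "\n+\n" << qual << '\n';
    }

private:
    std::ostream & out_;
    toy_format fmt_;
};
```

The same two-argument call is therefore fine for FASTA but rejected for FASTQ in this model.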
You may also use the output file's iterator for writing; however, this rarely provides an advantage.
If you want to change the order of the arguments, you can pass a non-empty fields trait object to the sequence_file_output constructor to select the fields that are used for interpreting the arguments.
The following snippet demonstrates the usage of such a fields trait object.
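The effect of a fields trait object can be modelled at run time as follows. This is a hypothetical sketch: reordering_writer and toy_field are illustrative names, and the real class takes the order from its seqan3::fields type rather than from a std::array.

```cpp
#include <array>
#include <cassert>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical field IDs mirroring seqan3::field::seq / id / qual (illustration only).
enum class toy_field { seq, id, qual };

// The i-th positional argument is interpreted as order_[i]; order_ plays the role
// of the fields trait object passed to the constructor.
class reordering_writer
{
public:
    explicit reordering_writer(std::ostream & out,
                               std::array<toy_field, 2> order = {toy_field::seq, toy_field::id})
        : out_{out}, order_{order}
    {}

    void emplace_back(std::string const & first, std::string const & second)
    {
        std::map<toy_field, std::string> rec{{order_[0], first}, {order_[1], second}};
        out_ << '>' << rec[toy_field::id] << '\n' << rec[toy_field::seq] << '\n'; // FASTA layout
    }

private:
    std::ostream & out_;
    std::array<toy_field, 2> order_;
};
```

A writer constructed with {toy_field::id, toy_field::seq} accepts the ID first and still produces the same record as the default order.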
A different way of passing custom fields to the file is to pass a seqan3::record – instead of a tuple – to push_back(). The seqan3::record clearly indicates which of its elements has which seqan3::field ID so the file will use that information instead of the template argument. This is especially handy when reading from one file and writing to another, because you don't have to configure the output file to match the input file, it will just work:
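The benefit of named fields can be illustrated with a hypothetical stand-in for seqan3::record: a std::map from field ID to value. The writer looks fields up by ID, so the order in which a record stores them does not matter, and a qual field coming from FASTQ input is simply ignored when writing FASTA. toy_record and write_fasta_record() are assumptions of this sketch.

```cpp
#include <cassert>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

enum class toy_field { seq, id, qual }; // hypothetical field IDs

// Hypothetical stand-in for seqan3::record: every value carries its field ID.
using toy_record = std::map<toy_field, std::string>;

// Write one record as FASTA using the IDs stored in the record itself,
// not a positional order configured on the writer. A qual entry is ignored.
void write_fasta_record(std::ostream & out, toy_record const & r)
{
    out << '>' << r.at(toy_field::id) << '\n' << r.at(toy_field::seq) << '\n';
}
```

Records read from a FASTQ-like source can thus be passed through unchanged, whatever order their fields were listed in.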
You can write multiple records at once by assigning to the file:
Record-wise writing in batches also works for writing from input files directly to output files, because input files are also input ranges in SeqAn:
This can be combined with file-based views to create I/O pipelines:
The record-based interface treats the file as a range of tuples (the records), but in certain situations you might have the data as columns, i.e. a tuple-of-ranges, instead of a range-of-tuples.
In that case you can use column-based writing, which combines operator=() with seqan3::views::zip():
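Conceptually, zipping turns the columns back into rows. The minimal sketch below does this by index in plain C++17 instead of using seqan3::views::zip(); write_columns_as_fasta() is a hypothetical helper, and rejecting columns of different lengths is a choice of the sketch (a zip view would typically stop at the shortest column).

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Column-based data: one range per field instead of one record per entry.
// Row i is formed from (seqs[i], ids[i]) and written as a FASTA record.
void write_columns_as_fasta(std::ostream & out,
                            std::vector<std::string> const & seqs,
                            std::vector<std::string> const & ids)
{
    if (seqs.size() != ids.size())
        throw std::invalid_argument{"columns must have the same length"};
    for (std::size_t i = 0; i < seqs.size(); ++i)
        out << '>' << ids[i] << '\n' << seqs[i] << '\n';
}
```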
We currently support writing the following formats:
no-api, inline
Construct from filename.
[in] | filename | Path to the file you wish to open. |
[in] | fields_tag | A seqan3::fields tag. [optional] |
In addition to the file name, you may specify a custom seqan3::fields type which may be easier than defining all the template parameters.
This constructor transparently applies a compression stream on top of the file stream if the given file extension indicates that the output should be compressed. See the section on compression and decompression for more information.
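The extension check for compression can be sketched as peeling off a compression suffix before the format is chosen from the remaining name. This is a hypothetical model: split_name and split_compression() are illustrative, and the list of compression suffixes is an assumption, not the set seqan3 supports.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical result: the detected compression (empty if none) and the remaining
// path, whose extension is then used for format detection.
struct split_name
{
    std::string compression;
    std::filesystem::path inner;
};

// Assumed set of compression suffixes, for illustration only.
split_name split_compression(std::filesystem::path const & filename)
{
    std::string const ext = filename.extension().string();
    if (ext == ".gz" || ext == ".bgzf" || ext == ".bz2")
    {
        std::filesystem::path inner = filename;
        inner.replace_extension(); // "reads.fasta.gz" -> "reads.fasta"
        return {ext.substr(1), inner};
    }
    return {std::string{}, filename}; // no compression suffix: use the name as-is
}
```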
no-api, inline
Construct from an existing stream and with specified format.
file_format | The format of the file in the stream, must satisfy seqan3::sequence_file_output_format. |
[in,out] | stream | The stream to write to, must be derived from std::basic_ostream<stream_char_t>. |
[in] | format_tag | The file format tag. |
[in] | fields_tag | A seqan3::fields tag. [optional] |
This constructor does not apply compression transparently (because there is no way to know if the user wants this). However, you can just pass e.g. seqan3::contrib::gz_ostream to this constructor if you explicitly want compression. See the section on compression and decompression for more information.
no-api, inline, noexcept
Returns an iterator to the current position in the file.
You can write to the file by assigning to the iterator, but using push_back() is usually more intuitive.
Complexity: Constant.
Exceptions: No-throw guarantee.
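Writing through the iterator follows the usual output-iterator pattern. Below is a minimal sketch with a hypothetical toy_out_iterator in the style of std::ostream_iterator, plus a sentinel that never compares equal to it, as described for end(); none of these names belong to seqan3.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using toy_rec = std::pair<std::string, std::string>; // hypothetical (seq, id) record

struct toy_sentinel {}; // stands in for the sentinel type returned by end()

// Assigning a record through the iterator writes it; * and ++ just return the iterator.
class toy_out_iterator
{
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    explicit toy_out_iterator(std::ostream & out) : out_{&out} {}

    toy_out_iterator & operator=(toy_rec const & r)
    {
        *out_ << '>' << r.second << '\n' << r.first << '\n'; // FASTA layout
        return *this;
    }
    toy_out_iterator & operator*() { return *this; }
    toy_out_iterator & operator++() { return *this; }
    toy_out_iterator & operator++(int) { return *this; }

private:
    std::ostream * out_;
};

// Like the documented end() sentinel: it never compares equal to an iterator.
inline bool operator==(toy_out_iterator const &, toy_sentinel) { return false; }
inline bool operator!=(toy_out_iterator const &, toy_sentinel) { return true; }
```

Because the iterator is an ordinary output iterator, standard algorithms such as std::copy can write through it.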
no-api, inline
Write a record to the file by passing individual fields.
arg_t | Type of the first field. |
arg_types | Types of further fields. |
[in] | arg | The first field to write. |
[in] | args | Further fields. |
The fields are assumed to correspond to the field IDs given in selected_field_ids; however, passing fewer is accepted if the format does not require all of them.
Complexity: Constant. TODO linear in the size of the written sequences?
Exceptions: Basic exception safety.
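One plausible way to relate emplace_back() and push_back() is for emplace_back() to bundle its arguments into a tuple of references and forward it, so both follow the same positional field order. The sketch below assumes this design for a hypothetical forwarding_writer; it does not claim to be how seqan3 implements it.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <tuple>
#include <utility>

class forwarding_writer
{
public:
    explicit forwarding_writer(std::ostream & out) : out_{out} {}

    template <typename tuple_t>
    void push_back(tuple_t && t)
    {
        // Element 0 is read as the sequence, element 1 as the ID (default order);
        // further elements (e.g. qual) are ignored because FASTA does not need them.
        out_ << '>' << std::get<1>(t) << '\n' << std::get<0>(t) << '\n';
    }

    template <typename arg_t, typename... arg_types>
    void emplace_back(arg_t && arg, arg_types &&... args)
    {
        // Bundle the individual fields into a tuple of references; no copies are made.
        push_back(std::forward_as_tuple(std::forward<arg_t>(arg), std::forward<arg_types>(args)...));
    }

private:
    std::ostream & out_;
};
```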
no-api, inline, noexcept
Returns a sentinel for comparison with iterator.
This element acts as a placeholder; attempting to dereference it results in undefined behaviour. It always compares false against an iterator.
Complexity: Constant.
Exceptions: No-throw guarantee.
no-api, inline
Write a range of records (or tuples) to the file.
rng_t | Type of the range, must satisfy std::ranges::input_range and have a reference type that satisfies seqan3::tuple_like. |
[in] | range | The range to write. |
This function simply iterates over the argument and calls push_back() on each element.
Complexity: Linear in the number of records.
Exceptions: Basic exception safety.
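Since the assignment is described as a loop over push_back(), a minimal model is short. batch_writer, its (seq, id) pair records and the written() counter are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

class batch_writer
{
public:
    explicit batch_writer(std::ostream & out) : out_{out} {}

    void push_back(std::pair<std::string, std::string> const & rec) // (seq, id)
    {
        out_ << '>' << rec.second << '\n' << rec.first << '\n';
        ++written_;
    }

    // Assigning any iterable of records writes each of them in turn,
    // so the cost is linear in the number of records.
    template <typename range_t>
    batch_writer & operator=(range_t const & range)
    {
        for (auto const & rec : range)
            push_back(rec);
        return *this;
    }

    std::size_t written() const { return written_; }

private:
    std::ostream & out_;
    std::size_t written_{0};
};
```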
no-api, inline
Write a seqan3::record to the file.
record_t | Type of the record, a specialisation of seqan3::record. |
[in] | r | The record to write. |
Complexity: Constant. TODO linear in the size of the written sequences?
Exceptions: Basic exception safety.
no-api, inline
Write a record in form of a std::tuple to the file.
tuple_t | Type of the record, a specialisation of std::tuple. |
[in] | t | The record to write. |
The fields in the tuple are assumed to correspond to the field IDs given in selected_field_ids; however, passing fewer is accepted if the format does not require all of them.
Complexity: Constant. TODO linear in the size of the written sequences?
Exceptions: Basic exception safety.
no-api, friend
Write a range of records (or tuples) to the file.
rng_t | Type of the range, must satisfy std::ranges::input_range and have a reference type that satisfies seqan3::tuple_like. |
[in] | range | The range to write. |
[in] | f | The file being written to. |
This operator enables sequence_file_output to be at the end of a piping operation. It just calls operator=() internally.
Complexity: Linear in the number of records.
Exceptions: Basic exception safety.
This is especially useful in combination with file-based filters:
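A plain C++17 sketch of the piping mechanism: operator| merely forwards to operator=(), and a filter step runs before the writer. pipe_writer and keep_min_length() are hypothetical; in the pipelines described above, the filter would typically be a view rather than a function that copies into a std::vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using toy_rec = std::pair<std::string, std::string>; // hypothetical (seq, id) record

class pipe_writer
{
public:
    explicit pipe_writer(std::ostream & out) : out_{out} {}

    template <typename range_t>
    pipe_writer & operator=(range_t const & range)
    {
        for (auto const & r : range)
            out_ << '>' << r.second << '\n' << r.first << '\n'; // FASTA layout
        return *this;
    }

private:
    std::ostream & out_;
};

// Placing the writer at the end of a pipe only forwards to operator=().
template <typename range_t>
pipe_writer & operator|(range_t const & range, pipe_writer & f)
{
    return f = range;
}

// Hypothetical filter step: keep records whose sequence has at least min_len bases.
std::vector<toy_rec> keep_min_length(std::vector<toy_rec> const & recs, std::size_t min_len)
{
    std::vector<toy_rec> kept;
    std::copy_if(recs.begin(), recs.end(), std::back_inserter(kept),
                 [min_len](toy_rec const & r) { return r.first.size() >= min_len; });
    return kept;
}
```

Taking the range by const reference keeps the sketch C++17-compatible; views such as C++20's std::views::filter are not const-iterable, so a version meant for them would take the range by forwarding reference instead.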