A class for writing structured sequence files, e.g. Stockholm, Connect, Vienna, ViennaRNA bpp matrix ...
#include <seqan3/io/structure_file/output.hpp>
Public Types | |
using | field_ids = fields< field::SEQ, field::ID, field::BPP, field::STRUCTURE, field::STRUCTURED_SEQ, field::ENERGY, field::REACT, field::REACT_ERR, field::COMMENT, field::OFFSET > |
The subset of seqan3::field IDs that are valid for this file. | |
Template arguments | |
Exposed as member types for public access. | |
using | selected_field_ids = selected_field_ids_ |
A seqan3::fields list with the fields selected for the record. | |
using | valid_formats = valid_formats_ |
A seqan3::type_list with the possible formats. | |
using | stream_char_type = stream_char_type_ |
Character type of the stream(s), usually char . | |
Range associated types | |
Most of the range associated types are void for output ranges. | |
using | value_type = void |
The value type (void). | |
using | reference = void |
The reference type (void). | |
using | const_reference = void |
The const reference type (void). | |
using | size_type = void |
The size type (void). | |
using | difference_type = std::ptrdiff_t |
A signed integer type, usually std::ptrdiff_t. | |
using | iterator = detail::out_file_iterator< structure_file_output > |
The iterator type of this view (an output iterator). | |
using | const_iterator = void |
The const iterator type is void, because files are not const-iterable. | |
using | sentinel = std::ranges::default_sentinel_t |
The type returned by end(). | |
Public Member Functions | |
Constructors, destructor and assignment | |
structure_file_output ()=delete | |
Default constructor is explicitly deleted; you need to give a stream or file name. | |
structure_file_output (structure_file_output const &)=delete | |
Copy construction is explicitly deleted, because you can't have multiple access to the same file. | |
structure_file_output & | operator= (structure_file_output const &)=delete |
Copy assignment is explicitly deleted, because you can't have multiple access to the same file. | |
structure_file_output (structure_file_output &&)=default | |
Move construction is defaulted. | |
structure_file_output & | operator= (structure_file_output &&)=default |
Move assignment is defaulted. | |
~structure_file_output ()=default | |
Destructor is defaulted. | |
structure_file_output (std::filesystem::path filename, selected_field_ids const &fields_tag=selected_field_ids{}) | |
Construct from filename. | |
template<OStream2 stream_t, StructureFileOutputFormat file_format> | |
structure_file_output (stream_t &stream, file_format const &format_tag, selected_field_ids const &fields_tag=selected_field_ids{}) | |
Construct from an existing stream and with specified format. | |
template<OStream2 stream_t, StructureFileOutputFormat file_format> | |
structure_file_output (stream_t &&stream, file_format const &format_tag, selected_field_ids const &fields_tag=selected_field_ids{}) | |
Tuple interface | |
Provides functions for field-based ("column"-based) writing. | |
template<typename typelist , typename field_ids > | |
structure_file_output & | operator= (record< typelist, field_ids > const &r) |
Write columns (wrapped in a seqan3::record) to the file. | |
template<typename ... arg_types> | |
structure_file_output & | operator= (std::tuple< arg_types... > const &t) |
Write columns (wrapped in a std::tuple) to the file. | |
Public Attributes | |
structure_file_output_options | options |
The options are public and their members can be set directly. | |
Related Functions | |
(Note that these are not member functions.) | |
Type deduction guides | |
template<OStream2 stream_t, StructureFileOutputFormat file_format, detail::Fields selected_field_ids> | |
structure_file_output (stream_t &&, file_format const &, selected_field_ids const &) -> structure_file_output< selected_field_ids, type_list< file_format >, typename std::remove_reference_t< stream_t >::char_type > | |
Deduction of the selected fields, the file format and the stream type. | |
template<OStream2 stream_t, StructureFileOutputFormat file_format, detail::Fields selected_field_ids> | |
structure_file_output (stream_t &, file_format const &, selected_field_ids const &) -> structure_file_output< selected_field_ids, type_list< file_format >, typename std::remove_reference_t< stream_t >::char_type > | |
Range interface | |
iterator | begin () noexcept |
Returns an iterator to current position in the file. | |
sentinel | end () noexcept |
Returns a sentinel for comparison with iterator. | |
template<typename record_t > | |
void | push_back (record_t &&r) requires TupleLike< record_t > &&requires |
Write a seqan3::record to the file. | |
template<typename tuple_t > | |
void | push_back (tuple_t &&t) requires TupleLike< tuple_t > |
Write a record in form of a std::tuple to the file. | |
template<typename arg_t , typename ... arg_types> | |
void | emplace_back (arg_t &&arg, arg_types &&... args) |
Write a record to the file by passing individual fields. | |
template<std::ranges::InputRange rng_t> | |
structure_file_output & | operator= (rng_t &&range) requires TupleLike< reference_t< rng_t >> |
Write a range of records (or tuples) to the file. | |
template<std::ranges::InputRange rng_t> | |
structure_file_output & | operator| (rng_t &&range, structure_file_output &f) requires TupleLike< reference_t< rng_t >> |
Write a range of records (or tuples) to the file. | |
template<std::ranges::InputRange rng_t> | |
structure_file_output | operator| (rng_t &&range, structure_file_output &&f) requires TupleLike< reference_t< rng_t >> |
A class for writing structured sequence files, e.g. Stockholm, Connect, Vienna, ViennaRNA bpp matrix ...
selected_field_ids | A seqan3::fields type with the list and order of field IDs; only relevant if these can't be deduced. |
valid_formats | A seqan3::type_list of the selectable formats (each must meet seqan3::StructureFileOutputFormat). |
stream_char_type | The character type of the underlying stream device(s); must model seqan3::Char. |
Structured sequence files contain intra-molecular interactions of RNA or protein. Usually, but not necessarily, they contain the nucleotide or amino acid sequences and descriptions as well. Interactions can be represented either as a fixed secondary structure, where every character is assigned at most one interaction partner (structure of minimum free energy), or as an annotated sequence, where every character is assigned a set of interaction partners with specific base pair probabilities.
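The structured sequence file abstraction supports writing ten different fields, as listed in field_ids: seqan3::field::SEQ, seqan3::field::ID, seqan3::field::BPP, seqan3::field::STRUCTURE, seqan3::field::STRUCTURED_SEQ, seqan3::field::ENERGY, seqan3::field::REACT, seqan3::field::REACT_ERR, seqan3::field::COMMENT and seqan3::field::OFFSET.
The member functions take any combination of these fields. If the field ID of an argument cannot be deduced, it is assumed to correspond to the field ID of the respective template parameter.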
This class comes with two constructors, one for construction from a file name and one for construction from an existing stream and a known format. The first one automatically picks the format based on the extension of the file name. The second can be used if you have a non-file stream, like std::cout or std::ostringstream, that you want to write to and/or if you cannot use file-extension based detection, but know that your output file has a certain format.
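The extension-based choice made by the filename constructor can be pictured with a small standard-library sketch. It is not SeqAn's implementation: `toy_format`, `detect_format` and the mapping of `.dbn` to the Vienna format are assumptions made for illustration only.

```cpp
#include <cassert>
#include <filesystem>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a format tag; not a SeqAn type.
enum class toy_format { vienna };

// Pick a format from the file extension.
// The ".dbn" -> Vienna mapping is an assumption of this sketch.
inline toy_format detect_format(std::filesystem::path const & filename)
{
    assert(filename.has_filename()); // a bare directory path has no extension to inspect
    std::string const ext = filename.extension().string();
    if (ext == ".dbn")
        return toy_format::vienna;
    throw std::invalid_argument{"no known structure format for extension '" + ext + "'"};
}
```

With a stream such as std::cout there is no file name to inspect, which is why the stream constructors take the format tag explicitly.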
In most cases the template parameters are deduced completely automatically:
Writing to std::cout:
Note that this is not the same as writing structure_file_output<> (with angle brackets). In the latter case the template parameters are explicitly set to their default values; in the former case automatic deduction happens, which chooses different parameters depending on the constructor arguments. Prefer deduction over explicit defaults.
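The difference between deduction and empty angle brackets is a general C++17 rule and can be checked on a toy class template. `toy_file` below is a hypothetical stand-in with a single defaulted parameter, not the real class.

```cpp
#include <type_traits>

// Toy class template with a defaulted parameter; a stand-in, not a SeqAn type.
template <typename value_t = int>
struct toy_file
{
    explicit toy_file(value_t) {}
};

// No angle brackets: class template argument deduction picks value_t from the argument.
static_assert(std::is_same_v<decltype(toy_file{'x'}), toy_file<char>>);

// Empty angle brackets: the default (int) is used and the argument is converted to it.
static_assert(std::is_same_v<decltype(toy_file<>{'x'}), toy_file<int>>);
```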
You can iterate over this file record-wise:
The easiest way to write to a structured sequence file is to use the push_back() or emplace_back() member functions. These work similarly to how they work on a std::vector. If you pass a tuple to push_back() or give arguments to emplace_back(), the seqan3::field ID of the i-th tuple element or argument is assumed to be the i-th value of selected_field_ids, i.e. by default the first is assumed to be seqan3::field::SEQ, the second seqan3::field::ID and the third seqan3::field::STRUCTURE. You may give fewer fields than are selected if the format you are writing to can cope with fewer (e.g. for Vienna it is sufficient to write seqan3::field::SEQ, seqan3::field::ID and seqan3::field::STRUCTURE, even if selected_field_ids also contains seqan3::field::ENERGY).
You may also use the output file's iterator for writing; however, this rarely provides an advantage.
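The positional interpretation of fields can be sketched with a toy writer built on the standard library. `toy_structure_writer`, `toy_record` and the three-line output layout (header, sequence, dot-bracket string) are assumptions for illustration; the real class supports more fields and formats.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <tuple>

// Hypothetical record type in the default field order (SEQ, ID, STRUCTURE).
using toy_record = std::tuple<std::string, std::string, std::string>;

// Hypothetical stand-in for an output file; not the SeqAn class.
class toy_structure_writer
{
public:
    explicit toy_structure_writer(std::ostream & stream) : out{&stream} {}

    // Fields are interpreted by position: 1st = sequence, 2nd = id, 3rd = structure.
    void emplace_back(std::string const & seq, std::string const & id, std::string const & structure)
    {
        assert(seq.size() == structure.size()); // one structure character per residue
        *out << '>' << id << '\n' << seq << '\n' << structure << '\n';
    }

    // A tuple is unpacked and forwarded to emplace_back() in the same order.
    void push_back(toy_record const & record)
    {
        std::apply([this](auto const &... fields) { this->emplace_back(fields...); }, record);
    }

private:
    std::ostream * out;
};

// Writes two records, one per interface, and returns the produced text.
inline std::string write_two_records()
{
    std::ostringstream stream;
    toy_structure_writer writer{stream};
    writer.emplace_back("GGGAAACCC", "hairpin", "(((...)))");
    writer.push_back(toy_record{"ACGU", "unpaired", "...."});
    return stream.str();
}
```

Both calls produce the same kind of record; the only difference is whether the fields arrive as separate arguments or bundled in a tuple.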
If you want to pass a combined object for SEQ and STRUCTURE fields to push_back() / emplace_back(), or if you want to change the order of the parameters, you can pass a non-empty fields trait object to the structure_file_output constructor to select the fields that are used for interpreting the arguments.
The following snippet demonstrates the usage of such a fields trait object.
A different way of passing custom fields to the file is to pass a seqan3::record, instead of a tuple, to push_back(). The seqan3::record clearly indicates which of its elements has which seqan3::field ID, so the file will use that information instead of the template argument. This is especially handy when reading from one file and writing to another: you don't have to configure the output file to match the input file, it will just work:
You can write multiple records at once by assigning to the file:
Record-wise writing in batches also works for writing from input files directly to output files, because input files are also input ranges in SeqAn:
This can be combined with file-based views to create I/O pipelines:
The record-based interface treats the file as a range of tuples (the records), but in certain situations you might have the data as columns, i.e. a tuple-of-ranges, instead of range-of-tuples.
In that case you can use column-based writing, which uses operator=():
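The relationship between the two layouts can be sketched with standard containers. The aliases `toy_record` and `toy_columns` and the helper `columns_to_records` are hypothetical; the sketch only shows how a tuple-of-ranges corresponds to a range-of-tuples, not how the class writes columns internally.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// One record: (SEQ, ID, STRUCTURE).
using toy_record = std::tuple<std::string, std::string, std::string>;
// One column per field: all sequences, all ids, all structures.
using toy_columns = std::tuple<std::vector<std::string>,
                               std::vector<std::string>,
                               std::vector<std::string>>;

// Turn a tuple-of-ranges (columns) into a range-of-tuples (records).
inline std::vector<toy_record> columns_to_records(toy_columns const & columns)
{
    auto const & [seqs, ids, structures] = columns;
    assert(seqs.size() == ids.size() && ids.size() == structures.size()); // equally long columns

    std::vector<toy_record> records;
    records.reserve(seqs.size());
    for (std::size_t i = 0; i < seqs.size(); ++i)
        records.emplace_back(seqs[i], ids[i], structures[i]);
    return records;
}
```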
Currently, the only implemented format is seqan3::format_vienna. More formats will follow soon.
|
inline |
Construct from filename.
[in] | filename | Path to the file you wish to open. |
[in] | fields_tag | A seqan3::fields tag. [optional] |
In addition to the file name, you may specify a custom seqan3::fields type which may be easier than defining all the template parameters.
This constructor transparently applies a compression stream on top of the file stream in case the given file extension suggests the user wants this. See the section on compression and decompression for more information.
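A rough sketch of how a compression extension might be recognised and stripped before the remaining extension is inspected is given below. The helper `split_compression` and the choice of `.gz` and `.bz2` as compression extensions are assumptions of this sketch, not a description of the library's actual behaviour.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <utility>

// Returns {compression requested?, filename with the compression extension removed}.
// Treating ".gz" and ".bz2" as compression extensions is an assumption of this sketch.
inline std::pair<bool, std::filesystem::path> split_compression(std::filesystem::path filename)
{
    assert(filename.has_filename());
    std::string const ext = filename.extension().string();
    if (ext == ".gz" || ext == ".bz2")
    {
        filename.replace_extension(); // "rna.dbn.gz" -> "rna.dbn"
        return {true, filename};
    }
    return {false, filename};
}
```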
|
inline |
Construct from an existing stream and with specified format.
file_format | The format of the file in the stream, must satisfy seqan3::StructureFileOutputFormat. |
[in,out] | stream | The stream to write to, must be derived from std::basic_ostream. |
[in] | format_tag | The file format tag. |
[in] | fields_tag | A seqan3::fields tag. [optional] |
This constructor does not apply compression transparently (because there is no way to know if the user wants this). However, you can just pass e.g. seqan3::contrib::gz_ostream to this constructor if you explicitly want compression. See the section on compression and decompression for more information.
|
inline |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
|
inline noexcept |
Returns an iterator to current position in the file.
You can write to the file by assigning to the iterator, but using push_back() is usually more intuitive.
Constant.
No-throw guarantee.
|
inline |
Write a record to the file by passing individual fields.
arg_t | Type of the first field. |
arg_types | Types of further fields. |
[in] | arg | The first field to write. |
[in] | args | Further fields. |
The fields are assumed to correspond to the field IDs given in selected_field_ids; however, passing fewer is accepted if the format does not require all of them.
Constant. TODO linear in the size of the written sequences?
Basic exception safety.
|
inline noexcept |
Returns a sentinel for comparison with iterator.
This element acts as a placeholder; attempting to dereference it results in undefined behaviour. It always compares false against an iterator.
Constant.
No-throw guarantee.
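The interplay of an output iterator and a sentinel that never compares equal can be sketched as follows. `toy_out_iterator` and `toy_sentinel` are hypothetical types written against the standard library only; they are not the library's detail::out_file_iterator.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

struct toy_sentinel {}; // placeholder "end" marker; carries no state

// Minimal output-iterator-like handle: assignment writes a line,
// dereference and increment are no-ops that return the handle itself.
class toy_out_iterator
{
public:
    explicit toy_out_iterator(std::ostream & stream) : out{&stream} {}

    toy_out_iterator & operator*() { return *this; }
    toy_out_iterator & operator++() { return *this; }
    toy_out_iterator & operator=(std::string const & line)
    {
        *out << line << '\n';
        return *this;
    }

    // An output file never reaches its end, so equality is always false.
    friend bool operator==(toy_out_iterator const &, toy_sentinel) { return false; }
    friend bool operator!=(toy_out_iterator const &, toy_sentinel) { return true; }

private:
    std::ostream * out;
};

// Usage: write two lines through the iterator; the sentinel never matches.
inline std::string iterator_demo()
{
    std::ostringstream stream;
    toy_out_iterator it{stream};
    *it = "first";
    ++it;
    *it = "second";
    assert(it != toy_sentinel{}); // never "at end"
    return stream.str();
}
```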
|
inline |
Write a range of records (or tuples) to the file.
rng_t | Type of the range, must satisfy std::ranges::InputRange and have a reference type that satisfies seqan3::TupleLike. |
[in] | range | The range to write. |
This function simply iterates over the argument and calls push_back() on each element.
Linear in the number of records.
Basic exception safety.
|
inline |
Write columns (wrapped in a seqan3::record) to the file.
typelist | Template argument to seqan3::record, each type must be a column (range-of-range). |
field_ids | Template argument to seqan3::record, the IDs corresponding to the columns. |
[in] | r | The record of columns. |
Linear in the size of the columns.
Basic exception safety.
|
inline |
Write columns (wrapped in a std::tuple) to the file.
arg_types | The column types, each type must be a range-of-range. |
[in] | t | The tuple of columns. |
The columns are assumed to correspond to the field IDs given in selected_field_ids; however, passing fewer is accepted if the format does not require all of them.
Linear in the size of the columns.
Basic exception safety.
|
inline |
Write a seqan3::record to the file.
record_t | Type of the record, a specialisation of seqan3::record. |
[in] | r | The record to write. |
Constant. TODO linear in the size of the written sequences?
Basic exception safety.
|
inline |
Write a record in form of a std::tuple to the file.
tuple_t | Type of the record, a specialisation of std::tuple. |
[in] | t | The record to write. |
The fields in the tuple are assumed to correspond to the field IDs given in selected_field_ids; however, passing fewer is accepted if the format does not require all of them.
Constant. TODO linear in the size of the written sequences?
Basic exception safety.
|
friend |
Write a range of records (or tuples) to the file.
rng_t | Type of the range, must satisfy std::ranges::InputRange and have a reference type that satisfies seqan3::TupleLike. |
[in] | range | The range to write. |
[in] | f | The file being written to. |
This operator enables structure_file_output to be at the end of a piping operation. It just calls operator=() internally.
Linear in the number of records.
Basic exception safety.
This is especially useful in combination with file-based filters:
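The forwarding chain (pipe operator, then range assignment, then push_back() per element) can be sketched with a toy writer. `toy_writer` and the filter helper `keep_min_length` are hypothetical and use plain strings as records; the sketch does not use SeqAn views.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for an output file; one record is one line of text.
class toy_writer
{
public:
    explicit toy_writer(std::ostream & stream) : out{&stream} {}

    void push_back(std::string const & record) { *out << record << '\n'; }

    // Assigning a range writes every element via push_back().
    template <typename range_t>
    toy_writer & operator=(range_t const & range)
    {
        for (auto const & record : range)
            push_back(record);
        return *this;
    }

    // Piping a range into the writer simply forwards to operator=().
    template <typename range_t>
    friend toy_writer & operator|(range_t const & range, toy_writer & writer)
    {
        return writer = range;
    }

private:
    std::ostream * out;
};

// A simple "filter" stage: keep records with at least min_length characters.
inline std::vector<std::string> keep_min_length(std::vector<std::string> const & records,
                                                std::size_t min_length)
{
    std::vector<std::string> kept;
    std::copy_if(records.begin(), records.end(), std::back_inserter(kept),
                 [min_length](std::string const & r) { return r.size() >= min_length; });
    assert(kept.size() <= records.size()); // filtering never adds records
    return kept;
}
```

With these definitions, `keep_min_length(records, 4) | writer;` writes only the records that pass the filter, which mirrors placing the output file at the end of a pipeline.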
|
friend |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
|
related |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.