SeqAn3 3.0.3
The Modern C++ library for sequence analysis.
seqan3::sequence_file_output< selected_field_ids_, valid_formats_ > Class Template Reference

A class for writing sequence files, e.g. FASTA, FASTQ ...

#include <seqan3/io/sequence_file/output.hpp>

Public Types

using field_ids = fields< field::seq, field::id, field::qual, field::_seq_qual_deprecated >
 The subset of seqan3::field IDs that are valid for this file.
 
Template arguments

Exposed as member types for public access.

using selected_field_ids = selected_field_ids_
 A seqan3::fields list with the fields selected for the record.
 
using valid_formats = valid_formats_
 A seqan3::type_list with the possible formats.
 
using stream_char_type = char
 Character type of the stream(s).
 
Range associated types

Most of the range associated types are void for output ranges.

using value_type = void
 The value type (void).
 
using reference = void
 The reference type (void).
 
using const_reference = void
 The const reference type (void).
 
using size_type = void
 The size type (void).
 
using difference_type = std::ptrdiff_t
 A signed integer type, usually std::ptrdiff_t.
 
using iterator = detail::out_file_iterator< sequence_file_output >
 The iterator type of this file (an output iterator).
 
using const_iterator = void
 The const iterator type is void, because files are not const-iterable.
 
using sentinel = std::default_sentinel_t
 The type returned by end().
 

Public Member Functions

Constructors, destructor and assignment
 sequence_file_output ()=delete
 Default constructor is explicitly deleted; you need to give a stream or file name.
 
 sequence_file_output (sequence_file_output const &)=delete
 Copy construction is explicitly deleted, because multiple objects must not write to the same file.
 
sequence_file_output & operator= (sequence_file_output const &)=delete
 Copy assignment is explicitly deleted, because multiple objects must not write to the same file.
 
 sequence_file_output (sequence_file_output &&)=default
 Move construction is defaulted.
 
sequence_file_output & operator= (sequence_file_output &&)=default
 Move assignment is defaulted.
 
 ~sequence_file_output ()=default
 Destructor is defaulted.
 
 sequence_file_output (std::filesystem::path filename, selected_field_ids const &fields_tag=selected_field_ids{})
 Construct from filename.
 
template<output_stream stream_t, sequence_file_output_format file_format>
 sequence_file_output (stream_t &stream, file_format const &format_tag, selected_field_ids const &fields_tag=selected_field_ids{})
 Construct from an existing stream and with specified format.
 
template<output_stream stream_t, sequence_file_output_format file_format>
 sequence_file_output (stream_t &&stream, file_format const &format_tag, selected_field_ids const &fields_tag=selected_field_ids{})
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
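The copy/move policy above is what one would expect of a type that owns an output stream. As a rough illustration, here is a standard-library-only sketch of that policy; `toy_writer` is a hypothetical class, not SeqAn3's implementation.

```cpp
#include <memory>
#include <ostream>
#include <sstream>
#include <type_traits>
#include <utility>

// Hypothetical toy writer that owns its stream, mirroring the policy listed above.
class toy_writer
{
public:
    toy_writer() = delete;                                    // a stream must be given
    explicit toy_writer(std::unique_ptr<std::ostream> s) : stream{std::move(s)} {}

    toy_writer(toy_writer const &) = delete;                  // two owners would interleave output
    toy_writer & operator=(toy_writer const &) = delete;
    toy_writer(toy_writer &&) = default;                      // ownership can be transferred
    toy_writer & operator=(toy_writer &&) = default;
    ~toy_writer() = default;

    std::ostream & get() { return *stream; }

private:
    std::unique_ptr<std::ostream> stream;
};

static_assert(!std::is_default_constructible_v<toy_writer>);
static_assert(!std::is_copy_constructible_v<toy_writer>);
static_assert(std::is_move_constructible_v<toy_writer>);
```

A `std::ostringstream` wrapped in a `std::unique_ptr` can be moved into one writer and then on to another; the bytes still end up in the same underlying stream.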
 

Public Attributes

sequence_file_output_options options {}
 The options are public and its members can be set directly.
 

Related Functions

(Note that these are not member functions.)

Type deduction guides
template<output_stream stream_t, sequence_file_output_format file_format>
 sequence_file_output (stream_t &, file_format const &) -> sequence_file_output< typename sequence_file_output<>::selected_field_ids, type_list< file_format >>
 Deduction guide for given stream and file format.
 
template<output_stream stream_t, sequence_file_output_format file_format>
 sequence_file_output (stream_t &&, file_format const &) -> sequence_file_output< typename sequence_file_output<>::selected_field_ids, type_list< file_format >>
 Deduction guide for a given stream (passed as an rvalue) and file format.
 
template<output_stream stream_t, sequence_file_output_format file_format, detail::fields_specialisation selected_field_ids>
 sequence_file_output (stream_t &&, file_format const &, selected_field_ids const &) -> sequence_file_output< selected_field_ids, type_list< file_format >>
 Deduction guide for given stream, file format and field ids.
 
template<output_stream stream_t, sequence_file_output_format file_format, detail::fields_specialisation selected_field_ids>
 sequence_file_output (stream_t &, file_format const &, selected_field_ids const &) -> sequence_file_output< selected_field_ids, type_list< file_format >>
 Deduction guide for a given stream (passed as an lvalue), file format and field ids.
 

Range interface

Provides functions for record based writing of the file.

iterator begin () noexcept
 Returns an iterator to current position in the file.
 
sentinel end () noexcept
 Returns a sentinel for comparison with iterator.
 
template<typename record_t >
void push_back (record_t &&r)
 Write a seqan3::record to the file.
 
template<typename tuple_t >
void push_back (tuple_t &&t)
 Write a record in form of a std::tuple to the file.
 
template<typename arg_t , typename ... arg_types>
void emplace_back (arg_t &&arg, arg_types &&... args)
 Write a record to the file by passing individual fields.
 
template<std::ranges::input_range rng_t>
sequence_file_output & operator= (rng_t &&range)
 Write a range of records (or tuples) to the file.
 
template<std::ranges::input_range rng_t>
sequence_file_output & operator| (rng_t &&range, sequence_file_output &f)
 Write a range of records (or tuples) to the file.
 
template<std::ranges::input_range rng_t>
sequence_file_output operator| (rng_t &&range, sequence_file_output &&f)
 Overload of the above for a file passed as an rvalue.
 

Detailed Description

template<detail::fields_specialisation selected_field_ids_ = fields<field::seq, field::id, field::qual>, detail::type_list_of_sequence_file_output_formats valid_formats_ = type_list<format_embl, format_fasta, format_fastq, format_genbank, format_sam>>
class seqan3::sequence_file_output< selected_field_ids_, valid_formats_ >

A class for writing sequence files, e.g. FASTA, FASTQ ...

Template Parameters
selected_field_ids: A seqan3::fields type with the list and order of field IDs; only relevant if these can't be deduced.
valid_formats: A seqan3::type_list of the selectable formats (each must meet seqan3::sequence_file_output_format).

Introduction

Sequence files are the most generic and common biological files. Well-known formats include FastA and FastQ, but some may also be interested in treating SAM or BAM files as sequence files, discarding the alignment.

The sequence file abstraction supports writing three different fields:

  1. seqan3::field::seq
  2. seqan3::field::id
  3. seqan3::field::qual

The member functions accept any combination of these fields. If the field ID of an argument cannot be deduced, it is assumed to correspond to the field ID at the same position in the selected_field_ids template parameter.

Construction and specialisation

This class comes with two constructors: one for construction from a file name and one for construction from an existing stream and a known format. The first one automatically picks the format based on the extension of the file name. The second can be used if you have a non-file stream, like std::cout or std::ostringstream, that you want to write to, and/or if you cannot use file-extension based detection but know that your output file has a certain format.
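Extension-based detection can be pictured as a lookup from the file extension to a format. The following is a minimal standard-library sketch; `detect_format` is a hypothetical helper, and the extension lists are illustrative assumptions rather than the exact lists SeqAn3 consults.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: map a file extension to a format name.
// The extension lists are assumptions for illustration only.
inline std::string detect_format(std::filesystem::path const & file)
{
    std::string const ext = file.extension().string();

    if (ext == ".fa" || ext == ".fasta" || ext == ".fna")
        return "fasta";
    if (ext == ".fq" || ext == ".fastq")
        return "fastq";
    if (ext == ".embl")
        return "embl";
    if (ext == ".gb" || ext == ".gbk" || ext == ".genbank")
        return "genbank";
    if (ext == ".sam")
        return "sam";

    // A real implementation would more likely report an error than return a marker.
    return "unknown";
}
```

For example, `detect_format("my.fasta")` yields `"fasta"`, which corresponds to the FastA detection in the example that follows.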

In most cases the template parameters are deduced completely automatically:

#include <filesystem>

#include <seqan3/io/sequence_file/output.hpp>

int main()
{
    auto fasta_file = std::filesystem::current_path() / "my.fasta";

    seqan3::sequence_file_output fout{fasta_file};
    // FastA format detected, std::ofstream opened for file
}

Writing to std::cout:

#include <iostream>

#include <seqan3/io/sequence_file/output.hpp>

int main()
{
    seqan3::sequence_file_output fout{std::cout, seqan3::format_fasta{}};
    //                           ^ no need to specify the template arguments
}

Note that this is not the same as writing sequence_file_output<> (with angle brackets). In the latter case, the template arguments are explicitly set to their default values; in the former case, automatic deduction happens, which chooses different parameters depending on the constructor arguments. Prefer deduction over explicit defaults.
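The difference between deduction and `<>` is ordinary C++ class template argument deduction. A minimal sketch with locally defined, hypothetical stand-ins (`toy_output`, `type_list`, `format_fasta`, `format_fastq` are not taken from SeqAn3) shows a deduction guide narrowing the format list, while `<>` keeps the defaults:

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-ins, defined here only for illustration.
template <typename... ts>
struct type_list {};
struct format_fasta {};
struct format_fastq {};

// A toy class whose template parameter has a default, like valid_formats_.
template <typename valid_formats = type_list<format_fasta, format_fastq>>
struct toy_output
{
    template <typename format_t>
    toy_output(std::string const & /* stream placeholder */, format_t const &) {}
};

// Deduction guide: a concrete format narrows valid_formats to exactly that format.
template <typename format_t>
toy_output(std::string const &, format_t const &) -> toy_output<type_list<format_t>>;

// Without angle brackets, the guide is used.
static_assert(std::is_same_v<decltype(toy_output{std::string{"out"}, format_fasta{}}),
                             toy_output<type_list<format_fasta>>>);

// With "<>", the defaults are used and no deduction happens.
static_assert(std::is_same_v<decltype(toy_output<>{std::string{"out"}, format_fasta{}}),
                             toy_output<type_list<format_fasta, format_fastq>>>);
```

The user-written guide wins over the guide implied by the constructor because, with otherwise equal candidates, C++ prefers user-declared deduction guides.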

Writing record-wise

You can write to this file record-wise:

#include <iostream>
#include <string>
#include <tuple>

#include <seqan3/alphabet/nucleotide/dna5.hpp>
#include <seqan3/io/sequence_file/output.hpp>

int main()
{
    using namespace seqan3::literals;

    seqan3::sequence_file_output fout{std::cout, seqan3::format_fasta{}};

    for (int i = 0; i < 5; ++i) // ...
    {
        std::string id{"test_id"};
        seqan3::dna5_vector seq{"ACGT"_dna5};

        // ...

        fout.emplace_back(seq, id);        // as individual variables
        // or:
        fout.push_back(std::tie(seq, id)); // as a tuple
    }
}

The easiest way to write to a sequence file is to use the push_back() or emplace_back() member functions. These work similarly to how they work on a std::vector. If you pass a tuple to push_back() or give arguments to emplace_back(), the seqan3::field ID of the i-th tuple element/argument is assumed to be the i-th value of selected_field_ids, i.e. by default the first is assumed to be seqan3::field::seq, the second seqan3::field::id and the third seqan3::field::qual. You may pass fewer fields than are selected if the format you are writing to can cope with fewer (e.g. for FastA it is sufficient to write seqan3::field::seq and seqan3::field::id, even if selected_field_ids also contains seqan3::field::qual at the third position).
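To make the positional rule concrete, here is a small standard-library sketch that is independent of SeqAn3; `field`, `toy_record`, `make_record` and `to_fasta` are hypothetical names. The i-th argument is stored in the i-th selected field, and the FASTA writer never reads the quality field, so omitting it is harmless:

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for seqan3::field.
enum class field { seq, id, qual };

struct toy_record
{
    std::string seq, id, qual;
};

// Store the i-th argument in the i-th selected field.
// Passing fewer arguments than selected fields leaves the rest empty.
template <field... selected, typename... args_t>
toy_record make_record(args_t &&... args)
{
    static_assert(sizeof...(args_t) <= sizeof...(selected), "more arguments than selected fields");
    constexpr std::array<field, sizeof...(selected)> ids{selected...};

    toy_record r{};
    std::size_t i = 0;
    auto assign = [&](std::string value)
    {
        switch (ids[i++])
        {
            case field::seq:  r.seq  = std::move(value); break;
            case field::id:   r.id   = std::move(value); break;
            case field::qual: r.qual = std::move(value); break;
        }
    };
    (assign(std::string(std::forward<args_t>(args))), ...); // left to right
    return r;
}

// FASTA only needs seq and id, so an empty qual does not matter here.
inline std::string to_fasta(toy_record const & r)
{
    return ">" + r.id + "\n" + r.seq + "\n";
}
```

Calling `make_record<field::id, field::seq, field::qual>("test_id", "AC", "13")` mirrors the custom-fields case described next: the same values, passed in a different order, land in the same slots.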

You may also use the output file's iterator for writing; however, this rarely provides an advantage.

Writing record-wise (custom fields)

If you want to change the order of the parameters, you can pass a non-empty fields trait object to the sequence_file_output constructor to select the fields that are used for interpreting the arguments.

The following snippet demonstrates the usage of such a fields trait object.

#include <iostream>
#include <string>
#include <tuple>
#include <vector>

#include <seqan3/alphabet/nucleotide/dna5.hpp>
#include <seqan3/alphabet/quality/phred42.hpp>
#include <seqan3/alphabet/quality/qualified.hpp>
#include <seqan3/io/sequence_file/output.hpp>
#include <seqan3/utility/views/elements.hpp>

int main()
{
    using namespace seqan3::literals;

    // ID is selected first, so arguments are interpreted as (id, seq, qual):
    seqan3::sequence_file_output fout{std::cout,
                                      seqan3::format_fastq{},
                                      seqan3::fields<seqan3::field::id, seqan3::field::seq, seqan3::field::qual>{}};

    for (int i = 0; i < 5; i++)
    {
        std::string id{"test_id"};
        // vector of combined data structure:
        std::vector<seqan3::qualified<seqan3::dna5, seqan3::phred42>> seq_qual{{'A'_dna5, '1'_phred42},
                                                                               {'C'_dna5, '3'_phred42}};
        auto view_on_seq = seqan3::views::elements<0>(seq_qual);
        auto view_on_qual = seqan3::views::elements<1>(seq_qual);

        // ...

        // Note that the order of the arguments is different from the default `seq, id, qual`,
        // because you specified that ID should be first in the fields template argument.
        fout.emplace_back(id, view_on_seq, view_on_qual);
        // or:
        fout.push_back(std::tie(id, view_on_seq, view_on_qual));
    }
}

A different way of passing custom fields to the file is to pass a seqan3::record, instead of a tuple, to push_back(). The seqan3::record clearly indicates which of its elements has which seqan3::field ID, so the file will use that information instead of the template argument. This is especially handy when reading from one file and writing to another, because you don't have to configure the output file to match the input file; it will just work:

#include <iostream>
#include <sstream>

#include <seqan3/io/sequence_file/input.hpp>
#include <seqan3/io/sequence_file/output.hpp>

auto input = R"(@TEST1
ACGT
+
##!#
@Test2
AGGCTGA
+
##!#!!!
@Test3
GGAGTATAATATATATATATATAT
+
##!###!###!###!###!###!#)";

int main()
{
    seqan3::sequence_file_input fin{std::istringstream{input}, seqan3::format_fastq{}};
    seqan3::sequence_file_output fout{std::cout, seqan3::format_fastq{}}; // doesn't have to match the configuration

    for (auto & r : fin)
    {
        if (true) // r fulfills some criterion
            fout.push_back(r);
    }
}

Writing record-wise in batches

You can write multiple records at once by assigning to the file:

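#include <iostream>
#include <string>
#include <tuple>
#include <vector>

#include <seqan3/alphabet/nucleotide/dna5.hpp>
#include <seqan3/io/sequence_file/output.hpp>

int main()
{
    using namespace seqan3::literals;

    seqan3::sequence_file_output fout{std::cout, seqan3::format_fasta{}};

    std::vector<std::tuple<seqan3::dna5_vector, std::string>> range
    {
        { "ACGT"_dna5, "First" },
        { "NATA"_dna5, "2nd" },
        { "GATA"_dna5, "Third" }
    }; // a range of "records"

    fout = range;
    // the same as:
    range | fout;
}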

File I/O pipelines

Record-wise writing in batches also works for writing from input files directly to output files, because input files are also input ranges in SeqAn:

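#include <iostream>
#include <sstream>

#include <seqan3/io/sequence_file/input.hpp>
#include <seqan3/io/sequence_file/output.hpp>

auto input = R"(@TEST1
ACGT
+
##!#
@Test2
AGGCTGA
+
##!#!!!
@Test3
GGAGTATAATATATATATATATAT
+
##!###!###!###!###!###!#)";

int main()
{
    // file format conversion in one line:
    seqan3::sequence_file_output{std::cout, seqan3::format_fasta{}}
        = seqan3::sequence_file_input{std::istringstream{input}, seqan3::format_fastq{}};

    // with seqan3::sequence_file_output as a variable:
    seqan3::sequence_file_output fout{std::cout, seqan3::format_fasta{}};
    seqan3::sequence_file_input fin{std::istringstream{input}, seqan3::format_fastq{}};
    fout = fin;

    // or in pipe notation:
    seqan3::sequence_file_input{std::istringstream{input}, seqan3::format_fastq{}}
        | seqan3::sequence_file_output{std::cout, seqan3::format_fasta{}};
}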
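Conceptually, assigning a FASTQ input file to a FASTA output file reads each record and rewrites it without qualities. A hand-written sketch of that conversion, assuming strictly four-line FASTQ records and no FASTA line wrapping; `fastq_to_fasta` is hypothetical, and SeqAn3's parsers handle far more cases than this:

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical FASTQ -> FASTA conversion for strict 4-line records.
// Qualities are read and dropped, because FASTA has no field for them.
inline void fastq_to_fasta(std::istream & in, std::ostream & out)
{
    std::string header, seq, plus, qual;
    while (std::getline(in, header) && std::getline(in, seq)
           && std::getline(in, plus) && std::getline(in, qual))
    {
        if (header.empty() || header.front() != '@')
            break; // not a FASTQ record; stop instead of guessing

        out << '>' << header.substr(1) << '\n' << seq << '\n'; // drop the leading '@'
    }
}
```

With an `std::istringstream` over the first two records of `input`, this produces `>TEST1`, `ACGT`, `>Test2`, `AGGCTGA` on separate lines.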

This can be combined with file-based views to create I/O pipelines:

#include <iostream>
#include <ranges>
#include <sstream>

#include <seqan3/alphabet/quality/concept.hpp>
#include <seqan3/io/sequence_file/input.hpp>
#include <seqan3/io/sequence_file/output.hpp>

auto input = R"(@TEST1
ACGT
+
##!#
@Test2
AGGCTGA
+
##!#!!!
@Test3
GGAGTATAATATATATATATATAT
+
##!###!###!###!###!###!#)";

int main()
{
    // minimum_average_quality_filter and minimum_sequence_length_filter need to be implemented first
    auto minimum_sequence_length_filter = std::views::filter([] (auto rec)
    {
        return std::ranges::distance(rec.sequence()) >= 50;
    });

    auto minimum_average_quality_filter = std::views::filter([] (auto const & record)
    {
        double qual_sum{0}; // summation of the qualities
        for (auto chr : record.base_qualities())
            qual_sum += seqan3::to_phred(chr);

        // check if the average quality is at least 20
        return qual_sum / (std::ranges::distance(record.base_qualities())) >= 20;
    });

    seqan3::sequence_file_input input_file{std::istringstream{input}, seqan3::format_fastq{}};
    input_file | minimum_average_quality_filter
               | minimum_sequence_length_filter
               | seqan3::sequence_file_output{std::cout, seqan3::format_fasta{}};
}

Column-based writing

The record-based interface treats the file as a range of tuples (the records), but in certain situations you might have the data as columns, i.e. a tuple-of-ranges, instead of range-of-tuples.

You can use column-based writing in that case; it uses operator=() and seqan3::views::zip():

#include <iostream>
#include <string>

#include <seqan3/alphabet/container/concatenated_sequences.hpp>
#include <seqan3/alphabet/nucleotide/dna4.hpp>
#include <seqan3/io/sequence_file/output.hpp>
#include <seqan3/utility/views/zip.hpp>

using namespace seqan3::literals;

struct data_storage_t
{
    seqan3::concatenated_sequences<seqan3::dna4_vector> sequences{"ACGT"_dna4, "AAA"_dna4};
    seqan3::concatenated_sequences<std::string> ids{std::string{"ID1"}, std::string{"ID2"}}; // example IDs
};

int main()
{
    data_storage_t data_storage{};

    // ... in your file writing function:

    seqan3::sequence_file_output fout{std::cout, seqan3::format_fasta{}};

    fout = seqan3::views::zip(data_storage.sequences, data_storage.ids);
}
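Zipping columns turns a tuple of ranges into a range of tuples: row i combines element i of every column. A standard-library sketch that walks two columns in lock-step and writes FASTA; `columns` and `write_columns_as_fasta` are hypothetical, and unlike a zip view (which typically stops at the shorter column) this sketch simply rejects columns of unequal length:

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Column-based data: one container per field (a tuple of ranges).
struct columns
{
    std::vector<std::string> sequences;
    std::vector<std::string> ids;
};

// Visit row i of every column together and emit one FASTA record per row.
inline void write_columns_as_fasta(columns const & c, std::ostream & out)
{
    if (c.sequences.size() != c.ids.size())
        throw std::invalid_argument{"columns must have equal length"};

    for (std::size_t i = 0; i < c.ids.size(); ++i)
        out << '>' << c.ids[i] << '\n' << c.sequences[i] << '\n';
}
```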

Formats

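We currently support writing the formats listed in the default valid_formats template argument:

  1. seqan3::format_embl
  2. seqan3::format_fasta
  3. seqan3::format_fastq
  4. seqan3::format_genbank
  5. seqan3::format_sam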

Constructor & Destructor Documentation

◆ sequence_file_output() [1/2]

template<detail::fields_specialisation selected_field_ids_ = fields<field::seq, field::id, field::qual>, detail::type_list_of_sequence_file_output_formats valid_formats_ = type_list<format_embl, format_fasta, format_fastq, format_genbank, format_sam>>
seqan3::sequence_file_output< selected_field_ids_, valid_formats_ >::sequence_file_output ( std::filesystem::path  filename,
selected_field_ids const &  fields_tag = selected_field_ids{} 
)
inline

Construct from filename.

Parameters
[in] filename: Path to the file you wish to open.
[in] fields_tag: A seqan3::fields tag. [optional]

In addition to the file name, you may specify a custom seqan3::fields type which may be easier than defining all the template parameters.

Compression

This constructor transparently applies a compression stream on top of the file stream in case the given file extension suggests the user wants this. See the section on compression and decompression for more information.
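One way to picture transparent compression: the last extension is checked for a compression suffix, and the format is then detected from the remaining name. A minimal sketch, assuming the suffixes `.gz`, `.bgzf` and `.bz2`; the helper `split_compression` is hypothetical and creates no actual compression stream:

```cpp
#include <filesystem>
#include <string>
#include <utility>

// Hypothetical helper: peel off a compression suffix before format detection.
// Returns {compression name or "none", remaining path}. The suffix list is an assumption.
inline std::pair<std::string, std::filesystem::path> split_compression(std::filesystem::path file)
{
    std::string const ext = file.extension().string();
    if (ext == ".gz" || ext == ".bgzf" || ext == ".bz2")
        return {ext.substr(1), file.replace_extension()}; // "reads.fastq.gz" -> "reads.fastq"
    return {"none", file};
}
```

For `reads.fastq.gz`, the remaining path `reads.fastq` would then go through the usual extension-based format detection.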

◆ sequence_file_output() [2/2]

template<detail::fields_specialisation selected_field_ids_ = fields<field::seq, field::id, field::qual>, detail::type_list_of_sequence_file_output_formats valid_formats_ = type_list<format_embl, format_fasta, format_fastq, format_genbank, format_sam>>
template<output_stream stream_t, sequence_file_output_format file_format>
seqan3::sequence_file_output< selected_field_ids_, valid_formats_ >::sequence_file_output ( stream_t &  stream,
file_format const &  format_tag,
selected_field_ids const &  fields_tag = selected_field_ids{} 
)
inline

Construct from an existing stream and with specified format.

Template Parameters
file_format: The format of the file in the stream; must satisfy seqan3::sequence_file_output_format.
Parameters
[in,out] stream: The stream to write to; must be derived from std::basic_ostream<stream_char_t>.
[in] format_tag: The file format tag.
[in] fields_tag: A seqan3::fields tag. [optional]

Compression

This constructor does not apply compression transparently (because there is no way to know if the user wants this). However, you can just pass e.g. seqan3::contrib::gz_ostream to this constructor if you explicitly want compression. See the section on compression and decompression for more information.

Member Function Documentation

◆ begin()

template<detail::fields_specialisation selected_field_ids_ = fields<field::seq, field::id, field::qual>, detail::type_list_of_sequence_file_output_formats valid_formats_ = type_list<format_embl, format_fasta, format_fastq, format_genbank, format_sam>>
iterator seqan3::sequence_file_output< selected_field_ids_, valid_formats_ >::begin ( )
inline noexcept

Returns an iterator to current position in the file.

Returns
An iterator pointing to the current position in the file.

You can write to the file by assigning to the iterator, but using push_back() is usually more intuitive.

Complexity

Constant.

Exceptions

No-throw guarantee.

Example

#include <iostream>
#include <string>
#include <tuple>

#include <seqan3/alphabet/nucleotide/dna5.hpp>
#include <seqan3/io/sequence_file/output.hpp>

int main()
{
    using namespace seqan3::literals;

    seqan3::sequence_file_output fout{std::cout, seqan3::format_fasta{}};

    auto it = fout.begin();

    for (int i = 0; i < 5; ++i) // some criteria
    {
        std::string id{"test_id"};
        seqan3::dna5_vector seq{"ACGT"_dna5};

        // ...

        // assign to iterator
        *it = std::tie(seq, id);
        // is the same as:
        fout.push_back(std::tie(seq, id));
    }
}

◆ emplace_back()

template<detail::fields_specialisation selected_field_ids_ = fields<field::seq, field::id, field::qual>, detail::type_list_of_sequence_file_output_formats valid_formats_ = type_list<format_embl, format_fasta, format_fastq, format_genbank, format_sam>>
template<typename arg_t , typename ... arg_types>
void seqan3::sequence_file_output< selected_field_ids_, valid_formats_ >::emplace_back ( arg_t &&  arg,
arg_types &&...  args 
)
inline

Write a record to the file by passing individual fields.

Template Parameters
arg_t: Type of the first field.
arg_types: Types of further fields.
Parameters
[in] arg: The first field to write.
[in] args: Further fields.

The fields are assumed to correspond to the field IDs given in selected_field_ids; passing fewer fields is accepted if the format does not require all of them.

Complexity

Linear in the size of the written fields.

Exceptions

Basic exception safety.

Example

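#include <iostream>
#include <string>

#include <seqan3/alphabet/nucleotide/dna5.hpp>
#include <seqan3/io/sequence_file/output.hpp>

int main()
{
    using namespace seqan3::literals;

    seqan3::sequence_file_output fout{std::cout, seqan3::format_fasta{}};

    for (int i = 0; i < 5; ++i) // some criteria
    {
        std::string id{"test_id"};
        seqan3::dna5_vector seq{"ACGT"_dna5};

        // ...

        fout.emplace_back(seq, id);
    }
}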

◆ end()

template<detail::fields_specialisation selected_field_ids_ = fields<field::seq, field::id, field::qual>, detail::type_list_of_sequence_file_output_formats valid_formats_ = type_list<format_embl, format_fasta, format_fastq, format_genbank, format_sam>>
sentinel seqan3::sequence_file_output< selected_field_ids_, valid_formats_ >::end ( )
inline noexcept

Returns a sentinel for comparison with iterator.

Returns
An end that is never reached.

This element acts as a placeholder; attempting to dereference it results in undefined behaviour. It always compares false against an iterator.

Complexity

Constant.

Exceptions

No-throw guarantee.

◆ operator=()

template<detail::fields_specialisation selected_field_ids_ = fields<field::seq, field::id, field::qual>, detail::type_list_of_sequence_file_output_formats valid_formats_ = type_list<format_embl, format_fasta, format_fastq, format_genbank, format_sam>>
template<std::ranges::input_range rng_t>
sequence_file_output& seqan3::sequence_file_output< selected_field_ids_, valid_formats_ >::operator= ( rng_t &&  range)
inline

Write a range of records (or tuples) to the file.

Template Parameters
rng_t: Type of the range; must satisfy std::ranges::input_range and have a reference type that satisfies seqan3::tuple_like.
Parameters
[in] range: The range to write.

This function simply iterates over the argument and calls push_back() on each element.

Complexity

Linear in the number of records.

Exceptions

Basic exception safety.

Example

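#include <iostream>
#include <string>
#include <tuple>
#include <vector>

#include <seqan3/alphabet/nucleotide/dna5.hpp>
#include <seqan3/io/sequence_file/output.hpp>

int main()
{
    using namespace seqan3::literals;

    seqan3::sequence_file_output fout{std::cout, seqan3::format_fasta{}};

    std::vector<std::tuple<seqan3::dna5_vector, std::string>> range
    {
        { "ACGT"_dna5, "First" },
        { "NATA"_dna5, "2nd" },
        { "GATA"_dna5, "Third" }
    }; // a range of "records"

    fout = range;
    // the same as:
    range | fout;
}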

◆ push_back() [1/2]

template<detail::fields_specialisation selected_field_ids_ = fields<field::seq, field::id, field::qual>, detail::type_list_of_sequence_file_output_formats valid_formats_ = type_list<format_embl, format_fasta, format_fastq, format_genbank, format_sam>>
template<typename record_t >
void seqan3::sequence_file_output< selected_field_ids_, valid_formats_ >::push_back ( record_t &&  r)
inline

Write a seqan3::record to the file.

Template Parameters
record_t: Type of the record, a specialisation of seqan3::record.
Parameters
[in] r: The record to write.

Complexity

Linear in the size of the written fields.

Exceptions

Basic exception safety.

Example

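#include <iostream>
#include <string>
#include <seqan3/alphabet/nucleotide/dna5.hpp>
#include <seqan3/io/record.hpp>
#include <seqan3/io/sequence_file/output.hpp>

int main()
{
    using namespace seqan3::literals;
    seqan3::sequence_file_output fout{std::cout, seqan3::format_fasta{}};

    for (int i = 0; i < 5; ++i) // some criteria
    {
        // a record whose elements are labelled with their field IDs:
        using types = seqan3::type_list<seqan3::dna5_vector, std::string>;
        using fields = seqan3::fields<seqan3::field::seq, seqan3::field::id>;
        seqan3::record<types, fields> r{"ACGT"_dna5, "test_id"};
        // ...
        fout.push_back(r);
    }
}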

◆ push_back() [2/2]

template<detail::fields_specialisation selected_field_ids_ = fields<field::seq, field::id, field::qual>, detail::type_list_of_sequence_file_output_formats valid_formats_ = type_list<format_embl, format_fasta, format_fastq, format_genbank, format_sam>>
template<typename tuple_t >
void seqan3::sequence_file_output< selected_field_ids_, valid_formats_ >::push_back ( tuple_t &&  t)
inline

Write a record in form of a std::tuple to the file.

Template Parameters
tuple_t: Type of the record, a specialisation of std::tuple.
Parameters
[in] t: The record to write.

The fields in the tuple are assumed to correspond to the field IDs given in selected_field_ids; passing fewer fields is accepted if the format does not require all of them.
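Because tuple elements are matched positionally, push_back(tuple) behaves like emplace_back() with the unpacked elements. A sketch of that equivalence using std::apply; `toy_fasta_tuple_writer` is hypothetical, and its optional third parameter stands in for a quality field that FASTA ignores:

```cpp
#include <sstream>
#include <string>
#include <string_view>
#include <tuple>
#include <utility>

// Hypothetical toy writer: push_back(tuple) unpacks the tuple into emplace_back().
struct toy_fasta_tuple_writer
{
    std::ostringstream out;

    // Positional order seq, id, qual; qual may be omitted and is ignored for FASTA.
    void emplace_back(std::string_view seq, std::string_view id, std::string_view /* qual */ = {})
    {
        out << '>' << id << '\n' << seq << '\n';
    }

    template <typename tuple_t>
    void push_back(tuple_t && t)
    {
        std::apply([this](auto &&... fields) { emplace_back(std::forward<decltype(fields)>(fields)...); },
                   std::forward<tuple_t>(t));
    }
};
```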

Complexity

Linear in the size of the written fields.

Exceptions

Basic exception safety.

Example

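#include <iostream>
#include <string>
#include <tuple>

#include <seqan3/alphabet/nucleotide/dna5.hpp>
#include <seqan3/io/sequence_file/output.hpp>

int main()
{
    using namespace seqan3::literals;

    seqan3::sequence_file_output fout{std::cout, seqan3::format_fasta{}};

    for (int i = 0; i < 5; ++i) // some criteria
    {
        std::string id{"test_id"};
        seqan3::dna5_vector seq{"ACGT"_dna5};

        // ...

        fout.push_back(std::tie(seq, id));
    }
}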

Friends And Related Function Documentation

◆ operator|

template<detail::fields_specialisation selected_field_ids_ = fields<field::seq, field::id, field::qual>, detail::type_list_of_sequence_file_output_formats valid_formats_ = type_list<format_embl, format_fasta, format_fastq, format_genbank, format_sam>>
template<std::ranges::input_range rng_t>
sequence_file_output& operator| ( rng_t &&  range,
sequence_file_output< selected_field_ids_, valid_formats_ > &  f 
)
friend

Write a range of records (or tuples) to the file.

Template Parameters
rng_t: Type of the range; must satisfy std::ranges::input_range and have a reference type that satisfies seqan3::tuple_like.
Parameters
[in] range: The range to write.
[in] f: The file being written to.

This operator enables sequence_file_output to be at the end of a piping operation. It just calls operator=() internally.

Complexity

Linear in the number of records.

Exceptions

Basic exception safety.

Example

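#include <iostream>
#include <string>
#include <tuple>
#include <vector>

#include <seqan3/alphabet/nucleotide/dna5.hpp>
#include <seqan3/io/sequence_file/output.hpp>

int main()
{
    using namespace seqan3::literals;

    seqan3::sequence_file_output fout{std::cout, seqan3::format_fasta{}};

    std::vector<std::tuple<seqan3::dna5_vector, std::string>> range
    {
        { "ACGT"_dna5, "First" },
        { "NATA"_dna5, "2nd" },
        { "GATA"_dna5, "Third" }
    }; // a range of "records"

    fout = range;
    // the same as:
    range | fout;
}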

This is especially useful in combination with file-based filters:

#include <iostream>
#include <ranges>
#include <sstream>

#include <seqan3/alphabet/quality/concept.hpp>
#include <seqan3/io/sequence_file/input.hpp>
#include <seqan3/io/sequence_file/output.hpp>

auto input = R"(@TEST1
ACGT
+
##!#
@Test2
AGGCTGA
+
##!#!!!
@Test3
GGAGTATAATATATATATATATAT
+
##!###!###!###!###!###!#)";

int main()
{
    // minimum_average_quality_filter and minimum_sequence_length_filter need to be implemented first
    auto minimum_sequence_length_filter = std::views::filter([] (auto rec)
    {
        return std::ranges::distance(rec.sequence()) >= 50;
    });

    auto minimum_average_quality_filter = std::views::filter([] (auto const & record)
    {
        double qual_sum{0}; // summation of the qualities
        for (auto chr : record.base_qualities())
            qual_sum += seqan3::to_phred(chr);

        // check if the average quality is at least 20
        return qual_sum / (std::ranges::distance(record.base_qualities())) >= 20;
    });

    seqan3::sequence_file_input input_file{std::istringstream{input}, seqan3::format_fastq{}};
    input_file | minimum_average_quality_filter
               | minimum_sequence_length_filter
               | seqan3::sequence_file_output{std::cout, seqan3::format_fasta{}};
}

The documentation for this class was generated from the following file: