The SAM format (tag).

#include <seqan3/io/alignment_file/format_sam.hpp>

Inheritance diagram for seqan3::format_sam:

Public Member Functions
Constructors, destructor and assignment
	format_sam ()=default
	Defaulted.

	format_sam (format_sam const &)=default
	Defaulted.

format_sam &	operator= (format_sam const &)=default
	Defaulted.

	format_sam (format_sam &&)=default
	Defaulted.

format_sam &	operator= (format_sam &&)=default
	Defaulted.

	~format_sam ()=default
	Defaulted.

Static Public Attributes
static std::vector< std::string >	file_extensions
	The valid file extensions for this format; note that you can modify this value.

Protected Member Functions
template<typename stream_type , typename seq_legal_alph_type , typename ref_seqs_type , typename ref_ids_type , typename seq_type , typename id_type , typename offset_type , typename ref_seq_type , typename ref_id_type , typename ref_offset_type , typename align_type , typename cigar_type , typename flag_type , typename mapq_type , typename qual_type , typename mate_type , typename tag_dict_type , typename e_value_type , typename bit_score_type >
void	read_alignment_record (stream_type &stream, alignment_file_input_options< seq_legal_alph_type > const &options, ref_seqs_type &ref_seqs, alignment_file_header< ref_ids_type > &header, seq_type &seq, qual_type &qual, id_type &id, offset_type &offset, ref_seq_type &ref_seq, ref_id_type &ref_id, ref_offset_type &ref_offset, align_type &align, cigar_type &cigar_vector, flag_type &flag, mapq_type &mapq, mate_type &mate, tag_dict_type &tag_dict, e_value_type &e_value, bit_score_type &bit_score)
	Read from the specified stream and back-insert into the given field buffers.

template<typename stream_type , typename seq_legal_alph_type , bool seq_qual_combined, typename seq_type , typename id_type , typename qual_type >
void	read_sequence_record (stream_type &stream, sequence_file_input_options< seq_legal_alph_type, seq_qual_combined > const &options, seq_type &sequence, id_type &id, qual_type &qualities)
	Read from the specified stream and back-insert into the given field buffers.

template<typename stream_type , typename header_type , typename seq_type , typename id_type , typename ref_seq_type , typename ref_id_type , typename align_type , typename qual_type , typename mate_type , typename tag_dict_type , typename e_value_type , typename bit_score_type >
void	write_alignment_record (stream_type &stream, alignment_file_output_options const &options, header_type &&header, seq_type &&seq, qual_type &&qual, id_type &&id, int32_t const offset, ref_seq_type &&ref_seq, ref_id_type &&ref_id, std::optional< int32_t > ref_offset, align_type &&align, std::vector< cigar > const &cigar_vector, sam_flag const flag, uint8_t const mapq, mate_type &&mate, tag_dict_type &&tag_dict, e_value_type &&e_value, bit_score_type &&bit_score)
	Write the given fields to the specified stream.

template<typename stream_type , typename seq_type , typename id_type , typename qual_type >
void	write_sequence_record (stream_type &stream, sequence_file_output_options const &options, seq_type &&sequence, id_type &&id, qual_type &&qualities)
	Write the given fields to the specified stream.

Detailed Description

The SAM format (tag).

Introduction

SAM is often used for storing alignments of several read sequences against one or more reference sequences. See the Wikipedia article for an introduction to the format, or consult the official SAM format specifications. SeqAn implements version 1.6 of the SAM specification.

Take a look at our tutorial Alignment Input and Output in SeqAn for a walkthrough of how to read alignment files.

fields_specialisation

The SAM format provides the following fields: seqan3::field::alignment, seqan3::field::seq, seqan3::field::qual, seqan3::field::id, seqan3::field::ref_seq, seqan3::field::ref_id, seqan3::field::ref_offset, seqan3::field::offset, seqan3::field::flag, seqan3::field::mapq and seqan3::field::mate. In addition, there is seqan3::field::header_ptr, which is usually only used internally to provide the range-based functionality of the file.

No field is required when writing; a field that is not provided defaults to '0' if it is numeric and to '*' otherwise.
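
The defaulting rule can be sketched with the standard library alone. The names sam_record_sketch, to_sam_line and check_defaults below are hypothetical and not part of SeqAn; the sketch only assumes what is stated above, namely that a missing numeric column is written as 0 and any other missing column as *.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical record type (not a SeqAn type): every column may be absent.
struct sam_record_sketch
{
    std::optional<std::string> id;         // QNAME
    std::optional<std::uint16_t> flag;     // FLAG
    std::optional<std::string> ref_id;     // RNAME
    std::optional<std::int32_t> ref_pos;   // POS
    std::optional<std::uint8_t> mapq;      // MAPQ
    std::optional<std::string> cigar;      // CIGAR
    std::optional<std::string> mate_ref;   // RNEXT
    std::optional<std::int32_t> mate_pos;  // PNEXT
    std::optional<std::int32_t> tlen;      // TLEN
    std::optional<std::string> seq;        // SEQ
    std::optional<std::string> qual;       // QUAL
};

// Joins the eleven mandatory columns with tabs and substitutes the defaults.
inline std::string to_sam_line(sam_record_sketch const & r)
{
    std::ostringstream out;
    out << r.id.value_or("*") << '\t'
        << r.flag.value_or(0) << '\t'
        << r.ref_id.value_or("*") << '\t'
        << r.ref_pos.value_or(0) << '\t'
        << static_cast<int>(r.mapq.value_or(0)) << '\t' // cast: uint8_t would print as a character
        << r.cigar.value_or("*") << '\t'
        << r.mate_ref.value_or("*") << '\t'
        << r.mate_pos.value_or(0) << '\t'
        << r.tlen.value_or(0) << '\t'
        << r.seq.value_or("*") << '\t'
        << r.qual.value_or("*");
    return out.str();
}

// Usage check: a record with no field set consists of defaults only.
inline void check_defaults()
{
    assert(to_sam_line(sam_record_sketch{}) == "*\t0\t*\t0\t0\t*\t*\t0\t0\t*\t*");
}
```

Using std::optional keeps "not provided" distinct from a legitimately provided 0, which is one possible design; the actual format may detect empty fields differently.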

SAM format columns -> fields

Since many users will be accustomed to the columns of the SAM format, here is a mapping of the common SAM format columns to the SeqAn record fields:

#	SAM Column ID	FIELD name
1	QNAME	seqan3::field::id
2	FLAG	seqan3::field::flag
3	RNAME	seqan3::field::ref_id
4	POS	seqan3::field::ref_offset
5	MAPQ	seqan3::field::mapq
6	CIGAR	implicitly stored in seqan3::field::alignment
7	RNEXT	seqan3::field::mate (tuple pos 0)
8	PNEXT	seqan3::field::mate (tuple pos 1)
9	TLEN	seqan3::field::mate (tuple pos 2)
10	SEQ	seqan3::field::seq
11	QUAL	seqan3::field::qual
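
To make the mapping concrete, the following sketch splits one SAM line into a plain struct whose members are named after the fields in the table. It uses only the standard library; sam_columns_sketch and split_sam_line are hypothetical names, and the numeric values are stored exactly as written in the file, which is an assumption of this sketch and not a statement about SeqAn's internal representation.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical plain struct mirroring the table above (not a SeqAn type).
struct sam_columns_sketch
{
    std::string id;              // 1  QNAME
    std::uint16_t flag{};        // 2  FLAG
    std::string ref_id;          // 3  RNAME
    std::int32_t ref_offset{};   // 4  POS, kept as written in the file
    std::uint8_t mapq{};         // 5  MAPQ
    std::string cigar;           // 6  CIGAR, kept as text in this sketch
    std::tuple<std::string, std::int32_t, std::int32_t> mate; // 7-9 RNEXT, PNEXT, TLEN
    std::string seq;             // 10 SEQ
    std::string qual;            // 11 QUAL
};

// Splits a tab-separated SAM record and assigns the eleven mandatory columns.
inline sam_columns_sketch split_sam_line(std::string const & line)
{
    std::vector<std::string> cols;
    std::istringstream in{line};
    for (std::string col; std::getline(in, col, '\t');)
        cols.push_back(col);

    if (cols.size() < 11)
        throw std::runtime_error{"a SAM record needs at least 11 tab-separated columns"};

    sam_columns_sketch r;
    r.id         = cols[0];
    r.flag       = static_cast<std::uint16_t>(std::stoul(cols[1]));
    r.ref_id     = cols[2];
    r.ref_offset = std::stoi(cols[3]);
    r.mapq       = static_cast<std::uint8_t>(std::stoul(cols[4]));
    r.cigar      = cols[5];
    r.mate       = std::make_tuple(cols[6], std::stoi(cols[7]), std::stoi(cols[8]));
    r.seq        = cols[9];
    r.qual       = cols[10];
    return r;
}
```

With this layout, std::get<0>(r.mate), std::get<1>(r.mate) and std::get<2>(r.mate) correspond to tuple positions 0, 1 and 2 in the table.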

The offset of the read sequence (query), seqan3::field::offset, is needed to store the soft-clipping information at the read start. Soft clipping at the read end is deduced automatically from the read sequence length, the offset, and the alignment length.

Note: SeqAn currently does not support hard clipping. When reading SAM, hard-clipping information is discarded, but the resulting alignment/sequence combination is still valid.
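
The clipping arithmetic can be illustrated on a raw CIGAR string. This is a sketch under stated assumptions and not SeqAn's implementation: clip_info_sketch, clipping_from_cigar and deduce_end_clip are hypothetical names, the operations M, I, = and X are counted as consuming read bases (as in the SAM specification), and the end clipping is assumed to be whatever remains of the read after the offset and the aligned bases.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type; SeqAn's own cigar type is not used here.
struct clip_info_sketch
{
    std::size_t offset{};        // soft-clipped bases at the read start
    std::size_t end_clip{};      // soft-clipped bases at the read end
    std::size_t aligned_query{}; // read bases covered by the alignment
};

// Parses a CIGAR string such as "2H3S10M1I4M2S" and derives the clipping.
// Hard clipping (H) is dropped because those bases are absent from SEQ.
inline clip_info_sketch clipping_from_cigar(std::string const & cigar)
{
    std::vector<std::pair<std::size_t, char>> ops;
    std::size_t count = 0;
    for (char c : cigar)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            count = count * 10 + static_cast<std::size_t>(c - '0');
        }
        else
        {
            if (c != 'H')
                ops.emplace_back(count, c);
            count = 0;
        }
    }

    clip_info_sketch info;
    if (!ops.empty() && ops.front().second == 'S')
        info.offset = ops.front().first;
    if (ops.size() > 1 && ops.back().second == 'S')
        info.end_clip = ops.back().first;
    for (auto const & [n, op] : ops)
        if (op == 'M' || op == 'I' || op == '=' || op == 'X')
            info.aligned_query += n;
    return info;
}

// Assumed relation: the part of the read that is neither offset nor aligned
// is the soft clipping at the read end.
inline std::size_t deduce_end_clip(std::size_t seq_length, std::size_t offset, std::size_t aligned_query)
{
    assert(seq_length >= offset + aligned_query);
    return seq_length - offset - aligned_query;
}
```

For a 20-base read with CIGAR 2H3S10M1I4M2S, the sketch yields an offset of 3 and 15 aligned read bases, so deduce_end_clip(20, 3, 15) agrees with the trailing 2S.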

Format Check

The format checks are implemented according to the official SAM format specifications in order to ensure correct SAM file output.

If a non-recoverable format violation is encountered on reading, or you specify invalid values/combinations when writing, seqan3::format_error is thrown.
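
As an illustration of such a check, the sketch below validates one rule of the SAM specification: QUAL must either be '*' or have the same length as SEQ. The exception type format_error_sketch is a hypothetical stand-in defined here so that the example compiles without SeqAn; it is not seqan3::format_error, and check_seq_and_qual is likewise an illustrative name.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the exception type named above.
struct format_error_sketch : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Throws if the SEQ/QUAL combination violates the SAM specification.
inline void check_seq_and_qual(std::string const & seq, std::string const & qual)
{
    if (qual == "*")
        return; // qualities are not stored, so there is nothing to compare

    if (seq == "*" || seq.size() != qual.size())
        throw format_error_sketch{"QUAL must be '*' or match the length of SEQ"};
}
```

Callers would wrap reading or writing in a try block and catch the exception type to report the offending record.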

Header implementation

The SAM header (if present) is read/written once at the beginning, before the first record is read/written.
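
A minimal sketch of the reading side, assuming only that SAM header lines start with '@' and precede all records: the hypothetical helper read_header_lines consumes the header once and leaves the stream positioned at the first record. It does not interpret the header tags and is not SeqAn's implementation.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Consumes all leading lines that start with '@' and stops at the first record.
// Returns an empty vector if the stream has no header.
inline std::vector<std::string> read_header_lines(std::istream & in)
{
    std::vector<std::string> header;
    while (in.peek() == '@')
    {
        std::string line;
        std::getline(in, line);
        header.push_back(line);
    }
    return header;
}
```

Because the function only peeks before consuming, a file without a header is left untouched and the first call to std::getline afterwards returns the first record.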

Constructor & Destructor Documentation

◆ ~format_sam()

seqan3::format_sam::~format_sam ( )

default

Defaulted.

Member Function Documentation

◆ read_alignment_record()

template<typename stream_type , typename seq_legal_alph_type , typename ref_seqs_type , typename ref_ids_type , typename seq_type , typename id_type , typename offset_type , typename ref_seq_type , typename ref_id_type , typename ref_offset_type , typename align_type , typename cigar_type , typename flag_type , typename mapq_type , typename qual_type , typename mate_type , typename tag_dict_type , typename e_value_type , typename bit_score_type >

void seqan3::format_sam::read_alignment_record	(	stream_type &	stream,
		alignment_file_input_options< seq_legal_alph_type > const &	options,
		ref_seqs_type &	ref_seqs,
		alignment_file_header< ref_ids_type > &	header,
		seq_type &	seq,
		qual_type &	qual,
		id_type &	id,
		offset_type &	offset,
		ref_seq_type &	ref_seq,
		ref_id_type &	ref_id,
		ref_offset_type &	ref_offset,
		align_type &	align,
		cigar_type &	cigar_vector,
		flag_type &	flag,
		mapq_type &	mapq,
		mate_type &	mate,
		tag_dict_type &	tag_dict,
		e_value_type &	e_value,
		bit_score_type &	bit_score
	)

inline protected

Read from the specified stream and back-insert into the given field buffers.

Template Parameters

stream_type	The input stream type; must be derived from std::istream.
ref_seqs_type	e.g. std::deque<ref_sequence_type> or decltype(std::ignore).
seq_type	Type of the seqan3::field::seq input (see seqan3::alignment_file_input_traits).
qual_type	Type of the seqan3::field::qual input (see seqan3::alignment_file_input_traits).
id_type	Type of the seqan3::field::id input (see seqan3::alignment_file_input_traits).
offset_type	Type of the seqan3::field::offset input (see seqan3::alignment_file_input_traits).
ref_seq_type	Type of the seqan3::field::ref_seq input (see seqan3::alignment_file_input_traits).
ref_id_type	Type of the seqan3::field::ref_id input (see seqan3::alignment_file_input_traits).
ref_offset_type	Type of the seqan3::field::ref_offset input (see seqan3::alignment_file_input_traits).
align_type	Type of the seqan3::field::alignment input (see seqan3::alignment_file_input_traits).
cigar_type	Type of the seqan3::field::cigar input (a std::vector<cigar> or std::ignore).
flag_type	Type of the seqan3::field::flag input (see seqan3::alignment_file_input_traits).
mapq_type	Type of the seqan3::field::mapq input (see seqan3::alignment_file_input_traits).
mate_type	std::tuple<ref_id_type, ref_offset_type, int32_t> or decltype(std::ignore).
tag_dict_type	seqan3::sam_tag_dictionary or decltype(std::ignore).
e_value_type	Type of the seqan3::field::evalue input (see seqan3::alignment_file_input_traits).
bit_score_type	Type of the seqan3::field::bit_score input (see seqan3::alignment_file_input_traits).

Parameters

[in,out]	stream	The input stream to read from.
[in]	options	File specific options passed to the format.
[out]	ref_seqs	The reference sequences to the corresponding alignments.
[out]	header	A reference to the seqan3::alignment_file_header object.
[out]	seq	The buffer for seqan3::field::seq input.
[out]	qual	The buffer for seqan3::field::qual input.
[out]	id	The buffer for seqan3::field::id input.
[out]	offset	The buffer for seqan3::field::offset input.
[out]	ref_seq	The buffer for seqan3::field::ref_seq input.
[out]	ref_id	The buffer for seqan3::field::ref_id input.
[out]	ref_offset	The buffer for seqan3::field::ref_offset input.
[out]	align	The buffer for seqan3::field::alignment input.
[out]	cigar_vector	The buffer for seqan3::field::cigar input.
[out]	flag	The buffer for seqan3::field::flag input.
[out]	mapq	The buffer for seqan3::field::mapq input.
[out]	mate	The buffer for seqan3::field::mate input.
[out]	tag_dict	The buffer for seqan3::field::tags input.
[out]	e_value	The buffer for seqan3::field::evalue input.
[out]	bit_score	The buffer for seqan3::field::bit_score input.

Additional requirements

The function must also accept std::ignore as parameter for any of the fields, except stream, options and header. [This is enforced by the concept checker!]
In this case the data read for that field shall be discarded by the format.
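
One way to meet this requirement is to branch at compile time on whether the argument has the type of std::ignore. The sketch below uses only the standard library (C++17); is_ignore_v and store_or_discard are hypothetical names chosen for illustration and are not taken from SeqAn.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <tuple>
#include <type_traits>

// True if T is the type of std::ignore once references and cv-qualifiers are removed.
template <typename T>
constexpr bool is_ignore_v =
    std::is_same<std::decay_t<T>, std::decay_t<decltype(std::ignore)>>::value;

// Hypothetical stand-in for handling one field: back-inserts the parsed data
// into the buffer, or drops it if the caller passed std::ignore.
template <typename buffer_type>
void store_or_discard([[maybe_unused]] buffer_type & buffer,
                      [[maybe_unused]] std::string const & parsed)
{
    if constexpr (!is_ignore_v<buffer_type>)
        std::copy(parsed.begin(), parsed.end(), std::back_inserter(buffer));
    // otherwise nothing is stored and the parsed data is discarded
}
```

Calling store_or_discard(id_buffer, "read1") appends to id_buffer, while store_or_discard(std::ignore, "read1") compiles to a no-op, because the discarded branch is never instantiated.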

◆ read_sequence_record()

template<typename stream_type , typename seq_legal_alph_type , bool seq_qual_combined, typename seq_type , typename id_type , typename qual_type >

void seqan3::format_sam::read_sequence_record	(	stream_type &	stream,
		sequence_file_input_options< seq_legal_alph_type, seq_qual_combined > const &	options,
		seq_type &	sequence,
		id_type &	id,
		qual_type &	qualities
	)

inline protected

Read from the specified stream and back-insert into the given field buffers.

Template Parameters

stream_type	Input stream, must satisfy seqan3::input_stream_over with `char`.
seq_type	Type of the seqan3::field::seq input; must satisfy std::ranges::output_range over a seqan3::alphabet.
id_type	Type of the seqan3::field::id input; must satisfy std::ranges::output_range over a seqan3::alphabet.
qual_type	Type of the seqan3::field::qual input; must satisfy std::ranges::output_range over a seqan3::writable_quality_alphabet.

Parameters

[in,out]	stream	The input stream to read from.
[in]	options	File specific options passed to the format.
[out]	sequence	The buffer for seqan3::field::seq input, i.e. the "sequence".
[out]	id	The buffer for seqan3::field::id input, e.g. the header line in FastA.
[out]	qualities	The buffer for seqan3::field::qual input.

Additional requirements

The function must also accept std::ignore as parameter for any of the fields. [This is enforced by the concept checker!]
In this case the data read for that field shall be discarded by the format.
Instead of passing the fields seqan3::field::seq and seqan3::field::qual, you may also pass seqan3::field::seq_qual to both parameters. If you do, the std::ranges::range_value_t of the argument must be a specialisation of seqan3::qualified and the second template parameter to seqan3::sequence_file_input_options must be set to true.

◆ write_alignment_record()

template<typename stream_type , typename header_type , typename seq_type , typename id_type , typename ref_seq_type , typename ref_id_type , typename align_type , typename qual_type , typename mate_type , typename tag_dict_type , typename e_value_type , typename bit_score_type >

void seqan3::format_sam::write_alignment_record	(	stream_type &	stream,
		alignment_file_output_options const &	options,
		header_type &&	header,
		seq_type &&	seq,
		qual_type &&	qual,
		id_type &&	id,
		int32_t const	offset,
		ref_seq_type &&	ref_seq,
		ref_id_type &&	ref_id,
		std::optional< int32_t >	ref_offset,
		align_type &&	align,
		std::vector< cigar > const &	cigar_vector,
		sam_flag const	flag,
		uint8_t const	mapq,
		mate_type &&	mate,
		tag_dict_type &&	tag_dict,
		e_value_type &&	e_value,
		bit_score_type &&	bit_score
	)

inline protected

Write the given fields to the specified stream.

Template Parameters

stream_type	Output stream, must model seqan3::output_stream_over with `char`.
seq_type	Type of the seqan3
id_type	Type of the seqan3
offset_type	Type of the seqan3
ref_seq_type	Type of the seqan3
ref_id_type	Type of the seqan3
ref_offset_type	Type of the seqan3
align_type	Type of the seqan3
flag_type	Type of the seqan3
mapq_type	Type of the seqan3
qual_type	Type of the seqan3
mate_type	Type of the seqan3
tag_dict_type	Type of the seqan3
e_value_type	Type of the seqan3
bit_score_type	Type of the seqan3

Parameters

[in,out]	stream	The output stream to write into.
[in]	options	File specific options passed to the format.
[in]	header	A pointer to the header object of the file.
[in]	seq	The data for seqan3::field::seq, i.e. the query sequence.
[in]	qual	The data for seqan3::field::qual, e.g. the query quality sequence.
[in]	id	The data for seqan3::field::id, e.g. the read id.
[in]	offset	The data for seqan3::field::offset, i.e. the start position of the alignment in `seq`.
[in]	ref_seq	The data for seqan3::field::ref_seq, i.e. the reference sequence.
[in]	ref_id	The data for seqan3::field::ref_id, e.g. the reference id.
[in]	ref_offset	The data for seqan3::field::ref_offset, i.e. the start position of the alignment in `ref_seq`.
[in]	align	The data for seqan3::field::alignment, e.g. the alignment between query and ref.
[in]	cigar_vector	The data for seqan3::field::cigar, e.g. representing the alignment between query and ref.
[in]	flag	The data for seqan3::field::flag, e.g. the SAM mapping flag value.
[in]	mapq	The data for seqan3::field::mapq, e.g. the mapping quality value.
[in]	mate	The data for seqan3::field::mate, e.g. the mate information of paired reads.
[in]	tag_dict	The data for seqan3::field::tags, e.g. the optional SAM field tag dictionary.
[in]	e_value	The data for seqan3::field::evalue, e.g. the e-value of the alignment (BLAST).
[in]	bit_score	The data for seqan3::field::, e.g. the bit score of the alignment (BLAST).

◆ write_sequence_record()

template<typename stream_type , typename seq_type , typename id_type , typename qual_type >

void seqan3::format_sam::write_sequence_record	(	stream_type &	stream,
		sequence_file_output_options const &	options,
		seq_type &&	sequence,
		id_type &&	id,
		qual_type &&	qualities
	)

inline protected

Write the given fields to the specified stream.

Template Parameters

stream_type	Output stream, must satisfy seqan3::output_stream_over with `char`.
seq_type	Type of the seqan3::field::seq output; must satisfy std::ranges::output_range over a seqan3::alphabet.
id_type	Type of the seqan3::field::id output; must satisfy std::ranges::output_range over a seqan3::alphabet.
qual_type	Type of the seqan3::field::qual output; must satisfy std::ranges::output_range over a seqan3::quality_alphabet.

Parameters

[in,out]	stream	The output stream to write into.
[in]	options	File specific options passed to the format.
[in]	sequence	The data for seqan3::field::seq, i.e. the "sequence".
[in]	id	The data for seqan3::field::id, e.g. the header line in FastA.
[in]	qualities	The data for seqan3::field::qual.

Additional requirements

The format must also accept std::ignore as parameter for any of the fields, however it shall throw an exception if one of the fields required for writing the format is marked as such. [this shall be checked inside the function]
The format does not handle seqan3::field::seq_qual, instead seqan3::sequence_file_output splits it into two views and passes it to the format as if they were separate.

Member Data Documentation

◆ file_extensions

std::vector<std::string> seqan3::format_sam::file_extensions

inline static

Initial value:

{
        { "sam" },
    }

The valid file extensions for this format; note that you can modify this value.
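
Because the member is a static, non-const vector, an extension can be added at run time. The sketch below shows the idea with a hypothetical stand-in type, format_tag_sketch, so that it compiles without SeqAn; has_valid_extension is also an illustrative helper and not a SeqAn function.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a format tag with a modifiable extension list.
struct format_tag_sketch
{
    static inline std::vector<std::string> file_extensions{"sam"};
};

// Returns true if the text after the last '.' is one of the registered extensions.
inline bool has_valid_extension(std::string const & filename)
{
    auto const dot = filename.rfind('.');
    if (dot == std::string::npos)
        return false;

    std::string const ext = filename.substr(dot + 1);
    auto const & exts = format_tag_sketch::file_extensions;
    return std::find(exts.begin(), exts.end(), ext) != exts.end();
}
```

After format_tag_sketch::file_extensions.push_back("sm"), a file named reads.sm is accepted in addition to reads.sam; with the real class the same push_back would be applied to seqan3::format_sam::file_extensions.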

The documentation for this class was generated from the following file:

seqan3/io/alignment_file/format_sam.hpp
