SeqAn3  3.0.0
The Modern C++ library for sequence analysis.
seqan3::format_sam Struct Reference

The SAM format (tag).

#include <seqan3/io/alignment_file/format_sam.hpp>

Static Public Attributes

static std::vector< std::string > file_extensions
 The valid file extensions for this format; note that you can modify this value.
 
static constexpr char format_version [4] = "1.6"
 The format version string.
 

Detailed Description

The SAM format (tag).

Introduction

SAM is often used for storing alignments of several read sequences against one or more reference sequences. See the Wikipedia article for an introduction to the format, or consult the official SAM format specifications. SeqAn implements version 1.6 of the SAM specification.

Take a look at our tutorial Alignment Input and Output in SeqAn3 for a walkthrough of how to read alignment files.

Fields

The SAM format provides the following fields: seqan3::field::ALIGNMENT, seqan3::field::SEQ, seqan3::field::QUAL, seqan3::field::ID, seqan3::field::REF_SEQ, seqan3::field::REF_ID, seqan3::field::REF_OFFSET, seqan3::field::OFFSET, seqan3::field::FLAG, seqan3::field::MAPQ and seqan3::field::MATE. In addition, there is seqan3::field::HEADER_PTR, which is usually only used internally to provide the range-based functionality of the file.

No field is required when writing; any field that is not provided defaults to '0' (numeric fields) or '*' (all other fields).
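The defaulting rule can be illustrated with a minimal sketch that uses only the C++ standard library. The names sam_record_sketch, text_or_star, number_or_zero and to_sam_line are hypothetical and are not part of the SeqAn3 API; the sketch only mirrors the rule stated above, not the library's actual writer.

```cpp
#include <optional>
#include <string>

// Hypothetical record type for illustration only (not a SeqAn3 type).
// Every column is optional, mirroring "no field is required when writing".
struct sam_record_sketch
{
    std::optional<std::string> qname, rname, cigar, rnext, seq, qual;
    std::optional<long> flag, pos, mapq, pnext, tlen;
};

// Text columns that were not provided are written as '*'.
inline std::string text_or_star(std::optional<std::string> const & value)
{
    return value.value_or("*");
}

// Numeric columns that were not provided are written as '0'.
inline std::string number_or_zero(std::optional<long> const & value)
{
    return std::to_string(value.value_or(0));
}

// Assembles the 11 mandatory SAM columns in order, separated by tabs.
inline std::string to_sam_line(sam_record_sketch const & record)
{
    return text_or_star(record.qname)   + '\t' +
           number_or_zero(record.flag)  + '\t' +
           text_or_star(record.rname)   + '\t' +
           number_or_zero(record.pos)   + '\t' +
           number_or_zero(record.mapq)  + '\t' +
           text_or_star(record.cigar)   + '\t' +
           text_or_star(record.rnext)   + '\t' +
           number_or_zero(record.pnext) + '\t' +
           number_or_zero(record.tlen)  + '\t' +
           text_or_star(record.seq)     + '\t' +
           text_or_star(record.qual);
}
```

In this sketch, a record in which only the read name and sequence are set still yields a complete line with 11 columns, because every missing column falls back to its default.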

SAM format columns -> fields

Since many users will be accustomed to the columns of the SAM format, here is a mapping of the common SAM format columns to the SeqAn3 record fields:

#   SAM column ID  Field name
1   QNAME          seqan3::field::ID
2   FLAG           seqan3::field::FLAG
3   RNAME          seqan3::field::REF_ID
4   POS            seqan3::field::REF_OFFSET
5   MAPQ           seqan3::field::MAPQ
6   CIGAR          implicitly stored in seqan3::field::ALIGNMENT
7   RNEXT          seqan3::field::MATE (tuple pos 0)
8   PNEXT          seqan3::field::MATE (tuple pos 1)
9   TLEN           seqan3::field::MATE (tuple pos 2)
10  SEQ            seqan3::field::SEQ
11  QUAL           seqan3::field::QUAL
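As a rough illustration of the mapping above, the following sketch splits one SAM alignment line into its tab-separated columns and looks up the field name listed for a given column. It uses only the C++ standard library; split_sam_line and field_for_column are hypothetical helpers, and the field names are plain strings copied from the table rather than the seqan3::field enumeration itself.

```cpp
#include <array>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Field names copied from the mapping table, indexed by (SAM column number - 1).
// These are plain strings; the sketch does not use the SeqAn3 API.
inline constexpr std::array<std::string_view, 11> sam_column_to_field{
    "seqan3::field::ID",         // 1  QNAME
    "seqan3::field::FLAG",       // 2  FLAG
    "seqan3::field::REF_ID",     // 3  RNAME
    "seqan3::field::REF_OFFSET", // 4  POS
    "seqan3::field::MAPQ",       // 5  MAPQ
    "seqan3::field::ALIGNMENT",  // 6  CIGAR (stored implicitly)
    "seqan3::field::MATE",       // 7  RNEXT (tuple pos 0)
    "seqan3::field::MATE",       // 8  PNEXT (tuple pos 1)
    "seqan3::field::MATE",       // 9  TLEN  (tuple pos 2)
    "seqan3::field::SEQ",        // 10 SEQ
    "seqan3::field::QUAL"        // 11 QUAL
};

// Splits one tab-separated SAM alignment line into its columns.
inline std::vector<std::string> split_sam_line(std::string const & line)
{
    std::vector<std::string> columns;
    std::istringstream stream{line};
    std::string column;
    while (std::getline(stream, column, '\t'))
        columns.push_back(column);
    return columns;
}

// Returns the field name listed for a 1-based mandatory column number.
inline std::string_view field_for_column(std::size_t column_number)
{
    if (column_number < 1 || column_number > sam_column_to_field.size())
        throw std::out_of_range{"SAM has 11 mandatory columns"};
    return sam_column_to_field[column_number - 1];
}
```

For a line such as "read1, 0, chr1, 100, 60, 4M, *, 0, 0, ACGT, IIII" (tab-separated), split_sam_line returns 11 columns, and field_for_column(4) names the field that holds the value of the POS column.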

The (read sequence/query) OFFSET is required to store the soft clipping information at the start of the read. Soft clipping at the end of the read is deduced automatically from how much longer the read sequence is than the offset plus the aligned part of the read.

Note: SeqAn currently does not support hard clipping. When reading SAM, hard clipping is discarded, but the resulting alignment/sequence combination is still valid.
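The arithmetic behind the clipping deduction can be sketched as follows, assuming the alignment is given as a CIGAR string. This is an illustration using only the C++ standard library, not SeqAn3's implementation; parse_cigar, soft_clipping and deduce_soft_clipping are hypothetical names. The start clipping plays the role of OFFSET, and the end clipping is whatever remains of the read after the offset and the read bases consumed by the alignment (operations M, I, = and X).

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using cigar_op = std::pair<std::size_t, char>; // (length, operation)

// Parses a CIGAR string such as "2S5M3S" into (length, operation) pairs.
inline std::vector<cigar_op> parse_cigar(std::string const & cigar)
{
    std::vector<cigar_op> ops;
    std::size_t count = 0;
    bool has_count = false;
    for (char c : cigar)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            count = count * 10 + static_cast<std::size_t>(c - '0');
            has_count = true;
        }
        else
        {
            if (!has_count)
                throw std::invalid_argument{"CIGAR operation without a length"};
            ops.emplace_back(count, c);
            count = 0;
            has_count = false;
        }
    }
    if (has_count)
        throw std::invalid_argument{"CIGAR ends with a length but no operation"};
    return ops;
}

struct soft_clipping
{
    std::size_t start; // corresponds to the read OFFSET
    std::size_t end;   // deduced, never stored explicitly
};

// Deduces start and end soft clipping for a read of the given length.
inline soft_clipping deduce_soft_clipping(std::string const & cigar, std::size_t read_length)
{
    std::vector<cigar_op> const ops = parse_cigar(cigar);

    // A leading hard clip (H) is skipped; a following S gives the offset.
    std::size_t first = 0;
    if (first < ops.size() && ops[first].second == 'H')
        ++first;
    std::size_t const start = (first < ops.size() && ops[first].second == 'S') ? ops[first].first : 0;

    // Count the read bases that take part in the alignment.
    std::size_t aligned = 0;
    for (auto const & [count, op] : ops)
        if (op == 'M' || op == 'I' || op == '=' || op == 'X')
            aligned += count;

    if (start + aligned > read_length)
        throw std::invalid_argument{"CIGAR consumes more bases than the read has"};

    return {start, read_length - start - aligned};
}
```

For a read of length 10 with the CIGAR string 2S5M3S, the sketch reports an offset of 2 and an end clipping of 10 - 2 - 5 = 3, which matches the trailing 3S without reading it directly.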

Format Check

The format checks are implemented according to the official SAM format specifications in order to ensure correct SAM file output.

If a non-recoverable format violation is encountered on reading, or you specify invalid values/combinations when writing, seqan3::format_error is thrown.

Header implementation

The SAM header (if present) is read/written once at the beginning, before the first record is read/written.

Member Data Documentation

◆ file_extensions

std::vector<std::string> seqan3::format_sam::file_extensions
inline static
Initial value:
{
{ "sam" },
}

The valid file extensions for this format; note that you can modify this value.
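Because the member is a static, modifiable std::vector<std::string>, further extensions can be registered at run time. The sketch below uses a stand-in struct, format_sam_stand_in, that only mirrors the documented member and its initial value so that the example compiles without SeqAn3. The helper has_known_extension is hypothetical and does not reflect how the library actually detects formats.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Stand-in that mirrors the documented static member and its initial value.
// It is NOT seqan3::format_sam; it exists only to keep the sketch self-contained.
struct format_sam_stand_in
{
    static inline std::vector<std::string> file_extensions{"sam"};
};

// Hypothetical helper: checks whether the text after the last '.' in a
// file name is one of the registered extensions.
inline bool has_known_extension(std::string const & filename)
{
    auto const dot = filename.rfind('.');
    if (dot == std::string::npos)
        return false;

    std::string const extension = filename.substr(dot + 1);
    auto const & known = format_sam_stand_in::file_extensions;
    return std::find(known.begin(), known.end(), extension) != known.end();
}
```

With the stand-in, "reads.sam" is recognised from the start, while "reads.sa" is only recognised after format_sam_stand_in::file_extensions.push_back("sa"). The same push_back call on seqan3::format_sam::file_extensions should work the same way, given the member's documented type.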


The documentation for this struct was generated from the following file: