SeqAn3  3.0.2
The Modern C++ library for sequence analysis.
Alignment File

Provides files and formats for handling alignment data.

Classes

class  seqan3::alignment_file_header< ref_ids_type >
 Stores the header information of alignment files.

class  seqan3::alignment_file_input< traits_type_, selected_field_ids_, valid_formats_ >
 A class for reading alignment files, e.g. SAM, BAM, BLAST, ...

struct  seqan3::alignment_file_input_default_traits< ref_sequences_t, ref_ids_t >
 The default traits for seqan3::alignment_file_input.

interface  alignment_file_input_format
 The generic concept for alignment file input formats.

struct  seqan3::alignment_file_input_options< sequence_legal_alphabet >
 The options type defines various option members that influence the behavior of all or some formats.

interface  alignment_file_input_traits
 The requirements a traits_type for seqan3::alignment_file_input must meet.

class  seqan3::alignment_file_output< selected_field_ids_, valid_formats_, ref_ids_type >
 A class for writing alignment files, e.g. SAM, BAM, BLAST, ...

interface  alignment_file_output_format
 The generic concept for alignment file output formats.

struct  seqan3::alignment_file_output_options
 The options type defines various option members that influence the behavior of all or some formats.

class  seqan3::format_bam
 The BAM format.

class  seqan3::format_sam
 The SAM format (tag).

struct  std::tuple_element< elem_no, seqan3::alignment_file_input< traits_type, selected_field_ids, valid_formats > >
 Obtains the type of the specified element.

struct  std::tuple_size< seqan3::alignment_file_input< traits_type, selected_field_ids, valid_formats > >
 Provides access to the number of elements in a tuple as a compile-time constant expression.
 

Enumerations

enum  seqan3::sam_flag : uint16_t {
  seqan3::sam_flag::none = 0, seqan3::sam_flag::paired = 0x1, seqan3::sam_flag::proper_pair = 0x2, seqan3::sam_flag::unmapped = 0x4,
  seqan3::sam_flag::mate_unmapped = 0x8, seqan3::sam_flag::on_reverse_strand = 0x10, seqan3::sam_flag::mate_on_reverse_strand = 0x20, seqan3::sam_flag::first_in_pair = 0x40,
  seqan3::sam_flag::second_in_pair = 0x80, seqan3::sam_flag::secondary_alignment = 0x100, seqan3::sam_flag::failed_filter = 0x200, seqan3::sam_flag::duplicate = 0x400,
  seqan3::sam_flag::supplementary_alignment = 0x800
}
 An enum flag that describes the properties of an aligned read (given as a SAM record).
 

Detailed Description

Provides files and formats for handling alignment data.

Introduction

Alignment files are primarily used to store pairwise alignments of two biological sequences and often come with a lot of additional information. Well-known formats include the SAM/BAM format, used to store read mapping data, and the BLAST format, which stores the results of a query search against a database.

Note
For a step-by-step guide take a look at our tutorial: Alignment Input and Output in SeqAn.

The Alignment file abstraction supports reading 15 different fields:

  1. seqan3::field::seq
  2. seqan3::field::id
  3. seqan3::field::offset
  4. seqan3::field::ref_seq
  5. seqan3::field::ref_id
  6. seqan3::field::ref_offset
  7. seqan3::field::alignment
  8. seqan3::field::cigar
  9. seqan3::field::mapq
  10. seqan3::field::qual
  11. seqan3::field::flag
  12. seqan3::field::mate
  13. seqan3::field::tags
  14. seqan3::field::evalue
  15. seqan3::field::bit_score

There is one more field for alignment files, seqan3::field::header_ptr, but it is mostly used internally. See the seqan3::alignment_file_output::header member function for details on how to access the seqan3::alignment_file_header of the file.

All of these fields are retrieved by default (and in that order). Note that some of the fields are specific to the SAM format (e.g. seqan3::field::flag), while others are specific to the BLAST format (e.g. seqan3::field::bit_score). See the corresponding formats for more details.

Enumeration Type Documentation

sam_flag

enum seqan3::sam_flag : uint16_t (strong)

An enum flag that describes the properties of an aligned read (given as a SAM record).

The SAM flags are bitwise flags: each value corresponds to one specific bit, so values can be combined and tested using bitwise operations. See this tutorial for an introduction to bitwise operations on enum flags.
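Because seqan3::sam_flag is a scoped (strong) enum, bitwise operators are not built in; SeqAn3 already provides them for seqan3::sam_flag, so user code does not need to. The following minimal sketch only illustrates the mechanics with a hypothetical `mini_flag` enum that mirrors three of the bit values, and shows why a test such as `flags & unmapped` has to be converted explicitly before it can be used as a condition.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical scoped enum mirroring three seqan3::sam_flag bit values (not part of SeqAn3).
enum class mini_flag : std::uint16_t
{
    none          = 0,
    unmapped      = 0x4,
    failed_filter = 0x200,
    duplicate     = 0x400
};

using mini_flag_int = std::underlying_type_t<mini_flag>;

// Scoped enums have no built-in bitwise operators: each operator converts
// to the underlying integer, operates on it, and converts back.
constexpr mini_flag operator|(mini_flag lhs, mini_flag rhs)
{
    return static_cast<mini_flag>(static_cast<mini_flag_int>(lhs) | static_cast<mini_flag_int>(rhs));
}

constexpr mini_flag operator&(mini_flag lhs, mini_flag rhs)
{
    return static_cast<mini_flag>(static_cast<mini_flag_int>(lhs) & static_cast<mini_flag_int>(rhs));
}

constexpr mini_flag operator~(mini_flag flag)
{
    // ~ promotes to int; truncate back to 16 bits before converting to the enum.
    return static_cast<mini_flag>(static_cast<mini_flag_int>(~static_cast<mini_flag_int>(flag)));
}

// |= sets bits, &= keeps only the bits present on both sides.
constexpr mini_flag & operator|=(mini_flag & lhs, mini_flag rhs) { return lhs = lhs | rhs; }
constexpr mini_flag & operator&=(mini_flag & lhs, mini_flag rhs) { return lhs = lhs & rhs; }

// The result of & is still a mini_flag, which is not implicitly convertible
// to bool; compare the underlying integer against zero instead.
constexpr bool is_set(mini_flag flags, mini_flag bit)
{
    return static_cast<mini_flag_int>(flags & bit) != 0;
}
```

Setting a bit uses `flags |= mini_flag::failed_filter`, clearing one uses `flags &= ~mini_flag::duplicate`; writing `flags &= mini_flag::failed_filter` instead would clear every other bit.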

Example:

#include <iostream>
#include <sstream>

#include <seqan3/alphabet/quality/phred42.hpp>
#include <seqan3/io/alignment_file/all.hpp>

// SAM columns must be separated by tab characters.
auto sam_file_raw = "@HD\tVN:1.6\tSO:coordinate\tGO:none\n"
                    "@SQ\tSN:ref\tLN:45\n"
                    "r001\t99\tref\t7\t30\t8M2I4M1D3M\t=\t37\t39\tTTAGATAAAGGATACTG\t!!!!!!!!!!!!!!!!!\n"
                    "r003\t0\tref\t29\t30\t5S6M\t*\t0\t0\tGCCTAAGCTAA\t!!!!!!!!!!!\tSA:Z:ref,29,-,6H5M,17,0;\n"
                    "r003\t4\t*\t29\t17\t*\t*\t0\t0\tTAGGC\t@@@@@\tSA:Z:ref,9,+,5S6M,30,1;\n"
                    "r001\t147\tref\t237\t30\t9M\t=\t7\t-39\tCAGCGGCAT\t!!!!!!!!!\tNM:i:1\n";

int main()
{
    // Read the SAM text from an in-memory stream.
    seqan3::alignment_file_input fin{std::istringstream{sam_file_raw}, seqan3::format_sam{}};

    for (auto & rec : fin)
    {
        // Check whether a certain flag value (bit) is set:
        if (static_cast<bool>(seqan3::get<seqan3::field::flag>(rec) & seqan3::sam_flag::unmapped))
            std::cout << "Read " << seqan3::get<seqan3::field::id>(rec) << " is unmapped\n";

        if (seqan3::get<seqan3::field::qual>(rec)[0] < seqan3::assign_char_to('@', seqan3::phred42{})) // low quality
        {
            // Set a flag value (bit) with |=:
            seqan3::get<seqan3::field::flag>(rec) |= seqan3::sam_flag::failed_filter;
            // Note that this does not affect other flag values (bits),
            // e.g. `seqan3::get<seqan3::field::flag>(rec) & seqan3::sam_flag::unmapped` may still be true
        }

        // Unset a flag value (bit) with &= and the bitwise complement:
        seqan3::get<seqan3::field::flag>(rec) &= ~seqan3::sam_flag::duplicate; // not marked as a duplicate anymore
    }
}
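To see how a raw FLAG column value decomposes into the enumerators, the following sketch decodes an integer flag into the names of its set bits, similar to the Picard "explain flags" page. It uses a hypothetical lookup table (`sam_flag_names`) and a hypothetical helper (`explain_sam_flag`) that mirror the seqan3::sam_flag values; it does not use SeqAn3 itself. Applied to the two `r001` records of the example, 99 and 147 describe the two mates of one properly paired template.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical lookup table mirroring the seqan3::sam_flag enumerators and their bit values.
struct flag_name
{
    std::uint16_t bit;
    std::string_view name;
};

constexpr std::array<flag_name, 12> sam_flag_names{{{0x1, "paired"},
                                                     {0x2, "proper_pair"},
                                                     {0x4, "unmapped"},
                                                     {0x8, "mate_unmapped"},
                                                     {0x10, "on_reverse_strand"},
                                                     {0x20, "mate_on_reverse_strand"},
                                                     {0x40, "first_in_pair"},
                                                     {0x80, "second_in_pair"},
                                                     {0x100, "secondary_alignment"},
                                                     {0x200, "failed_filter"},
                                                     {0x400, "duplicate"},
                                                     {0x800, "supplementary_alignment"}}};

// Lists the names of all bits set in a raw FLAG value, separated by '|'.
std::string explain_sam_flag(std::uint16_t flag)
{
    std::string result;
    for (auto const & [bit, name] : sam_flag_names)
    {
        if ((flag & bit) == 0) // this bit is not set
            continue;
        if (!result.empty())
            result += '|';
        result += name;
    }
    if (result.empty())
        return "none"; // value 0 corresponds to seqan3::sam_flag::none
    return result;
}
```

For example, `explain_sam_flag(99)` yields `paired|proper_pair|mate_on_reverse_strand|first_in_pair` and `explain_sam_flag(147)` yields `paired|proper_pair|on_reverse_strand|second_in_pair`.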

The following additional notes on some flag values are adapted from the SAM specification:

  • For each read/contig in a SAM file, it is required that one and only one line associated with the read has neither the seqan3::sam_flag::secondary_alignment nor the seqan3::sam_flag::supplementary_alignment flag value set (satisfies FLAG & 0x900 == 0). This line is called the primary alignment of the read.
  • seqan3::sam_flag::secondary_alignment (bit 0x100) marks the alignment not to be used in certain analyses when the tools in use are aware of this bit. It is typically used to flag alternative mappings when multiple mappings are presented in a SAM.
  • seqan3::sam_flag::supplementary_alignment (bit 0x800) indicates that the corresponding alignment line is part of a chimeric alignment. If the SAM/BAM file contains long reads (Nanopore/PacBio), this happens when reads are split before being aligned: the best matching part is marked as primary, while all other aligned parts are marked as supplementary.
  • seqan3::sam_flag::unmapped (bit 0x4) is the only reliable place to tell whether the read is unmapped. If seqan3::sam_flag::unmapped is set, no assumptions can be made about RNAME, POS, CIGAR, and MAPQ, nor about the flags seqan3::sam_flag::proper_pair, seqan3::sam_flag::secondary_alignment, and seqan3::sam_flag::supplementary_alignment (bits 0x2, 0x100, and 0x800).
  • seqan3::sam_flag::on_reverse_strand (bit 0x10) indicates whether the read sequence has been reverse complemented and the quality string is reversed. When bit seqan3::sam_flag::unmapped (0x4) is unset, this corresponds to the strand to which the segment has been mapped: seqan3::sam_flag::on_reverse_strand (bit 0x10) unset indicates the forward strand, while set indicates the reverse strand. When seqan3::sam_flag::unmapped (0x4) is set, this indicates whether the unmapped read is stored in its original orientation as it came off the sequencing machine.
  • seqan3::sam_flag::first_in_pair and seqan3::sam_flag::second_in_pair (bits 0x40 and 0x80) reflect the read ordering within each template inherent in the sequencing technology used. If seqan3::sam_flag::first_in_pair and seqan3::sam_flag::second_in_pair (0x40 and 0x80) are both set, the read is part of a linear template, but it is neither the first nor the last read. If both are unset, the index of the read in the template is unknown. This may happen for a non-linear template or when this information is lost during data processing.
  • If seqan3::sam_flag::paired (bit 0x1) is unset, no assumptions can be made about seqan3::sam_flag::proper_pair, seqan3::sam_flag::mate_unmapped, seqan3::sam_flag::mate_on_reverse_strand, seqan3::sam_flag::first_in_pair and seqan3::sam_flag::second_in_pair (bits 0x2, 0x8, 0x20, 0x40 and 0x80).
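The rules above can be expressed as small predicates on a raw FLAG value. This is a minimal sketch assuming plain `std::uint16_t` flags and hypothetical constant and function names (`is_primary`, `is_mapped`, `template_position`, `mate_known_mapped`); in SeqAn3 code the same tests would be written with seqan3::sam_flag and its bitwise operators.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical constants with the bit values of the corresponding seqan3::sam_flag enumerators.
constexpr std::uint16_t paired                  = 0x1;
constexpr std::uint16_t unmapped                = 0x4;
constexpr std::uint16_t mate_unmapped           = 0x8;
constexpr std::uint16_t first_in_pair           = 0x40;
constexpr std::uint16_t second_in_pair          = 0x80;
constexpr std::uint16_t secondary_alignment     = 0x100;
constexpr std::uint16_t supplementary_alignment = 0x800;

// Primary alignment: neither 0x100 nor 0x800 is set (FLAG & 0x900 == 0).
constexpr bool is_primary(std::uint16_t flag)
{
    return (flag & (secondary_alignment | supplementary_alignment)) == 0;
}

// Bit 0x4 is the only reliable indicator of whether the read is mapped.
constexpr bool is_mapped(std::uint16_t flag)
{
    return (flag & unmapped) == 0;
}

// Position of the read within its template according to bits 0x40 and 0x80.
constexpr std::string_view template_position(std::uint16_t flag)
{
    bool const first = (flag & first_in_pair) != 0;
    bool const last  = (flag & second_in_pair) != 0; // for a pair, the last read is the second one
    if (first && last)
        return "middle"; // linear template, neither the first nor the last read
    if (first)
        return "first";
    if (last)
        return "last";
    return "unknown"; // non-linear template or information lost during processing
}

// Mate bits are only meaningful if 0x1 is set; the mate is mapped if 0x8 is then unset.
constexpr bool mate_known_mapped(std::uint16_t flag)
{
    return (flag & paired) != 0 && (flag & mate_unmapped) == 0;
}
```

With the example records, `template_position(99)` is `"first"` and `template_position(147)` is `"last"`, and both are primary alignments.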
See also
https://broadinstitute.github.io/picard/explain-flags.html
Enumerator
none 

None of the flags below are set.

paired 

The aligned read is paired (paired-end sequencing).

proper_pair 

The two aligned reads in a pair have a proper distance between each other.

unmapped 

The read is not mapped to a reference (unaligned).

mate_unmapped 

The mate of this read is not mapped to a reference (unaligned).

on_reverse_strand 

The read sequence has been reverse complemented before being mapped (aligned).

mate_on_reverse_strand 

The mate sequence has been reverse complemented before being mapped (aligned).

first_in_pair 

Indicates the ordering (see details in the seqan3::sam_flag description).

second_in_pair 

Indicates the ordering (see details in the seqan3::sam_flag description).

secondary_alignment 

This read alignment is an alternative (possibly suboptimal) to the primary.

failed_filter 

The read alignment failed a filter, e.g. quality controls.

duplicate 

The read is marked as a PCR duplicate or optical duplicate.

supplementary_alignment 

This sequence is part of a split alignment and is not the primary alignment.
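Since seqan3::sam_flag::on_reverse_strand means that the stored sequence is reverse complemented and the quality string is reversed, the read can be restored to the orientation in which it came off the sequencing machine. The sketch below assumes plain `std::string` sequences with uppercase A/C/G/T (other characters such as `N` are left unchanged) and uses hypothetical helper names (`complement`, `original_orientation`); it is not SeqAn3 API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical constant with the value of seqan3::sam_flag::on_reverse_strand.
constexpr std::uint16_t on_reverse_strand = 0x10;

// Complement of a single DNA base; any other character is returned unchanged.
char complement(char base)
{
    switch (base)
    {
        case 'A': return 'T';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'T': return 'A';
        default:  return base;
    }
}

// Returns sequence and qualities in their original sequencing orientation.
std::pair<std::string, std::string> original_orientation(std::string seq, std::string qual, std::uint16_t flag)
{
    if ((flag & on_reverse_strand) != 0)
    {
        // Undo the reverse complement of the sequence ...
        std::reverse(seq.begin(), seq.end());
        std::transform(seq.begin(), seq.end(), seq.begin(), complement);
        // ... and the reversal of the qualities (qualities are never complemented).
        std::reverse(qual.begin(), qual.end());
    }
    return {std::move(seq), std::move(qual)};
}
```

For the second `r001` record of the example (flag 147, bit 0x10 set), the stored sequence `CAGCGGCAT` is restored to `ATGCCGCTG`.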