SeqAn3 3.3.0-rc.1
The Modern C++ library for sequence analysis.
Quality

Provides the various quality score types.


Classes

class  seqan3::phred42
 Quality type for traditional Sanger and modern Illumina Phred scores.
 
class  seqan3::phred63
 Quality type for traditional Sanger and modern Illumina Phred scores.
 
class  seqan3::phred68solexa
 Quality type for Solexa and deprecated Illumina formats.
 
class  seqan3::phred94
 Quality type for PacBio Phred scores of HiFi reads.
 
class  seqan3::phred_base< derived_type, size >
 A CRTP-base that refines seqan3::alphabet_base and is used by the quality alphabets.
 
class  seqan3::qualified< sequence_alphabet_t, quality_alphabet_t >
 Joins an arbitrary alphabet with a quality alphabet.
 
interface  quality_alphabet
 A concept that indicates whether an alphabet represents quality scores.
 
interface  writable_quality_alphabet
 A concept that indicates whether a writable alphabet represents quality scores.
 

Typedefs

template<typename alphabet_type >
using seqan3::alphabet_phred_t = decltype(seqan3::to_phred(std::declval< alphabet_type >()))
 The phred_type of the alphabet; defined as the return type of seqan3::to_phred.
 
using seqan3::dna15q = qualified< dna15, phred42 >
 An alphabet that stores a seqan3::dna15 letter and an seqan3::phred42 letter at each position.
 
using seqan3::dna4q = qualified< dna4, phred42 >
 An alphabet that stores a seqan3::dna4 letter and an seqan3::phred42 letter at each position.
 
using seqan3::dna5q = qualified< dna5, phred42 >
 An alphabet that stores a seqan3::dna5 letter and an seqan3::phred42 letter at each position.
 
using seqan3::rna15q = qualified< rna15, phred42 >
 An alphabet that stores a seqan3::rna15 letter and an seqan3::phred42 letter at each position.
 
using seqan3::rna4q = qualified< rna4, phred42 >
 An alphabet that stores a seqan3::rna4 letter and an seqan3::phred42 letter at each position.
 
using seqan3::rna5q = qualified< rna5, phred42 >
 An alphabet that stores a seqan3::rna5 letter and an seqan3::phred42 letter at each position.
 

Function objects (Quality)

constexpr auto seqan3::to_phred = detail::adl_only::to_phred_cpo{}
 The public getter function for the Phred representation of a quality score.
 
constexpr auto seqan3::assign_phred_to = detail::adl_only::assign_phred_to_cpo{}
 Assign a Phred score to a quality alphabet object.
 

Detailed Description

Provides the various quality score types.

See also
Alphabet

Introduction

Quality score sequences are usually output together with the DNA (or RNA) sequence by sequencing machines like the Illumina Genome Analyzer. The quality score of a nucleotide is also known as Phred score. It is an integer score that is logarithmically related to the probability $p$ that a base call is incorrect: the higher the Phred score, the higher the probability that the corresponding nucleotide is correct for that position. There exist two common variants of its computation:

  * Sanger format with $Q = -10 \log_{10} p$
  * Solexa format with $Q = -10 \log_{10} \frac{p}{1-p}$

Thus, although conversion between different quality types is supported, the scores differ significantly at very low quality levels and need to be corrected by an offset before being compared. For easy handling of the Phred score in file formats and console output, it is mapped to a single ASCII character. The sequencing machine or analyser, e.g. HiSeq or PacBio, dictates which Phred format is used. DNA sequences and their quality scores are usually stored in the FASTQ format, indicated by the file extension fastq or fq. This sub-module provides multiple quality alphabets that can be used in combination with regular containers and ranges.
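How far the two score variants drift apart at low quality can be checked numerically. The following is a minimal sketch that assumes the commonly cited definitions $Q = -10 \log_{10} p$ (Sanger) and $Q = -10 \log_{10} \frac{p}{1-p}$ (Solexa). The function names are hypothetical and not part of SeqAn3.

```cpp
#include <cmath>

// Hypothetical helpers for illustration only; they are not SeqAn3 functions.

// Sanger/Phred variant: Q = -10 * log10(p), with p the error probability in (0, 1).
inline double phred_quality(double const p)
{
    return -10.0 * std::log10(p);
}

// Solexa variant: Q = -10 * log10(p / (1 - p)), with p in (0, 1).
inline double solexa_quality(double const p)
{
    return -10.0 * std::log10(p / (1.0 - p));
}
```

For an error probability of 0.5 the Sanger variant gives roughly 3 while the Solexa variant gives 0; for an error probability of 0.001 both are close to 30.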

Encoding Schemes

Standard Use Case | Format                      | Encoding | Alphabet Type         | Phred Score Range | Rank Range | ASCII Range
------------------+-----------------------------+----------+-----------------------+-------------------+------------+--------------------------
Sanger, Illumina  | Sanger, Illumina 1.8+       | Phred+33 | seqan3::phred42       | [0 .. 41]         | [0 .. 41]  | [33 .. 74]  ['!' .. 'J']
Sanger, Illumina  | Sanger, Illumina 1.8+       | Phred+33 | seqan3::phred63       | [0 .. 62]         | [0 .. 62]  | [33 .. 95]  ['!' .. '_']
PacBio            | Sanger, Illumina 1.8+       | Phred+33 | seqan3::phred94       | [0 .. 93]         | [0 .. 93]  | [33 .. 126] ['!' .. '~']
Solexa            | Solexa, Illumina [1.0; 1.8[ | Phred+64 | seqan3::phred68solexa | [-5 .. 62]        | [0 .. 67]  | [59 .. 126] [';' .. '~']
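The relation between Phred score, rank and ASCII character in the table can be sketched in plain C++. The struct and function names below are hypothetical and do not use the SeqAn3 API; only the ranges and offsets are taken from the table.

```cpp
#include <optional>

// Hypothetical description of one encoding scheme (not a SeqAn3 type).
struct encoding_scheme
{
    int min_phred;    // smallest representable Phred score
    int max_phred;    // largest representable Phred score
    int ascii_offset; // added to the Phred score to obtain the ASCII code
};

constexpr encoding_scheme phred42_scheme{0, 41, 33};
constexpr encoding_scheme phred63_scheme{0, 62, 33};
constexpr encoding_scheme phred94_scheme{0, 93, 33};
constexpr encoding_scheme phred68solexa_scheme{-5, 62, 64};

// Phred score -> ASCII character; empty if the score is not representable.
inline std::optional<char> phred_to_char(encoding_scheme const & s, int const phred)
{
    if (phred < s.min_phred || phred > s.max_phred)
        return std::nullopt;
    return static_cast<char>(phred + s.ascii_offset);
}

// ASCII character -> Phred score; empty if the character is outside the scheme.
inline std::optional<int> char_to_phred(encoding_scheme const & s, char const c)
{
    int const phred = static_cast<int>(c) - s.ascii_offset;
    if (phred < s.min_phred || phred > s.max_phred)
        return std::nullopt;
    return phred;
}

// Phred score -> rank; ranks always start at 0. Assumes a representable score.
inline int phred_to_rank(encoding_scheme const & s, int const phred)
{
    return phred - s.min_phred;
}
```

With these definitions, a Phred score of 41 in the phred42 scheme maps to 'J', and the Solexa score -5 maps to ';' with rank 0.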

The most widely used format is the Sanger or Illumina 1.8+ format. Although typical Phred scores for Illumina machines range from 0 to 41, processed reads can reach higher scores. If you do not intend to handle Phred scores larger than 41, we recommend using seqan3::phred42 due to its more space-efficient implementation (see below). If you want to store PacBio HiFi reads, we recommend using seqan3::phred94, as these use the full range of the Phred quality scores. For other formats, like Solexa and Illumina 1.0 to 1.7, the type seqan3::phred68solexa is provided. To also cover the Solexa format, the Phred score is stored as a signed integer starting at -5.

The following figure gives a graphical explanation of the different Alphabet Types:

SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS....................................................
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM...............................
PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP
..........................OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO

!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
|.........................|..............|....................|..............................|
33........................59............73....................95............................126

0_______________________________________40.....................................................
0_______________________________________40____________________62...............................
0_______________________________________40____________________62_____________________________93
.........................-5____0________9____________________________________________________62

S - Sanger, Illumina 1.8+ - phred42
M - Sanger, Illumina 1.8+ - phred63
P - Sanger, Illumina 1.8+ - phred94 (PacBio)
O - Solexa - phred68solexa

Graphic was inspired by https://en.wikipedia.org/wiki/FASTQ_format#Encoding (last access 28.01.2021).

Quality values are usually paired together with nucleotides. Therefore, it stands to reason to combine both alphabets into a new data structure. In SeqAn, this can be done with seqan3::qualified. It represents the cross product between a nucleotide and a quality alphabet and is the go-to choice when compression is of interest.

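The following combinations still fit into a single byte:

  * seqan3::qualified<seqan3::dna4, seqan3::phred42> (alphabet size: 4 x 42 = 168)
  * seqan3::qualified<seqan3::dna4, seqan3::phred63> (alphabet size: 4 x 63 = 252)
  * seqan3::qualified<seqan3::dna5, seqan3::phred42> (alphabet size: 5 x 42 = 210)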

Using seqan3::qualified can halve the storage usage compared to storing qualities and nucleotides separately. Note that any combination of seqan3::phred94 with another alphabet will cause seqan3::qualified to use at least 2 bytes. While we used DNA alphabets in this example, the same properties hold true for RNA alphabets.
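The storage argument can be checked with a little arithmetic. The sketch below assumes that a combined letter is stored in the smallest unsigned integer type that can hold every rank of the cross product. The helper `bytes_for` and the size constants are hypothetical names; the sizes themselves follow from the alphabet names and the rank ranges in the encoding table.

```cpp
#include <cstddef>

// Alphabet sizes (number of distinct letters); hypothetical constant names.
constexpr std::size_t dna4_size    = 4;
constexpr std::size_t dna5_size    = 5;
constexpr std::size_t dna15_size   = 15;
constexpr std::size_t phred42_size = 42;
constexpr std::size_t phred63_size = 63;
constexpr std::size_t phred94_size = 94;

// Bytes of the smallest unsigned integer type (1, 2, 4 or 8 bytes) that can
// represent `alphabet_size` distinct ranks. Assumption for this sketch only.
constexpr std::size_t bytes_for(std::size_t const alphabet_size)
{
    return alphabet_size <= 256ull          ? 1
         : alphabet_size <= 65536ull        ? 2
         : alphabet_size <= 4294967296ull   ? 4
                                            : 8;
}

// A cross product has size(sequence alphabet) * size(quality alphabet) letters.
static_assert(bytes_for(dna4_size * phred42_size) == 1, "4 * 42 = 168 fits into one byte");
static_assert(bytes_for(dna4_size * phred63_size) == 1, "4 * 63 = 252 fits into one byte");
static_assert(bytes_for(dna5_size * phred42_size) == 1, "5 * 42 = 210 fits into one byte");
static_assert(bytes_for(dna5_size * phred63_size) == 2, "5 * 63 = 315 needs two bytes");
static_assert(bytes_for(dna4_size * phred94_size) == 2, "4 * 94 = 376 needs two bytes");
```

Storing a dna4 letter and a phred42 letter separately would take one byte each under the same assumption, which is where the factor of two comes from.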

Concept

The quality submodule defines the seqan3::writable_quality_alphabet concept, which encompasses all the alphabets defined in the submodule and refines seqan3::writable_alphabet by providing Phred score assignment and conversion operations. Additionally, this submodule defines seqan3::quality_alphabet, which only requires readability and not assignability.

Assignment and Conversion

Quality alphabets can be converted to their char and rank representation via seqan3::to_char and seqan3::to_rank respectively (like all other alphabets). Additionally they can be converted to their Phred representation via seqan3::to_phred.

Likewise, assignment happens via seqan3::assign_char_to, seqan3::assign_rank_to and seqan3::assign_phred_to. Phred values outside the representable range, but inside the legal range, are converted to the closest Phred score, e.g. assigning 60 to a seqan3::phred42 will result in a Phred score of 41. Assigning Phred values outside the legal range results in undefined behaviour.

All quality alphabets are explicitly convertible to each other via their Phred representation. Values not present in one alphabet are mapped to the closest value in the target alphabet. For example, a seqan3::phred63 letter with value 60 will convert to a seqan3::phred42 letter of score 41; the same applies to seqan3::phred94.
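The clamping behaviour for assignment and conversion can be illustrated without the library. The class template below is a hypothetical stand-in, not SeqAn3's implementation: it stores a Phred score in a fixed representable range, clamps on assignment, and converts explicitly from other ranges via the Phred value.

```cpp
// Hypothetical stand-in for a quality letter with Phred range [min_phred_v, max_phred_v].
template <int min_phred_v, int max_phred_v>
class toy_quality
{
public:
    toy_quality() = default;

    // Construct from a Phred score (clamped to the representable range).
    explicit toy_quality(int const phred) noexcept { assign_phred(phred); }

    // Explicit conversion from another range, going through the Phred value.
    template <int other_min, int other_max>
    explicit toy_quality(toy_quality<other_min, other_max> const & other) noexcept
    {
        assign_phred(other.to_phred());
    }

    // Values outside the representable range map to the closest representable score.
    toy_quality & assign_phred(int const phred) noexcept
    {
        if (phred < min_phred_v)
            phred_ = min_phred_v;
        else if (phred > max_phred_v)
            phred_ = max_phred_v;
        else
            phred_ = phred;
        return *this;
    }

    int to_phred() const noexcept { return phred_; }

private:
    int phred_{min_phred_v};
};

// Ranges taken from the encoding table; the alias names are hypothetical.
using toy_q42    = toy_quality<0, 41>;
using toy_q63    = toy_quality<0, 62>;
using toy_q94    = toy_quality<0, 93>;
using toy_solexa = toy_quality<-5, 62>;
```

Constructing `toy_q42` from a `toy_q63` holding 60 yields a score of 41, and constructing it from a `toy_solexa` holding -5 yields 0.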

Typedef Documentation

◆ alphabet_phred_t

template<typename alphabet_type >
using seqan3::alphabet_phred_t = decltype(seqan3::to_phred(std::declval<alphabet_type>()))

The phred_type of the alphabet; defined as the return type of seqan3::to_phred.

This entity is stable. Since version 3.1.

◆ dna15q

using seqan3::dna15q = qualified<dna15, phred42>

An alphabet that stores a seqan3::dna15 letter and an seqan3::phred42 letter at each position.

This entity is stable. Since version 3.1.

◆ dna4q

using seqan3::dna4q = qualified<dna4, phred42>

An alphabet that stores a seqan3::dna4 letter and an seqan3::phred42 letter at each position.

This entity is stable. Since version 3.1.

◆ dna5q

using seqan3::dna5q = qualified<dna5, phred42>

An alphabet that stores a seqan3::dna5 letter and an seqan3::phred42 letter at each position.

This entity is stable. Since version 3.1.

◆ rna15q

using seqan3::rna15q = qualified<rna15, phred42>

An alphabet that stores a seqan3::rna15 letter and an seqan3::phred42 letter at each position.

This entity is stable. Since version 3.1.

◆ rna4q

using seqan3::rna4q = qualified<rna4, phred42>

An alphabet that stores a seqan3::rna4 letter and an seqan3::phred42 letter at each position.

This entity is stable. Since version 3.1.

◆ rna5q

using seqan3::rna5q = qualified<rna5, phred42>

An alphabet that stores a seqan3::rna5 letter and an seqan3::phred42 letter at each position.

This entity is stable. Since version 3.1.

Variable Documentation

◆ assign_phred_to

constexpr auto seqan3::assign_phred_to = detail::adl_only::assign_phred_to_cpo{}
inline constexpr

Assign a Phred score to a quality alphabet object.

Template Parameters
your_type  The type of the target object. Must model the seqan3::quality_alphabet.
Parameters
chr  The Phred score being assigned; must be of the seqan3::alphabet_phred_t of the target object.
Returns
Reference to alph if alph was given as lvalue, otherwise a copy.

This is a function object. Invoke it with the parameter(s) specified above.

It acts as a wrapper and looks for three possible implementations (in this order):

  1. A static member function assign_phred_to(phred_type const chr, your_type & a) of the class seqan3::custom::alphabet<your_type>.
  2. A free function assign_phred_to(phred_type const chr, your_type & a) in the namespace of your type (or as friend).
  3. A member function called assign_phred(phred_type const chr) (not assign_phred_to).

Functions are only considered for one of the above cases if they are marked noexcept (constexpr is not required, but recommended) and if the returned type is your_type &.

Every writable quality alphabet type must provide one of the above. Note that temporaries of your_type are handled by this function object and do not require an additional overload.
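As an illustration of the third option, the sketch below shows a hypothetical user-defined type `my_quality` with a member `assign_phred` that is marked noexcept and returns `my_quality &`, as required above. It deliberately does not include SeqAn3; whether a type actually models seqan3::writable_quality_alphabet also depends on the remaining alphabet requirements, which are not shown here. The score range [0, 9] is an arbitrary choice for the example.

```cpp
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical user-defined quality type with Phred scores in [0, 9].
class my_quality
{
public:
    using phred_type = std::int8_t;

    // Member named assign_phred (not assign_phred_to), noexcept, returns my_quality &.
    constexpr my_quality & assign_phred(phred_type const chr) noexcept
    {
        if (chr < 0)
            phred_ = 0;
        else if (chr > 9)
            phred_ = 9;
        else
            phred_ = chr;
        return *this;
    }

    // Matching getter, also noexcept.
    constexpr phred_type to_phred() const noexcept { return phred_; }

private:
    phred_type phred_{0};
};

// Compile-time checks of the two requirements stated in the text.
static_assert(noexcept(std::declval<my_quality &>().assign_phred(my_quality::phred_type{})),
              "assign_phred must be noexcept");
static_assert(std::is_same<decltype(std::declval<my_quality &>().assign_phred(my_quality::phred_type{})),
                           my_quality &>::value,
              "assign_phred must return my_quality &");
```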

Customisation point

This is a customisation point (see Customisation). To specify the behaviour for your own alphabet type, simply provide one of the three functions specified above.

This entity is experimental and subject to change in the future. Implementation 2 (free function) is not stable.

This entity is stable. Since version 3.1. The name seqan3::assign_phred_to, Implementation 1, and Implementation 3 are stable and will not change.

◆ to_phred

constexpr auto seqan3::to_phred = detail::adl_only::to_phred_cpo{}
inline constexpr

The public getter function for the Phred representation of a quality score.

Template Parameters
your_type  The type of alphabet. Must model the seqan3::quality_alphabet.
Parameters
chr  The quality value to convert into the Phred score.
Returns
The Phred representation of a quality score.

This is a function object. Invoke it with the parameter(s) specified above.

It acts as a wrapper and looks for three possible implementations (in this order):

  1. A static member function to_phred(your_type const a) of the class seqan3::custom::alphabet<your_type>.
  2. A free function to_phred(your_type const a) in the namespace of your type (or as friend).
  3. A member function called to_phred().

Functions are only considered for one of the above cases if they are marked noexcept (constexpr is not required, but recommended) and if the returned type is convertible to size_t.

Every quality alphabet type must provide one of the above.
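The lookup order described above can be sketched with a small function object that tries the three options in turn. This is a simplified illustration and not SeqAn3's actual implementation: the namespace `sketch`, the class `custom_alphabet` (standing in for seqan3::custom::alphabet) and all example types are hypothetical, and the sketch does not check that the selected function is itself noexcept.

```cpp
namespace sketch
{
// Stand-in for seqan3::custom::alphabet; specialise it to use option 1.
template <typename t>
struct custom_alphabet
{};

// Priority tags: overload resolution prefers the overload taking the most derived tag.
template <int n>
struct priority : priority<n - 1>
{};
template <>
struct priority<0>
{};

struct to_phred_fn
{
private:
    // 1. Static member function of the customisation class.
    template <typename t>
    static constexpr auto impl(priority<2>, t const a) noexcept -> decltype(custom_alphabet<t>::to_phred(a))
    {
        return custom_alphabet<t>::to_phred(a);
    }

    // 2. Free function found via argument-dependent lookup.
    template <typename t>
    static constexpr auto impl(priority<1>, t const a) noexcept -> decltype(to_phred(a))
    {
        return to_phred(a);
    }

    // 3. Member function.
    template <typename t>
    static constexpr auto impl(priority<0>, t const a) noexcept -> decltype(a.to_phred())
    {
        return a.to_phred();
    }

public:
    template <typename t>
    constexpr auto operator()(t const a) const noexcept -> decltype(impl(priority<2>{}, a))
    {
        return impl(priority<2>{}, a);
    }
};

constexpr to_phred_fn to_phred_cpo{};
} // namespace sketch

namespace user
{
// Provides only a member function (option 3).
struct with_member
{
    int q;
    constexpr int to_phred() const noexcept { return q; }
};

// Provides only a free function (option 2).
struct with_free
{
    int q;
};
constexpr int to_phred(with_free const a) noexcept { return a.q; }

// Provides both; the free function (option 2) wins over the member (option 3).
struct both
{
    constexpr int to_phred() const noexcept { return 1; }
};
constexpr int to_phred(both const) noexcept { return 2; }

// Has no functions of its own; adapted through the customisation class (option 1).
struct external
{
    int q;
};
} // namespace user

namespace sketch
{
template <>
struct custom_alphabet<user::external>
{
    static constexpr int to_phred(user::external const a) noexcept { return a.q; }
};
} // namespace sketch
```

Calling `sketch::to_phred_cpo` on a `user::both` object returns 2, because the free function is found before the member function.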

Customisation point

This is a customisation point (see Customisation). To specify the behaviour for your own alphabet type, simply provide one of the three functions specified above.

This entity is experimental and subject to change in the future. Implementation 2 (free function) is not stable.

This entity is stable. Since version 3.1. The name seqan3::to_phred, Implementation 1, and Implementation 3 are stable and will not change.