SeqAn3 3.4.0-rc.1
The Modern C++ library for sequence analysis.
Provides the amino acid alphabets and functionality for translation from nucleotides.
Classes
Kind | Name | Description |
---|---|---|
class | seqan3::aa10li | The reduced Li amino acid alphabet. |
class | seqan3::aa10murphy | The reduced Murphy amino acid alphabet. |
class | seqan3::aa20 | The canonical amino acid alphabet. |
class | seqan3::aa27 | The twenty-seven letter amino acid alphabet. |
concept | seqan3::aminoacid_alphabet | A concept that indicates whether an alphabet represents amino acids. |
class | seqan3::aminoacid_base< derived_type, size > | A CRTP-base that refines seqan3::alphabet_base and is used by the amino acids. |
struct | seqan3::aminoacid_empty_base | This is an empty base class that can be inherited by types that shall model seqan3::aminoacid_alphabet. |
Functions
Signature | Description |
---|---|
template<genetic_code gc = genetic_code::canonical, nucleotide_alphabet nucl_type> constexpr aa27 seqan3::translate_triplet (nucl_type const &n1, nucl_type const &n2, nucl_type const &n3) noexcept | Translate one nucleotide triplet into a single amino acid (single nucleotide interface). |
Variables
Signature | Description |
---|---|
template<typename t> constexpr bool seqan3::enable_aminoacid | A trait that indicates whether a type shall model seqan3::aminoacid_alphabet. |
Amino acid sequences are an important part of bioinformatic data processing and are used by many applications. While it is possible to represent them in a regular std::string, it makes sense to use specialised data structures in most cases. This sub-module offers the 27-letter amino acid alphabet as well as three smaller versions (seqan3::aa20, seqan3::aa10murphy and seqan3::aa10li) that can be used with regular containers and ranges. The 27-letter amino acid alphabet contains the 20 canonical amino acids, 2 additional proteinogenic amino acids (Pyrrolysine and Selenocysteine) and a termination letter (*). Additionally, 4 wildcard letters are offered, which allow more generic usage, for example in the case of ambiguous amino acids (e.g. J, which means either Isoleucine or Leucine). See also https://en.wikipedia.org/wiki/Amino_acid for more information about the amino acid alphabet.
Amino acid name | Three letter code | One letter code | Remapped in seqan3::aa20 | Remapped in seqan3::aa10murphy | Remapped in seqan3::aa10li |
---|---|---|---|---|---|
Alanine | Ala | A | A | A | A |
Arginine | Arg | R | R | K | K |
Asparagine | Asn | N | N | B | H |
Aspartic acid | Asp | D | D | B | B |
Cysteine | Cys | C | C | C | C |
Tyrosine | Tyr | Y | Y | F | F |
Glutamic acid | Glu | E | E | B | B |
Glutamine | Gln | Q | Q | B | B |
Glycine | Gly | G | G | G | G |
Histidine | His | H | H | H | H |
Isoleucine | Ile | I | I | I | I |
Leucine | Leu | L | L | I | J |
Lysine | Lys | K | K | K | K |
Methionine | Met | M | M | I | J |
Phenylalanine | Phe | F | F | F | F |
Proline | Pro | P | P | P | P |
Serine | Ser | S | S | S | A |
Threonine | Thr | T | T | S | A |
Tryptophan | Trp | W | W | F | F |
Valine | Val | V | V | I | I |
Selenocysteine | Sec | U | C | C | C |
Pyrrolysine | Pyl | O | K | K | K |
Asparagine or aspartic acid | Asx | B | D | B | B |
Glutamine or glutamic acid | Glx | Z | E | B | B |
Leucine or Isoleucine | Xle | J | L | I | J |
Unknown | Xaa | X | S | S | A |
Stop Codon | N/A | * | W | F | F |
As shown above, alphabets with fewer than 27 letters internally represent multiple amino acids as one.
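The remapping can be illustrated with a small standalone sketch that follows the seqan3::aa10murphy column of the table above. The function to_murphy10 is a hypothetical helper on plain char values, not part of SeqAn3, and the fallback for characters outside the table is an assumption.

```cpp
// Hypothetical helper (not SeqAn3 API): maps a one-letter amino acid code to
// the letter that represents it in the "Remapped in seqan3::aa10murphy" column.
constexpr char to_murphy10(char c) noexcept
{
    switch (c)
    {
        case 'A': return 'A';
        case 'R': case 'K': case 'O': return 'K';
        case 'N': case 'D': case 'E': case 'Q': case 'B': case 'Z': return 'B';
        case 'C': case 'U': return 'C';
        case 'Y': case 'F': case 'W': case '*': return 'F';
        case 'G': return 'G';
        case 'H': return 'H';
        case 'I': case 'L': case 'M': case 'V': case 'J': return 'I';
        case 'P': return 'P';
        case 'S': case 'T': case 'X': return 'S';
        default: return 'S'; // assumption: treat anything else like Unknown (X)
    }
}

// Several distinct amino acids collapse onto one representative letter.
static_assert(to_murphy10('L') == 'I', "Leucine is represented by I");
static_assert(to_murphy10('M') == 'I', "Methionine is represented by I");
static_assert(to_murphy10('*') == 'F', "the stop codon is represented by F");
```

Because the mapping is many-to-one, converting to a reduced alphabet loses information: the original letter cannot be recovered from the reduced one.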
For most cases it is highly recommended to use seqan3::aa27, as seqan3::aa20 provides no benefit in space consumption (both need 5 bits). Use seqan3::aa20 only when you know you need to interface with other software or formats that only support the canonical set.
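The 5-bit figure follows from the alphabet sizes alone. The sketch below computes the minimum number of bits per symbol; bits_needed is a hypothetical helper and says nothing about how SeqAn3 stores the values internally.

```cpp
#include <cstddef>

// Hypothetical helper: the smallest number of bits that can distinguish
// `alphabet_size` different letters.
constexpr std::size_t bits_needed(std::size_t alphabet_size) noexcept
{
    std::size_t bits = 0;
    while ((std::size_t{1} << bits) < alphabet_size)
        ++bits;
    return bits;
}

static_assert(bits_needed(27) == 5, "27 letters fit into 5 bits (2^5 = 32)");
static_assert(bits_needed(20) == 5, "20 letters still need 5 bits (2^4 = 16 is too small)");
static_assert(bits_needed(10) == 4, "a 10-letter alphabet needs 4 bits");
```

Only the 10-letter alphabets drop below 5 bits per symbol, which is why choosing 20 letters over 27 saves nothing.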
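template<genetic_code gc = genetic_code::canonical, nucleotide_alphabet nucl_type> constexpr aa27 seqan3::translate_triplet (nucl_type const &n1, nucl_type const &n2, nucl_type const &n3) noexcept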
Translate one nucleotide triplet into a single amino acid (single nucleotide interface).
Template parameters:
nucl_type | The type of input nucleotides; must model seqan3::nucleotide_alphabet. |
Parameters:
[in] | n1 | First nucleotide in triplet. |
[in] | n2 | Second nucleotide in triplet. |
[in] | n3 | Third nucleotide in triplet. |
Translates three single nucleotides into one amino acid according to the given genetic code.
Complexity: Constant.
Exceptions: No-throw guarantee.
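To make the constant-time lookup concrete, here is a standalone sketch of triplet translation. It is not SeqAn3's implementation: it works on plain upper-case char values instead of alphabet types, assumes the canonical (NCBI standard) genetic code, and returns X for any input that is not an unambiguous nucleotide. The names base_rank and translate_triplet_sketch are hypothetical.

```cpp
// Rank of a nucleotide in the order T, C, A, G (U is treated like T).
constexpr int base_rank(char n) noexcept
{
    switch (n)
    {
        case 'T': case 'U': return 0;
        case 'C': return 1;
        case 'A': return 2;
        case 'G': return 3;
        default: return -1; // not an unambiguous nucleotide
    }
}

// Hypothetical stand-in for a triplet translation: one table lookup per codon.
constexpr char translate_triplet_sketch(char n1, char n2, char n3) noexcept
{
    // Amino acids of all 64 codons of the canonical code, bases ordered T, C, A, G.
    constexpr char table[] = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
    int const r1 = base_rank(n1);
    int const r2 = base_rank(n2);
    int const r3 = base_rank(n3);
    if (r1 < 0 || r2 < 0 || r3 < 0)
        return 'X'; // assumption: report "unknown" for anything ambiguous
    return table[16 * r1 + 4 * r2 + r3];
}

static_assert(translate_triplet_sketch('A', 'T', 'G') == 'M', "ATG encodes Methionine");
static_assert(translate_triplet_sketch('T', 'A', 'A') == '*', "TAA is a stop codon");
static_assert(translate_triplet_sketch('T', 'G', 'G') == 'W', "TGG encodes Tryptophan");
```

With the library itself, the corresponding call takes three values of a type that models seqan3::nucleotide_alphabet and returns a seqan3::aa27, as given in the signature above.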
template<typename t> inline constexpr bool seqan3::enable_aminoacid
A trait that indicates whether a type shall model seqan3::aminoacid_alphabet.
Template parameters:
t | Type of the argument. |
This is an auxiliary trait that is checked by seqan3::aminoacid_alphabet to verify that a type is an amino acid alphabet. This trait should never be read from; instead use seqan3::aminoacid_alphabet. However, user-defined alphabets that want to model seqan3::aminoacid_alphabet need to make sure that it evaluates to true for their type.
Do not specialise this trait directly. It acts as a wrapper and looks for three possible implementations (in this order):
1. A static member variable enable_aminoacid of the class seqan3::custom::alphabet<t>.
2. A free function constexpr bool enable_aminoacid(t) noexcept in the namespace of your type (or as friend).
3. Otherwise: true if the type inherits from seqan3::aminoacid_empty_base (or seqan3::aminoacid_base), false otherwise.
Implementations of 1. and 2. are required to be marked constexpr, and the value / return value must be convertible to bool. Implementations of 2. are required to be marked noexcept. The value passed to functions implementing 2. shall be ignored; it is only used for finding the function via argument-dependent lookup. If your type is not seqan3::is_constexpr_default_constructible_v and you wish to provide an implementation for 2., overload for std::type_identity<t> instead.
To make a type model seqan3::aminoacid_alphabet, it is recommended that you derive from seqan3::aminoacid_base. If that is not possible, choose option 2., and only implement option 1. as a last resort.
This is a customisation point (see Customisation). To change the default behaviour for your own alphabet, follow the above instructions.
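The lookup order for options 2 and 3 can be sketched without the library. Everything in the namespace sketch below is a hypothetical stand-in written for illustration, not SeqAn3 code; option 1 (seqan3::custom::alphabet<t>) and the std::type_identity<t> overload are left out, and the sketch assumes the type is constexpr default constructible.

```cpp
#include <type_traits>

namespace sketch // hypothetical stand-ins, not SeqAn3 code
{
// Stand-in for seqan3::aminoacid_empty_base.
struct aminoacid_empty_base {};

// Fallback (option 3): true if the type inherits from the empty base.
template <typename t, typename = void>
struct enable_aminoacid_impl : std::is_base_of<aminoacid_empty_base, t> {};

// Preferred (option 2): a function enable_aminoacid(t) found via
// argument-dependent lookup; its constexpr return value decides.
template <typename t>
struct enable_aminoacid_impl<t, std::void_t<decltype(enable_aminoacid(t{}))>>
    : std::bool_constant<enable_aminoacid(t{})> {};

template <typename t>
inline constexpr bool enable_aminoacid_v = enable_aminoacid_impl<t>::value;
} // namespace sketch

// Option 3 on the user side: inherit from the (stand-in) empty base.
struct by_inheritance : sketch::aminoacid_empty_base {};

// Option 2 on the user side: a constexpr, noexcept friend; the argument is ignored.
struct by_friend
{
    friend constexpr bool enable_aminoacid(by_friend) noexcept { return true; }
};

// A type that opts in through neither route.
struct unrelated {};

static_assert(sketch::enable_aminoacid_v<by_inheritance>, "opted in via base class");
static_assert(sketch::enable_aminoacid_v<by_friend>, "opted in via friend function");
static_assert(!sketch::enable_aminoacid_v<unrelated>, "did not opt in");
```

The friend function is only reachable through argument-dependent lookup, which is why its parameter exists even though the value is never used.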