Class ClusterReduction
Specialization for ReducedAminoAcid

Defined in seqan/reduced_aminoacid.h
Signature template <unsigned char n, unsigned char m = 24, typename TMatrix = Blosum62> struct ClusterReduction;

Template Parameters

n the size of the reduced alphabet (between 2 and m-1)
m size to truncate alphabet to, before clustering (one of 20, 22, 24; default 24)
TMatrix Matrix used for clustering (default Blosum62, none other supported right now)

Detailed Description

WhenToUse

Use m = 24 when you expect 'X' and '*' in the dataset you reduce from. This is especially the case on translated genomic reads.

If you have validated protein sequences, you can use can use m = 20 or m = 22, which will not include special characters (see AminoAcid for details).

Background

The method employed for reducing the alphabet is similar to Murphy et al, 2000, http://www.ncbi.nlm.nih.gov/pubmed/10775656

Correlation coefficients for the Blosum62 scores of all pairs of amino acids in the alphabet were computed and clustered with WPGMA (using UPGMA as second criterium when WPGMA yields the same distance between two clusters).

The exact clustering for m = 24.