Class ClusterReduction
Specialization for ReducedAminoAcid

Defined in	seqan/reduced_aminoacid.h
Signature	`template <unsigned char n, unsigned char m = 24, typename TMatrix = Blosum62> struct ClusterReduction;`

Template Parameters

`n`	Size of the reduced alphabet (between 2 and m-1)
`m`	Size to which the alphabet is truncated before clustering (one of 20, 22, 24; default 24)
`TMatrix`	Matrix used for clustering (default Blosum62; no other matrix is currently supported)
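The documented constraints on `n` and `m` can be restated as compile-time checks. The struct below, `ClusterReductionParamsCheck`, is a hypothetical helper written only for illustration; it is not part of SeqAn, and it leaves out `TMatrix` because only Blosum62 is supported.

```cpp
// Hypothetical helper, NOT part of SeqAn: it only restates the documented
// constraints on ClusterReduction's template parameters as compile-time checks.
template <unsigned char n, unsigned char m = 24>
struct ClusterReductionParamsCheck
{
    // m is the size the alphabet is truncated to before clustering.
    static_assert(m == 20 || m == 22 || m == 24,
                  "m must be one of 20, 22 or 24");

    // n is the size of the reduced alphabet and must lie in [2, m-1].
    static_assert(n >= 2 && n <= m - 1,
                  "n must be between 2 and m-1");

    static constexpr unsigned reducedSize = n;
    static constexpr unsigned truncatedSize = m;
};
```

With these checks, `ClusterReductionParamsCheck<8>` and `ClusterReductionParamsCheck<10, 20>` compile, while `ClusterReductionParamsCheck<24>` (n equal to m) or `ClusterReductionParamsCheck<8, 21>` (unsupported m) are rejected at compile time.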

Detailed Description

When to Use

Use m = 24 when you expect 'X' and '*' in the dataset you reduce from. This is especially the case for translated genomic reads.

If you have validated protein sequences, you can use m = 20 or m = 22, which do not include these special characters (see AminoAcid for details).
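A minimal sketch of this advice, assuming upper-case one-letter amino acid codes: the hypothetical helpers below (illustrative names, not SeqAn functions) scan a sequence for 'X' or '*' and suggest m = 24 if either occurs, otherwise m = 20. For such input m = 22 would be an equally valid choice.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not a SeqAn function. Returns true if the sequence
// contains 'X' (unknown residue) or '*' (stop), the characters for which
// m = 24 is recommended. Expects upper-case one-letter codes.
bool containsSpecialCharacters(std::string const & seq)
{
    return seq.find_first_of("X*") != std::string::npos;
}

// Hypothetical helper: suggests the truncation size m for a sequence.
// 20 is returned for "clean" input, although 22 would also be acceptable.
unsigned suggestTruncationSize(std::string const & seq)
{
    return containsSpecialCharacters(seq) ? 24u : 20u;
}
```

For example, a translated read such as "MKV*LL" yields 24, whereas a validated sequence such as "MKVLLA" yields 20.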

Background

The method used to reduce the alphabet is similar to that of Murphy et al., 2000, http://www.ncbi.nlm.nih.gov/pubmed/10775656

Correlation coefficients for the Blosum62 scores of all pairs of amino acids in the alphabet were computed and clustered with WPGMA, using UPGMA as a secondary criterion when WPGMA yields the same distance for two candidate pairs of clusters.
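The procedure can be sketched as follows. This is a simplified, hypothetical illustration, not SeqAn's implementation, and `rowCorrelation` and `clusterAlphabet` are illustrative names. Two assumptions are made explicitly: the correlation of two amino acids is the Pearson correlation between their rows of the score matrix, and it is turned into a distance as 1 - r, so that merging the closest clusters means merging the most strongly correlated ones. Clustering stops once n clusters remain. For ClusterReduction one would presumably pass the m × m Blosum62 submatrix (also an assumption).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Pearson correlation between rows a and b of a square score matrix s.
// Assumes no row is constant (otherwise the variance is zero).
double rowCorrelation(Matrix const & s, std::size_t a, std::size_t b)
{
    std::size_t const k = s.size();
    double meanA = 0.0, meanB = 0.0;
    for (std::size_t c = 0; c < k; ++c) { meanA += s[a][c]; meanB += s[b][c]; }
    meanA /= k;
    meanB /= k;
    double cov = 0.0, varA = 0.0, varB = 0.0;
    for (std::size_t c = 0; c < k; ++c)
    {
        double const da = s[a][c] - meanA, db = s[b][c] - meanB;
        cov += da * db;
        varA += da * da;
        varB += db * db;
    }
    return cov / std::sqrt(varA * varB);
}

// Groups the k letters of s into n clusters with WPGMA, breaking WPGMA ties
// by the smaller UPGMA distance. Returns the cluster index of every letter.
std::vector<std::size_t> clusterAlphabet(Matrix const & s, std::size_t n)
{
    std::size_t const k = s.size();
    assert(n >= 1 && n <= k);

    // Letter-to-letter distance: 1 - correlation (assumption).
    Matrix d(k, std::vector<double>(k, 0.0));
    for (std::size_t a = 0; a < k; ++a)
        for (std::size_t b = 0; b < k; ++b)
            d[a][b] = 1.0 - rowCorrelation(s, a, b);

    std::vector<std::vector<std::size_t>> members(k);  // letters of each cluster
    for (std::size_t a = 0; a < k; ++a)
        members[a] = {a};
    Matrix w = d;  // current WPGMA distances between clusters

    // UPGMA distance: mean letter-to-letter distance across two clusters.
    auto upgma = [&](std::size_t i, std::size_t j) {
        double sum = 0.0;
        for (std::size_t x : members[i])
            for (std::size_t y : members[j])
                sum += d[x][y];
        return sum / static_cast<double>(members[i].size() * members[j].size());
    };

    double const eps = 1e-12;  // tolerance for treating WPGMA distances as equal
    while (members.size() > n)
    {
        // Closest pair by WPGMA; on a tie prefer the smaller UPGMA distance.
        std::size_t bi = 0, bj = 1;
        for (std::size_t i = 0; i < members.size(); ++i)
            for (std::size_t j = i + 1; j < members.size(); ++j)
            {
                double const diff = w[i][j] - w[bi][bj];
                if (diff < -eps || (std::fabs(diff) <= eps && upgma(i, j) < upgma(bi, bj)))
                {
                    bi = i;
                    bj = j;
                }
            }

        // WPGMA update: distance to the merged cluster is the plain mean of both parts.
        for (std::size_t c = 0; c < members.size(); ++c)
            if (c != bi && c != bj)
                w[bi][c] = w[c][bi] = (w[bi][c] + w[bj][c]) / 2.0;

        // Merge cluster bj into bi, then drop row and column bj (bj > bi).
        members[bi].insert(members[bi].end(), members[bj].begin(), members[bj].end());
        members.erase(members.begin() + bj);
        for (auto & row : w)
            row.erase(row.begin() + bj);
        w.erase(w.begin() + bj);
    }

    std::vector<std::size_t> label(k);
    for (std::size_t c = 0; c < members.size(); ++c)
        for (std::size_t x : members[c])
            label[x] = c;
    return label;
}
```

For example, on the toy matrix {{4, 3, -2, -2}, {3, 4, -2, -2}, {-2, -2, 4, 3}, {-2, -2, 3, 4}} (not Blosum62) with n = 2, letters 0 and 1 end up in one cluster and letters 2 and 3 in the other. WPGMA averages the two merged clusters with equal weight regardless of their sizes, which is why the size-weighted UPGMA distance can still distinguish two candidate merges that WPGMA scores identically.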

The exact clustering for m = 24.
