fn() countKmers
Counts kmers in a sequence. Optionally, a background model is returned.

Defined in <seqan/alignment_free.h>
Signature void countKmers(kmerCounts, sequence, k); void countKmers(kmerCounts, bgFrequencies, sequence, k); void countKmers(kmerCounts, bgModel, sequence, k);

Parameters

kmerCounts String of unsigned with kmer counts for every k-mer.
bgFrequencies String of background frequencies (double) representing the model.
bgModel MarkovModel to use.
sequence String (sequence) where k-mers are counted.
k k-mer length (unsigned).

Detailed Description

k-mers overlapping masked (aka 'N') letters are not counted in case of Dna5Strings. A Bernoulli or Markov Model can be choosen as a background model.

Examples

Calculate the alignment free sequence similarity o two masked DNA sequences.

using namespace seqan;
// Masked sequence, we do not want to count words overlapping 'N'
Dna5String sequenceDna5 =
    "TAGGTTTTCCGAAAAGGTAGCAACTTTACGTGATCAAACCTCTGACGGGGTTTTCCCCGTCGAAATTGGGTG"
    "TTTCTTGTCTTGTTCTCACTTGGGGCATCTCCGTCAAGCCAAGAAAGTGCTCCCTGGATTCTGTTGCTAACG"
    "AGTCTCCTCTGCATTCCTGCTTGACTGATTGGGCGGACGGGGTGTCCACCTGACGCTGAGTATCGCCGTCAC"
    "GGTGCCACATGTCTTATCTATTCAGGGATCAGAATTTATTCAGGAAATCAGGAGATGCTACACTTGGGTTAT"
    "CGAAGCTCCTTCCAAGGCGTAGCAAGGGCGACTGAGCGCGTAAGCTCTAGATCTCCTCGTGTTGCAACTACA"
    "CGCGCGGGTCACTCGAAACACATAGTATGAACTTAACGACTGCTCGTACTGAACAATGCTGAGGCAGAAGAT"
    "CGCAGACCAGGCATCCCACTGCTTGAAAAAACTATNNNNCTACCCGCCTTTTTATTATCTCATCAGATCAAG";

String<unsigned> kmerCounts;
unsigned k = 2;  // Count all 2-mers
countKmers(kmerCounts, sequenceDna5, k);

for(unsigned i = 0; i<16; ++i)       // Print the 2-mer counts
    std::cout<<kmerCounts[i]<<"\n";  // 34 times AA; 30 times AC; 28 times AG; ...


String<double> nucleotideFrequencies;  // Defines a Bernoulli model for DNA sequences.
// Count all 2-mers and save the nucleotide frequencies
countKmers(kmerCounts, nucleotideFrequencies, sequenceDna5, k);

for(unsigned i = 0; i<4; ++i)          // Print the nucleotide frequencies
    std::cout << nucleotideFrequencies[i] << "\n";
// => p(A) = 0.238; p(C) = 0.254; p(G) = 0.238; p(T) = 0.27;

MarkovModel<Dna, double>  backgroundModel(1);  // Markov model of order 1
// Count all 2-mers and return a Markov model
countKmers(kmerCounts, backgroundModel, sequenceDna5, k);
std::cout<<backgroundModel.transition;  // Print the transition matrix of the markov model

Data Races

Thread safety unknown!

See Also