Function
countKmers
Counts kmers in a sequence. Optionally, a background model is returned.
Include Headers
seqan/alignment_free.h
Parameters
String<unsigned> with kmer counts for every k-mer Types: String | |
String of background frequencies representing the model Types: double | |
Markov model Types: MarkovModel | |
String (sequence) where k-mers are counted Types: String |
Remarks
k-mers overlapping masked letters are not counted in case of Dna5Strings. A Bernoulli or Markov Model can be choosen as a background model.
Return Values
String<unsigned> with kmer counts for every k-mer
Examples
Calculate the alignment free sequence similarity o two masked DNA sequences.
// Masked sequence, we do not want to count words overlapping 'N'
Dna5String sequenceDna5 =
"TAGGTTTTCCGAAAAGGTAGCAACTTTACGTGATCAAACCTCTGACGGGGTTTTCCCCGTCGAAATTGGGTG"
"TTTCTTGTCTTGTTCTCACTTGGGGCATCTCCGTCAAGCCAAGAAAGTGCTCCCTGGATTCTGTTGCTAACG"
"AGTCTCCTCTGCATTCCTGCTTGACTGATTGGGCGGACGGGGTGTCCACCTGACGCTGAGTATCGCCGTCAC"
"GGTGCCACATGTCTTATCTATTCAGGGATCAGAATTTATTCAGGAAATCAGGAGATGCTACACTTGGGTTAT"
"CGAAGCTCCTTCCAAGGCGTAGCAAGGGCGACTGAGCGCGTAAGCTCTAGATCTCCTCGTGTTGCAACTACA"
"CGCGCGGGTCACTCGAAACACATAGTATGAACTTAACGACTGCTCGTACTGAACAATGCTGAGGCAGAAGAT"
"CGCAGACCAGGCATCCCACTGCTTGAAAAAACTATNNNNCTACCCGCCTTTTTATTATCTCATCAGATCAAG";
String<unsigned> kmerCounts;
unsigned k = 2; // Count all 2-mers
countKmers(kmerCounts, sequenceDna5, k);
for(unsigned i = 0; i<16; ++i) // Print the 2-mer counts
std::cout<<kmerCounts[i]<<"\n"; // 34 times AA; 30 times AC; 28 times AG; ...
String<double> nucleotideFrequencies; // Defines a Bernoulli model for DNA sequences.
// Count all 2-mers and save the nucleotide frequencies
countKmers(kmerCounts, nucleotideFrequencies, sequenceDna5, k);
for(unsigned i = 0; i<4; ++i) // Print the nucleotide frequencies
std::cout << nucleotideFrequencies[i] << "\n";
// => p(A) = 0.238; p(C) = 0.254; p(G) = 0.238; p(T) = 0.27;
MarkovModel<Dna, double> backgroundModel(1); // Markov model of order 1
// Count all 2-mers and return a Markov model
countKmers(kmerCounts, backgroundModel, sequenceDna5, k);
std::cout<<backgroundModel.transition; // Print the transition matrix of the markov model
See Also
SeqAn - Sequence Analysis Library - www.seqan.de