fn() countKmersCounts kmers in a sequence. Optionally, a background model is returned.
Counts kmers in a sequence. Optionally, a background model is returned.
Defined in | <seqan/alignment_free.h> |
---|---|
Signature |
void countKmers(kmerCounts, sequence, k);
void countKmers(kmerCounts, bgFrequencies, sequence, k);
void countKmers(kmerCounts, bgModel, sequence, k);
|
Parameters
kmerCounts
|
String of unsigned with kmer counts for every k-mer. |
---|---|
bgFrequencies
|
String of background frequencies (double) representing the model. |
bgModel
|
MarkovModel to use. |
sequence
|
String (sequence) where k-mers are counted. |
k
|
k-mer length (unsigned). |
Detailed Description
k-mers overlapping masked (aka 'N') letters are not counted in case of Dna5Strings. A Bernoulli or Markov Model can be choosen as a background model.
Examples
Calculate the alignment free sequence similarity o two masked DNA sequences.
using namespace seqan; // Masked sequence, we do not want to count words overlapping 'N' Dna5String sequenceDna5 = "TAGGTTTTCCGAAAAGGTAGCAACTTTACGTGATCAAACCTCTGACGGGGTTTTCCCCGTCGAAATTGGGTG" "TTTCTTGTCTTGTTCTCACTTGGGGCATCTCCGTCAAGCCAAGAAAGTGCTCCCTGGATTCTGTTGCTAACG" "AGTCTCCTCTGCATTCCTGCTTGACTGATTGGGCGGACGGGGTGTCCACCTGACGCTGAGTATCGCCGTCAC" "GGTGCCACATGTCTTATCTATTCAGGGATCAGAATTTATTCAGGAAATCAGGAGATGCTACACTTGGGTTAT" "CGAAGCTCCTTCCAAGGCGTAGCAAGGGCGACTGAGCGCGTAAGCTCTAGATCTCCTCGTGTTGCAACTACA" "CGCGCGGGTCACTCGAAACACATAGTATGAACTTAACGACTGCTCGTACTGAACAATGCTGAGGCAGAAGAT" "CGCAGACCAGGCATCCCACTGCTTGAAAAAACTATNNNNCTACCCGCCTTTTTATTATCTCATCAGATCAAG"; String<unsigned> kmerCounts; unsigned k = 2; // Count all 2-mers countKmers(kmerCounts, sequenceDna5, k); for(unsigned i = 0; i<16; ++i) // Print the 2-mer counts std::cout<<kmerCounts[i]<<"\n"; // 34 times AA; 30 times AC; 28 times AG; ... String<double> nucleotideFrequencies; // Defines a Bernoulli model for DNA sequences. // Count all 2-mers and save the nucleotide frequencies countKmers(kmerCounts, nucleotideFrequencies, sequenceDna5, k); for(unsigned i = 0; i<4; ++i) // Print the nucleotide frequencies std::cout << nucleotideFrequencies[i] << "\n"; // => p(A) = 0.238; p(C) = 0.254; p(G) = 0.238; p(T) = 0.27; MarkovModel<Dna, double> backgroundModel(1); // Markov model of order 1 // Count all 2-mers and return a Markov model countKmers(kmerCounts, backgroundModel, sequenceDna5, k); std::cout<<backgroundModel.transition; // Print the transition matrix of the markov model