fn() countKmersCounts kmers in a sequence. Optionally, a background model is returned.
Counts kmers in a sequence. Optionally, a background model is returned.
Defined in | <seqan/alignment_free.h> |
---|---|
Signature |
void countKmers(kmerCounts, sequence, k);
void countKmers(kmerCounts, bgFrequencies, sequence, k);
void countKmers(kmerCounts, bgModel, sequence, k);
|
Parameters
kmerCounts
|
String of unsigned with kmer counts for every k-mer. |
---|---|
bgFrequencies
|
String of background frequencies (double) representing the model. |
bgModel
|
MarkovModel to use. |
sequence
|
String (sequence) where k-mers are counted. |
k
|
k-mer length (unsigned). |
Detailed Description
k-mers overlapping masked (aka 'N') letters are not counted in case of Dna5Strings. A Bernoulli or Markov Model can be choosen as a background model.
Examples
Calculate the alignment free sequence similarity o two masked DNA sequences.
using namespace seqan;
// Masked sequence, we do not want to count words overlapping 'N'
Dna5String sequenceDna5 =
"TAGGTTTTCCGAAAAGGTAGCAACTTTACGTGATCAAACCTCTGACGGGGTTTTCCCCGTCGAAATTGGGTG"
"TTTCTTGTCTTGTTCTCACTTGGGGCATCTCCGTCAAGCCAAGAAAGTGCTCCCTGGATTCTGTTGCTAACG"
"AGTCTCCTCTGCATTCCTGCTTGACTGATTGGGCGGACGGGGTGTCCACCTGACGCTGAGTATCGCCGTCAC"
"GGTGCCACATGTCTTATCTATTCAGGGATCAGAATTTATTCAGGAAATCAGGAGATGCTACACTTGGGTTAT"
"CGAAGCTCCTTCCAAGGCGTAGCAAGGGCGACTGAGCGCGTAAGCTCTAGATCTCCTCGTGTTGCAACTACA"
"CGCGCGGGTCACTCGAAACACATAGTATGAACTTAACGACTGCTCGTACTGAACAATGCTGAGGCAGAAGAT"
"CGCAGACCAGGCATCCCACTGCTTGAAAAAACTATNNNNCTACCCGCCTTTTTATTATCTCATCAGATCAAG";
String<unsigned> kmerCounts;
unsigned k = 2; // Count all 2-mers
countKmers(kmerCounts, sequenceDna5, k);
for(unsigned i = 0; i<16; ++i) // Print the 2-mer counts
std::cout<<kmerCounts[i]<<"\n"; // 34 times AA; 30 times AC; 28 times AG; ...
String<double> nucleotideFrequencies; // Defines a Bernoulli model for DNA sequences.
// Count all 2-mers and save the nucleotide frequencies
countKmers(kmerCounts, nucleotideFrequencies, sequenceDna5, k);
for(unsigned i = 0; i<4; ++i) // Print the nucleotide frequencies
std::cout << nucleotideFrequencies[i] << "\n";
// => p(A) = 0.238; p(C) = 0.254; p(G) = 0.238; p(T) = 0.27;
MarkovModel<Dna, double> backgroundModel(1); // Markov model of order 1
// Count all 2-mers and return a Markov model
countKmers(kmerCounts, backgroundModel, sequenceDna5, k);
std::cout<<backgroundModel.transition; // Print the transition matrix of the markov model
Data Races
Thread safety unknown!