Function
countKmers
Counts kmers in a sequence. Optionally, a background model is returned.
countKmers(kmerCounts, sequence, k)
countKmers(kmerCounts, backgroundFrequencies, sequence, k)
countKmers(kmerCounts, bgModel, sequence, k)
Include Headers
seqan/alignment_free.h
Parameters
kmerCounts
String<unsigned> with kmer counts for every k-mer
Types: String
backgroundFrequencies
String of background frequencies representing the model
Types: double
bgModel
Markov model
sequence
String (sequence) where k-mers are counted
Types: String
Remarks
k-mers overlapping masked letters are not counted in case of Dna5Strings. A Bernoulli or Markov Model can be choosen as a background model.
Return Values
String<unsigned> with kmer counts for every k-mer
Examples
Calculate the alignment free sequence similarity o two masked DNA sequences.
using namespace seqan;
// Masked sequence, we do not want to count words overlapping 'N'
Dna5String sequenceDna5 =
    "TAGGTTTTCCGAAAAGGTAGCAACTTTACGTGATCAAACCTCTGACGGGGTTTTCCCCGTCGAAATTGGGTG"
    "TTTCTTGTCTTGTTCTCACTTGGGGCATCTCCGTCAAGCCAAGAAAGTGCTCCCTGGATTCTGTTGCTAACG"
    "AGTCTCCTCTGCATTCCTGCTTGACTGATTGGGCGGACGGGGTGTCCACCTGACGCTGAGTATCGCCGTCAC"
    "GGTGCCACATGTCTTATCTATTCAGGGATCAGAATTTATTCAGGAAATCAGGAGATGCTACACTTGGGTTAT"
    "CGAAGCTCCTTCCAAGGCGTAGCAAGGGCGACTGAGCGCGTAAGCTCTAGATCTCCTCGTGTTGCAACTACA"
    "CGCGCGGGTCACTCGAAACACATAGTATGAACTTAACGACTGCTCGTACTGAACAATGCTGAGGCAGAAGAT"
    "CGCAGACCAGGCATCCCACTGCTTGAAAAAACTATNNNNCTACCCGCCTTTTTATTATCTCATCAGATCAAG";
 
String<unsigned> kmerCounts;
unsigned k = 2;  // Count all 2-mers
countKmers(kmerCounts, sequenceDna5, k);
 
for(unsigned i = 0; i<16; ++i)       // Print the 2-mer counts
    std::cout<<kmerCounts[i]<<"\n";  // 34 times AA; 30 times AC; 28 times AG; ...
 
 
String<double> nucleotideFrequencies;  // Defines a Bernoulli model for DNA sequences.
// Count all 2-mers and save the nucleotide frequencies
countKmers(kmerCounts, nucleotideFrequencies, sequenceDna5, k);
 
for(unsigned i = 0; i<4; ++i)          // Print the nucleotide frequencies
    std::cout << nucleotideFrequencies[i] << "\n";
// => p(A) = 0.238; p(C) = 0.254; p(G) = 0.238; p(T) = 0.27;
 
MarkovModel<Dna, double>  backgroundModel(1);  // Markov model of order 1
// Count all 2-mers and return a Markov model
countKmers(kmerCounts, backgroundModel, sequenceDna5, k);
std::cout<<backgroundModel.transition;  // Print the transition matrix of the markov model
SeqAn - Sequence Analysis Library - www.seqan.de
 

Page built @2013/07/11 09:12:17