Function
calculateVariance
Calculates the variance for the number of word occurrences of a word in a sequence of length n given a background model.
Include Headers
seqan/alignment_free.h
Parameters
Variance of the number of occurrences of the word in a sequence of length n given the model Types: double | |
Usually a DNA sequence Types: String | |
String of background frequencies representing the model Types: double | |
Markov model Types: MarkovModel | |
Length of the sequence where the occurrences of word are counted Types: integer |
Remarks
Calculates the variance for the number of word occurrences of a word in a sequence of length n given a background model (Markov model or Bernoulli model).
The formula is obtained from Robin, S., Rodolphe, F., and Schbath, S. (2005). DNA, Words and Models. Cambridge University Press.
See Jonathan Goeke et al (to appear) for details on the implementation.
Return Values
TValue variance; Variance of the number of occurrences of the word in a sequence of length n given the model
Examples
Calculate the variance for the number of occurrences of CAAGTC in a sequence of length 10000bp with p(A)=p(T)=0.3 and p(C)=p(G)=0.2.
double var = 0.0;
int n = 10000;
DnaString word = "CAAGTC";
String<double> model;
resize(model, 4);
model[0] = 0.3; // p(A)
model[1] = 0.2; // p(C)
model[2] = 0.2; // p(G)
model[3] = 0.3; // p(T)
calculateVariance(var, word, model, n); // var = 2.16
Estimate a Markov model on a set of sequences and calculate the variance for the number of occurrences of the word CAAGTC in a sequence of length 10000bp.
double var = 0.0;
int n = 10000;
DnaString word = "CAAGTC";
StringSet<DnaString> sequences;
appendValue(sequences, "CAGAAAAAAACACTGATTAACAGGAATAAGCAGTTTACTTATTTTGGGCCTGGGACCCGTGTCTCTAATTTAATTAGGTGATCCCTGCGAAGTTTCTCCA");
MarkovModel<Dna, double> model(0); // Bernoulli model
model.build(sequences);
calculateVariance(var, word, model, n); // var = 2.16
MarkovModel<Dna, double> model1(1); // First order Markov model
model1.build(sequences);
calculateVariance(var, word, model1, n); // var = 1.69716
See Also
SeqAn - Sequence Analysis Library - www.seqan.de