fn() calculateVarianceCalculates the variance for the number of word occurrences of a word in a sequence of length n given a
background model.
Calculates the variance for the number of word occurrences of a word in a sequence of length n given a
background model.
Defined in | <seqan/alignment_free.h> |
---|---|
Signature |
void calculateVariance(variance, word, bgFrequencies, n);
void calculateVariance(variance, word, bgModel, n);
|
Parameters
variance
|
Variance of the number of occurrences of the word in a sequence of length n given the model; double. |
---|---|
word
|
String, usually of Dna to compute variance for. |
bgFrequencies
|
String of bg frequencies representing the model. |
bgModel
|
MarkovModel to use. |
n
|
Length of the sequence where the occurrences of word are counted, int. |
Detailed Description
Calculates the variance for the number of word occurrences of a word in a sequence of length n given a background model (Markov model or Bernoulli model). The formula is obtained from (Robin et al., 2005).
References
Robin, S., Rodolphe, F., and Schbath, S. (2005). DNA, Words and Models. Cambridge University Press. See Jonathan Goeke et al (to appear) for details on the implementation.
Examples
Calculate the variance for the number of occurrences of CAAGTC in a sequence of length 10000bp with p(A) = p(T) = 0.3 and p(C) = p(G) = 0.2.
using namespace seqan; double var = 0.0; int n = 10000; DnaString word = "CAAGTC"; String<double> model; resize(model, 4); model[0] = 0.3; // p(A) model[1] = 0.2; // p(C) model[2] = 0.2; // p(G) model[3] = 0.3; // p(T) calculateVariance(var, word, model, n); // var = 2.16
Estimate a Markov model on a set of sequences and calculate the variance for the number of occurrences of the word CAAGTC in a sequence of length 10000bp.
using namespace seqan; double var = 0.0; int n = 10000; DnaString word = "CAAGTC"; StringSet<DnaString> sequences; appendValue(sequences, "CAGAAAAAAACACTGATTAACAGGAATAAGCAGTTTACTTATTTTGGGCCTGGGACCCGTGTCTCTAATTTAATTAGGTGATCCCTGCGAAGTTTCTCCA"); MarkovModel<Dna, double> model(0); // Bernoulli model model.build(sequences); calculateVariance(var, word, model, n); // var = 2.16 MarkovModel<Dna, double> model1(1); // First order Markov model model1.build(sequences); calculateVariance(var, word, model1, n); // var = 1.69716
Data Races
Thread safety unknown!