Function
calculateCovariance
Calculates the covariance for the number of word occurrences for two words in a sequence of length n, given a background model.
calculateCovariance(covariance, word1, word2, backgroundFrequencies, n)
calculateCovariance(covariance, word1, word2, bgModel, n)
Include Headers
seqan/alignment_free.h
Parameters
covariance
Output parameter that receives the covariance of the number of occurrences of word1 and word2 in a sequence of length n, given the model
Types: double
word1
First word, usually a DNA sequence
Types: String
word2
Second word, usually a DNA sequence
Types: String
backgroundFrequencies
String of background letter frequencies representing the Bernoulli model (for DNA, in the order p(A), p(C), p(G), p(T))
Types: double
bgModel
Markov model used as the background model
n
Length of the sequence in which the occurrences of the two words are counted
Types: integer
Remarks
Calculates the covariance of the number of occurrences of two words in a sequence of length n, given a background model (Markov model or Bernoulli model). The covariance depends on how the two words can overlap. For example, the words ATAT and TATA have a high covariance because an occurrence of one is likely to overlap an occurrence of the other. The formula is based on Robin, S., Rodolphe, F., and Schbath, S. (2005). DNA, Words and Models. Cambridge University Press. See Jonathan Goeke et al. (to appear) for details on the implementation.
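The overlap property mentioned above can be made concrete with a small standalone sketch. It is not part of SeqAn and is not how calculateCovariance is implemented; the helper name overlapShifts is hypothetical and uses only the C++ standard library. It lists the shifts d at which an occurrence of the second word can start d positions after the start of an occurrence of the first word.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a SeqAn function): returns every shift d with
// 1 <= d < a.size() such that the suffix of a starting at position d agrees
// with the prefix of b over their common length. At such a shift, an
// occurrence of b can overlap an occurrence of a.
std::vector<std::size_t> overlapShifts(const std::string& a, const std::string& b)
{
    std::vector<std::size_t> shifts;
    for (std::size_t d = 1; d < a.size(); ++d)
    {
        // Number of characters the two words share at this shift.
        std::size_t len = std::min(a.size() - d, b.size());
        if (a.compare(d, len, b, 0, len) == 0)
            shifts.push_back(d);
    }
    return shifts;
}
```

For ATAT followed by TATA the sketch reports the shifts 1 and 3, whereas a pair such as ATAT and CCGG has no compatible shift at all. Pairs with many compatible shifts are the ones whose counts are most strongly coupled.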
Return Values
TValue covariance; Covariance of the number of occurrences of word1 and word2 in a sequence of length n, given the model
Examples
Calculate the covariance of the number of occurrences of ATATAT and TATATA in a sequence of length 10000 bp under a Bernoulli model with p(A)=p(T)=0.3 and p(C)=p(G)=0.2.
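#include <seqan/alignment_free.h>

using namespace seqan;

int main()
{
    double covar = 0.0;  // receives the result
    int n = 10000;       // sequence length
    DnaString word1 = "ATATAT";
    DnaString word2 = "TATATA";
    // Bernoulli background model: one frequency per Dna letter
    String<double> model;
    resize(model, 4);
    model[0] = 0.3;  // p(A)
    model[1] = 0.2;  // p(C)
    model[2] = 0.2;  // p(G)
    model[3] = 0.3;  // p(T)
    calculateCovariance(covar, word1, word2, model, n);  // covar = 4.74
    return 0;
}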
Estimate a Markov model from a set of sequences and calculate the covariance of the number of occurrences of ATATAT and TATATA in a sequence of length 10000 bp.
#include <seqan/alignment_free.h>

using namespace seqan;

int main()
{
    double covar = 0.0;  // receives the result
    int n = 10000;       // sequence length
    DnaString word1 = "ATATAT";
    DnaString word2 = "TATATA";
    // Training sequences from which the background model is estimated
    StringSet<DnaString> sequences;
    appendValue(sequences, "CAGCACTGATTAACAGGAATAAGCAGTTTACTTCTGTCAGAATATTGGGCATATATA"
                           "CTGGGACCCGTGTAATACTCTAATTTAATTAGGTGATCCCTGCGAAGTCTCCA");
    MarkovModel<Dna, double> modelMM0(0);  // Bernoulli model
    modelMM0.build(sequences);
    calculateCovariance(covar, word1, word2, modelMM0, n);  // covar = 4.74
    MarkovModel<Dna, double> modelMM1(1);  // First order Markov model
    modelMM1.build(sequences);
    calculateCovariance(covar, word1, word2, modelMM1, n);  // covar = 13.1541
    return 0;
}
SeqAn - Sequence Analysis Library - www.seqan.de
 

Page built @2013/07/11 09:12:17