Class MarkovModel
Gives a suitable representation of a Markov chain.

Defined in <seqan/statistics.h>
Signature template <typename TAlphabet[, typename TFloat[, typename TSpec]]> class MarkovModel;

Template Parameters

TAlphabet The type of the underlying alphabet.
TFloat The type for storing counts, default is double.
TSpec Tag for specialization.

Detailed Description


Build a MarkovModel from Background

#include <iostream>
#include <fstream>

#include <seqan/index.h>
#include <seqan/statistics.h>
#include <seqan/seq_io.h>

using namespace seqan;

int main()
{
    // Build path to background FASTA file.
    CharString bgPath = SEQAN_PATH_TO_ROOT();
    append(bgPath, "/demos/statistics/background.fa");

    // Read the background from a file into X.
    StringSet<DnaString> X;
    SeqFileIn seqFile;
    if (!open(seqFile, toCString(bgPath)))
    {
        std::cerr << "ERROR: Could not open " << bgPath << "\n";
        return 1;
    }
    StringSet<CharString> ids;  // will be ignored
    readRecords(ids, X, seqFile);

    // Create MarkovModel of order 3 from the background.
    MarkovModel<Dna> mm(3);
    buildMarkovModel(mm, X);

    // Build set of words that we want to compute the zscore of.
    StringSet<DnaString> W;
    appendValue(W, "CCCAAAGC");
    appendValue(W, "CCCAAAGTAAATT");

    // Compute and print zscore.
    std::cout << "zscore: " << zscore(W, X, mm, AhoCorasick()) << "\n";

    // TODO: This path has to be set explicitly when calling the demo.
    // FILE * fd = fopen("projects/library/demos/zscore_human_mm.3", "r");
    // read(fd, mm);
    // fclose(fd);

    // std::cout << zscore(W, X, mm, WuManber()) << std::endl;

    return 0;
}

The example above builds a MarkovModel over the Dna alphabet from a set of background sequences and then computes the z-score of a set of words. It prints:

zscore: 11.8323

Load a MarkovModel from File

We can also load the MarkovModel from a file (previously saved using write). Since we only have the model here and not the background sequences, we compute the variance of a word using the function calculateVariance from the alignment_free module.

#include <iostream>
#include <fstream>

#include <seqan/index.h>
#include <seqan/alignment_free.h>
#include <seqan/statistics.h>
#include <seqan/seq_io.h>

using namespace seqan;

int main()
{
    // Build path to serialized MarkovModel.
    CharString mmPath = SEQAN_PATH_TO_ROOT();
    append(mmPath, "/demos/statistics/zscore_example_mm.3");

    // Open the file.
    FILE * mmFile = fopen(toCString(mmPath), "rb");
    if (!mmFile)
    {
        std::cerr << "ERROR: Could not open " << mmPath << "\n";
        return 1;
    }

    // Create MarkovModel of order 3 and load it from the file.
    MarkovModel<Dna> mm(3);
    read(mmFile, mm);
    fclose(mmFile);  // close file again

    // The word whose variance we want to compute.
    DnaString word = "CCCAAAGC";

    // Compute variance.
    double variance = 0;
    int n = 10000;  // assumed text length
    calculateVariance(variance, word, mm, n);
    std::cout << "variance: " << variance << "\n";

    return 0;
}

variance: 0.267919

Member Functions Detail

void MarkovModel::build(stringSet);

Compute the transition matrix from a training set.

Parameters

stringSet The StringSet to build the model for.

The stationary character distribution and the auxiliary information that give rise to an instance of a Markov model are also computed.

TFloat MarkovModel::emittedProbability(s);
TFloat MarkovModel::emittedProbability(ss);

Computes the probability that a string or a set of strings is emitted by the MarkovModel.

Parameters

s The String to compute the emission probability for.
ss The StringSet to compute the emission probability for.

Returns

TFloat The emission probability; TFloat is the TFloat template parameter of the MarkovModel.

MarkovModel::MarkovModel(order);

Constructor; creates a MarkovModel of the given order.

Parameters

order The order of the model (unsigned).

void MarkovModel::read(file);

Load an instance of MarkovModel from a file.

Parameters

file The file to read the model from (type FILE *).

void MarkovModel(transition[, stationaryDistribution]);

Set transition matrix.

Parameters

transition The transition matrix.
stationaryDistribution The vector of character distributions (optional).

Given a transition matrix, sets it as the transition matrix of the MarkovModel and computes the auxiliary information, as well as the vector of character distributions if it is not provided.

void MarkovModel::write(file);

Stores an instance of a MarkovModel in a file.

Parameters

file The file to write the model to (type FILE *).

Member Variables Detail

unsigned MarkovModel::order

The order of the MarkovModel.

TVector MarkovModel::stationaryDistribution

The vector of character distributions (String of TFloat).

TMatrix MarkovModel::transition

The transition matrix.