Graphs

Graphs in SeqAn.

Graphs can be used to model a great variety of problems and hence,
they are in widespread use as a problem-solving tool. Directed and undirected graphs, trees, automatons, and
alignment graphs are different types of graphs with distinct features. In SeqAn, all graph types
are specializations of Graph. Graphs can be traversed using iterators. Besides the fundamental
graph data structures, SeqAn provides the most well-known graph algorithms, e.g. Dijkstra's shortest
path algorithm. The fundamental graph algorithms are supplemented with some specialized alignment algorithms.
All graphs can be exported in dot format for visualization.

A directed graph. |

A Directed graph has an optional cargo template argument.
A cargo is some kind of object attached to the edges of the graph.
A typical example are distances in flight network graphs.
Cargos and edges are handled as one unit: If you add an edge you have to provide a cargo,
if you remove an edge you also remove the cargo. If you do not want to use cargos, you
can leave out the TCargo parameter or you can use void , which is the default.
All edges are directed that is, they have a distinct source
and target vertex.

An undirected graph with multiple components. |

An Undirected graph has, of course, undirected edges. In all other respects,
it can be used in the same way as a directed graph.

In this example, the edges have no cargo.

Alignment graph, tree, and trie. |

SeqAn is a library for sequence analysis and hence, it makes use of graphs as helper classes
for string matching, alignment algorithms or phylogenetic trees. Because of that, SeqAn provides
,besides directed and undirected graphs, specialized graphs like a Tree, an Automaton,
a Trie or an Alignment Graph.

All graphs support a set of common functions, including numEdges, numVertices,
addVertex, removeVertex, addEdge etc. The functions are listed and explained in the Graph
section. Besides this core set of functions, some graphs
have specialized methods, e.g., trees support childVertex, addChild and the like. See the documentation
of these graphs for details. The following code snippet illustrates the use of some of these functions to create a directed
graph.

typedef VertexDescriptor<TGraph>::Type TVertexDescriptor;

typedef EdgeDescriptor<TGraph>::Type TEdgeDescriptor;

TGraph g;

TVertexDescriptor v0 = addVertex(g);

TVertexDescriptor v1 = addVertex(g);

TEdgeDescriptor e1 =addEdge(g,v0,v1);

TVertexDescriptor v2 = addVertex(g);

TVertexDescriptor v3 = addVertex(g);

TVertexDescriptor v4 = addVertex(g);

addEdge(g,v3,v4);

addEdge(g,v3,v2);

addEdge(g,v2,v2);

addEdge(g,v3,v0);

addEdge(g,v3,v1);

addEdge(g,v4,v3);

::std::cout << g << ::std::endl;

removeVertex(g, v2);

::std::cout << g << ::std::endl;

As you can see in this example, all vertices are represented by vertex descriptors and all edges are represented
by edge descriptors. Both types can be accessed by the corresponding metafunction, namely VertexDescriptor
and EdgeDescriptor. Vertex and edge descriptors are the value types of all iterators working
on graphs.

Iterators are used to traverse a graph. The value type of vertex and edge iterators
are always vertex or edge descriptors, respectively. These descriptors are a kind of handle to deal with vertices and edges,
e.g. an edge descriptor can be used to access the cargo of an edge.
Vertex Iterator, Adjacency Iterator,
Bfs Iterator and Dfs Preorder Iterator are the basic vertex iterators. The following
code snippet shows how to use the adjacency iterator on vertex 3 to traverse all
adjacent neighbors of that vertex.

TAdjacencyIterator it(g,3);

for(;!atEnd(it);goNext(it)) {

std::cout << *it << std::endl;

}

Notice how the metafunction Iterator is used to get the correct iterator
type for TGraph . We simply output all adjacent vertices. Besides vertex iterators,
SeqAn provides a simple Edge Iterator
and an Out-Edge Iterator.

TOutEdgeIterator it(g, v0);

for(;!atEnd(it);goNext(it)) {

std::cout << cargo(*it) << std::endl;

}

In this example, we traverse all out-edges of vertex 0 and print their cargos.

Property maps are a fundamental abstraction mechanism to attach auxiliary information to the vertices and
edges of a graph. A typical example are the city names and flight distances of a flight network graph.
In most scenarios, users should use an external property map to attach this information
to the graph but other mechanisms do exist (e.g., the possibility to use a Cargo).
Be aware that the word external is a hint that the information is stored independent of the graph
and functions like removeVertex do not affect the property map. Property maps are indexed
via the already well-known vertex and edge descriptors. This is quite handy since we can use, for instance, a
vertex iterator to traverse the graph and access the properties on the fly. Most graph algorithms
make heavily use of this concept and therefore it is illustrated below.
First, let us create a simple graph.

typedef VertexDescriptor<TGraph>::Type TVertexDescriptor;

typedef EdgeDescriptor<TGraph>::Type TEdgeDescriptor;

TGraph g;

TVertexDescriptor v0 = addVertex(g);

TVertexDescriptor v1 = addVertex(g);

TEdgeDescriptor e1 =addEdge(g,v0,v1);

Second, we have to create and resize an edge property map so that it can hold information on all edges.

resizeEdgeMap(g , distanceMap);

As you can see, property maps are simply strings of some type. In this case, we use unsigned integer to
store the distances for our graph g . For each edge we can store a distance now.

Note how the edge descriptor is used to access the property map. An algorithm can now traverse
all edges and access the distances.

TEdgeIterator it(g);

for(;!atEnd(it);goNext(it)) {

std::cout << getProperty(distanceMap, *it) << std::endl;

}

Graph algorithms. |

The above figure gives an overview of the graph algorithms currently available in SeqAn. The biological algorithms
use heavily the alignment graph, all others use the appropriate standard graph. All algorithms require some kind
of additional input, e.g., the Dijkstra algorithm requires a distance property map, alignment algorithms sequences and a score
type and the network flow algorithm capacities on the edges. The basic usage can be found in the documentation but
a simple example is given here. The following code snippet aligns two sequences.

typedef StringSet<TString, Dependent<> > TStringSet;

typedef Graph<Alignment<TStringSet, void> > TGraph;

TStringSet str;

TString str0("annual");

appendValue(str, str0);

TString str1("annealing");

appendValue(str, str1);

TGraph g(str);

Score<int> score_type = Score<int>(0,-1,-1,0);

int score = globalAlignment(g, score_type, NeedlemanWunsch() );

::std::cout << g << ::std::endl;

We first assign the sequences annual and annealing to a string set.
Based on this string set an alignment graph is created. A score object is
used to configure the Needleman Wunsch algorithm and the globalAlignment call
computes the alignment.

SeqAn - Sequence Analysis Library - www.seqan.de