Graphs

Graphs in SeqAn.

Contents

1 Graphs

1.1 Directed graph

4 Graph property maps

5 Graph algorithms

Graphs can be used to model a great variety of problems and hence, they are in widespread use as a problem-solving tool. Directed and undirected graphs, trees, automatons, and alignment graphs are different types of graphs with distinct features. In SeqAn, all graph types are specializations of Graph. Graphs can be traversed using iterators. Besides the fundamental graph data structures, SeqAn provides the most well-known graph algorithms, e.g. Dijkstra's shortest path algorithm. The fundamental graph algorithms are supplemented with some specialized alignment algorithms. All graphs can be exported in dot format for visualization.

1 Graphs

1.1 Directed graph

A directed graph.

A Directed graph has an optional cargo template argument. A cargo is some kind of object attached to the edges of the graph. A typical example are distances in flight network graphs. Cargos and edges are handled as one unit: If you add an edge you have to provide a cargo, if you remove an edge you also remove the cargo. If you do not want to use cargos, you can leave out the TCargo parameter or you can use void, which is the default. All edges are directed that is, they have a distinct source and target vertex.

Graph<Directed<TCargo> > directedGraph;

TCargo is any kind of data type, e.g. a double.

1.2 Undirected graph

An undirected graph with multiple components.

An Undirected graph has, of course, undirected edges. In all other respects, it can be used in the same way as a directed graph.

Graph<Undirected<void> > undirectedGraph;

In this example, the edges have no cargo.

1.3 Other graphs

Alignment graph, tree, and trie.

SeqAn is a library for sequence analysis and hence, it makes use of graphs as helper classes for string matching, alignment algorithms or phylogenetic trees. Because of that, SeqAn provides ,besides directed and undirected graphs, specialized graphs like a Tree, an Automaton, a Trie or an Alignment Graph.

2 Graph functions

All graphs support a set of common functions, including numEdges, numVertices, addVertex, removeVertex, addEdge etc. The functions are listed and explained in the Graph section. Besides this core set of functions, some graphs have specialized methods, e.g., trees support childVertex, addChild and the like. See the documentation of these graphs for details. The following code snippet illustrates the use of some of these functions to create a directed graph.

typedef Graph<Directed<> > TGraph;
typedef VertexDescriptor<TGraph>::Type TVertexDescriptor;
typedef EdgeDescriptor<TGraph>::Type TEdgeDescriptor;
TGraph g;
TVertexDescriptor v0 = addVertex(g);
TVertexDescriptor v1 = addVertex(g);
TEdgeDescriptor e1 =addEdge(g,v0,v1);
TVertexDescriptor v2 = addVertex(g);
TVertexDescriptor v3 = addVertex(g);
TVertexDescriptor v4 = addVertex(g);
addEdge(g,v3,v4);
addEdge(g,v3,v2);
addEdge(g,v2,v2);
addEdge(g,v3,v0);
addEdge(g,v3,v1);
addEdge(g,v4,v3);
::std::cout << g << ::std::endl;
removeVertex(g, v2);
::std::cout << g << ::std::endl;

As you can see in this example, all vertices are represented by vertex descriptors and all edges are represented by edge descriptors. Both types can be accessed by the corresponding metafunction, namely VertexDescriptor and EdgeDescriptor. Vertex and edge descriptors are the value types of all iterators working on graphs.

3 Graph iterators

Iterators are used to traverse a graph. The value type of vertex and edge iterators are always vertex or edge descriptors, respectively. These descriptors are a kind of handle to deal with vertices and edges, e.g. an edge descriptor can be used to access the cargo of an edge. Vertex Iterator, Adjacency Iterator, Bfs Iterator and Dfs Preorder Iterator are the basic vertex iterators. The following code snippet shows how to use the adjacency iterator on vertex 3 to traverse all adjacent neighbors of that vertex.

typedef Iterator<TGraph, AdjacencyIterator>::Type TAdjacencyIterator;
TAdjacencyIterator it(g,3);
for(;!atEnd(it);goNext(it)) {
std::cout << *it << std::endl;
}

Notice how the metafunction Iterator is used to get the correct iterator type for TGraph. We simply output all adjacent vertices. Besides vertex iterators, SeqAn provides a simple Edge Iterator and an Out-Edge Iterator.

typedef Iterator<TGraph, OutEdgeIterator>::Type TOutEdgeIterator;
TOutEdgeIterator it(g, v0);
for(;!atEnd(it);goNext(it)) {
std::cout << cargo(*it) << std::endl;
}

In this example, we traverse all out-edges of vertex 0 and print their cargos.

4 Graph property maps

Property maps are a fundamental abstraction mechanism to attach auxiliary information to the vertices and edges of a graph. A typical example are the city names and flight distances of a flight network graph. In most scenarios, users should use an external property map to attach this information to the graph but other mechanisms do exist (e.g., the possibility to use a Cargo). Be aware that the word external is a hint that the information is stored independent of the graph and functions like removeVertex do not affect the property map. Property maps are indexed via the already well-known vertex and edge descriptors. This is quite handy since we can use, for instance, a vertex iterator to traverse the graph and access the properties on the fly. Most graph algorithms make heavily use of this concept and therefore it is illustrated below. First, let us create a simple graph.

Second, we have to create and resize an edge property map so that it can hold information on all edges.

String<unsigned int> distanceMap;
resizeEdgeMap(g , distanceMap);

As you can see, property maps are simply strings of some type. In this case, we use unsigned integer to store the distances for our graph g. For each edge we can store a distance now.

assignProperty(distanceMap, e1, 10);

Note how the edge descriptor is used to access the property map. An algorithm can now traverse all edges and access the distances.

typedef Iterator<TGraph, EdgeIterator>::Type TEdgeIterator;
TEdgeIterator it(g);
for(;!atEnd(it);goNext(it)) {
std::cout << getProperty(distanceMap, *it) << std::endl;
}

5 Graph algorithms

Graph algorithms.

The above figure gives an overview of the graph algorithms currently available in SeqAn. The biological algorithms use heavily the alignment graph, all others use the appropriate standard graph. All algorithms require some kind of additional input, e.g., the Dijkstra algorithm requires a distance property map, alignment algorithms sequences and a score type and the network flow algorithm capacities on the edges. The basic usage can be found in the documentation but a simple example is given here. The following code snippet aligns two sequences.

typedef String<char> TString;
typedef StringSet<TString, Dependent<> > TStringSet;
typedef Graph<Alignment<TStringSet, void> > TGraph;

TStringSet str;
TString str0("annual");
appendValue(str, str0);
TString str1("annealing");
appendValue(str, str1);
TGraph g(str);
Score<int> score_type = Score<int>(0,-1,-1,0);
int score = globalAlignment(g, score_type, NeedlemanWunsch() );
::std::cout << g << ::std::endl;

We first assign the sequences annual and annealing to a string set. Based on this string set an alignment graph is created. A score object is used to configure the Needleman Wunsch algorithm and the globalAlignment call computes the alignment.

SeqAn - Sequence Analysis Library - www.seqan.de