Graphs
Graphs in SeqAn.
Graphs can be used to model a great variety of problems and hence, they are in widespread use as a problem-solving tool. Directed and undirected graphs, trees, automatons, and alignment graphs are different types of graphs with distinct features. In SeqAn, all graph types are specializations of Graph. Graphs can be traversed using iterators. Besides the fundamental graph data structures, SeqAn provides the most well-known graph algorithms, e.g. Dijkstra's shortest path algorithm. The fundamental graph algorithms are supplemented with some specialized alignment algorithms. All graphs can be exported in dot format for visualization.
1 Directed Graph
A directed graph.
A Directed graph expects 2 optional arguments, a cargo and a specification. Cargos are attached to edges in the graph. A typical example are distances for flight networks represented as a graph. The basic principle is simple: If you add an edge you have to provide a cargo, if you remove an edge you also remove the cargo. If you do not want to use cargos, you can leave out the TCargo parameter or you can use void, which is the default. All edges are directed that is, they have a distinct source and target vertex.
Graph<Directed<TCargo> > directedGraph;
TCargo is any kind of data type, e.g. a double.
2 Undirected Graph
An undirected graph with multiple components.
An Undirected graph has, of course, undirected edges but in all other respects, it can be used in the same way as a directed graph.
Graph<Undirected<void> > undirectedGraph;
In this example, the edges have no cargo.
3 Other graphs
Alignment graph, tree, and trie.
SeqAn is not a new library for graph data structures and algorithms. It focusses on sequence analysis. Hence we introduced graphs as helper classes for string matching or alignment algorithms. Besides directed and undirected graphs, those algorithms make use of special graphs like a Tree, an Automaton, a Trie or an Alignment Graph. These graphs are therefore predefined in SeqAn.
4 Graph functions
All graphs support a couple of common functions, including numEdges, numVertices, addVertex, removeVertex, addEdge etc. The functions are listed and explained in the Graph section but the function names should be self-explanatory. Besides this core set of function, some graphs have special methods, e.g., trees support childVertex, addChild, and the like. See the documentation of these graphs for details. The following code snippet illustrates the use of these functions to create the directed graph in the first figure.
Graph<Directed<> > g;
TVertexDescriptor v0 = addVertex(g);
TVertexDescriptor v1 = addVertex(g);
TEdgeDescriptor e1 =addEdge(g,v0,v1);
TVertexDescriptor v2 = addVertex(g);
TVertexDescriptor v3 = addVertex(g);
TVertexDescriptor v4 = addVertex(g);
addEdge(g,v3,v4);
addEdge(g,v3,v0);
addEdge(g,v3,v1);
addEdge(g,v4,v3);
removeVertex(g, v2);
As you can see in this example, all vertices are represented by vertex descriptors and all edges are represented by edge descriptors. Both types can be accessed by the corresponding metafunction, namely VertexDescriptor and EdgeDescriptor. Vertex and edge descriptors are the value types of all iterators working on graphs.
5 Graph iterators
Iterators are used to traverse a graph. The value of vertex and edge iterators always returns a vertex or edge descriptor, respectively. These descriptors are a kind of handle to deal with vertices and edges, e.g. an edge descriptor can be used to access the cargo of an edge. Vertex Iterator, Adjacency Iterator, Bfs Iterator and Dfs Preorder Iterator are the basic vertex iterators. Exemplary, the following code snippet shows how the adjacency iterator is initialized on vertex 3 and then traverses all the adjacent neighbors.
typedef typename Iterator<TGraph, AdjacencyIterator>::Type TAdjacencyIterator;
TAdjacencyIterator it(g,3);
for(;!atEnd(it);goNext(it)) {
    std::cout << *it << std::endl;
}
In this example, you can see how the metafunction Iterator is used to get the correct iterator type for TGraph. We simply output all adjacent vertices. In respect to edge iterators, SeqAn provides a simple Edge Iterator and an Out-Edge Iterator. The use of the latter is showed in the next figure.
typedef typename Iterator<TGraph, OutEdgeIterator>::Type TOutEdgeIterator;
TOutEdgeIterator it(g, v0);
for(;!atEnd(it);goNext(it)) {
    std::cout << cargo(*it) << std::endl;
}
In this example, we traverse all out-edges of vertex 0 and print their cargo.
6 Graph property maps
Property maps are a fundamental abstraction mechanism to attach auxiliary information to the vertices and edges of a graph. A typical example are the city names and flight distances of a graph representing a flight network. In most scenarios, users should use an external property map to attach this information to the graph but other mechanisms do exist (e.g., the possibility to use a Cargo). Be aware that the word external is a hint that the information is stored independent of the graph and functions like removeVertex do not affect the property map. Property maps are indexed via the already well-known vertex and edge descriptors. This is quite handy since we can use, for instance, a vertex iterator to traverse the graph and access the properties on the fly. Most graph algorithms make heavily use of this concept and therefore it is illustrated below. First, let us create a simple graph.
Graph<Directed<> > g;
TVertexDescriptor v0 = addVertex(g);
TVertexDescriptor v1 = addVertex(g);
TEdgeDescriptor e1 =addEdge(g,v0,v1);
Second, we have to create and resize an edge property map so that it can hold information on all edges.
String<unsigned int> distanceMap;
resizeEdgeMap(g , distanceMap);
As you can see, property maps are simply strings of some type. In this case, we use unsigned integer to store the distances for our graph g. For each edge we can store a distance now.
assignProperty(distanceMap, e1, 10);
Note how the edge descriptor is used to access the property map. An algorithm can now traverse all edges and access the distances.
typedef typename Iterator<Graph<Directed<> >, EdgeIterator>::Type TEdgeIterator;
TEdgeIterator it(g);
for(;!atEnd(it);goNext(it)) {
    std::cout << getProperty(distanceMap, *it) << std::endl;
}
7 Graph algorithms
Graph algorithms.
The above figure gives an overview of the graph algorithms currently available in SeqAn. The biological algorithms use heavily the alignment graph, all others use the appropriate standard graph. All algorithm require some kind of additional input, e.g., the Dijkstra algorithm requires a distance map, alignment algorithms sequences and a score type, or the network flow algorithm capacities on edges. The basic usage can be found in the documentation but a simple example is given here. The following code snippet aligns two sequences.
typedef String<char> TString;
typedef StringSet<TString, Dependent<> > TStringSet;
typedef Graph<Alignment<TStringSet, void> > TGraph;
    
TStringSet str;
TString str0("annual");    
assignValueById(str, str0);
TString str1("annealing"); 
assignValueById(str, str1);
TGraph g(str);
Score<int> score_type = Score<int>(0,-1,-1,0);
int score = globalAlignment(g, score_type, NeedlemanWunsch() );
We first assign the sequences annual and annealing to a string set. Based on this string set an alignment graph is created. A score object is used to configure the Needleman Wunsch algorithm and the globalAlignment call creates the alignment.
SeqAn - Sequence Analysis Library - www.seqan.de