Example Program
Mummy
Simple MUMmer clone.
MUMmer is a tool that searches for maximal unique matches (MUMs) between two given sequences.
MUMs can be used as a starting point for a multiple genome alignment algorithm.
This example shows how to implement a simple version of MUMmer in SeqAn that finds the maximal unique matches of n sequences (n≥2).
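To make the notion of a MUM concrete, the following is a minimal brute-force sketch for two sequences, written with the C++ standard library only. It is not the SeqAn implementation and does not use an index; the names `Mum`, `countOccurrences` and `findMumsNaive` are hypothetical and chosen for illustration. It assumes that a MUM is a common substring of at least `minLen` characters that occurs exactly once in each sequence and cannot be extended to the left or right. Positions are 0-based.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One match: start in the first sequence, start in the second, and length.
struct Mum {
    std::size_t posA;
    std::size_t posB;
    std::size_t length;
};

// Counts the (possibly overlapping) occurrences of pattern in text.
static std::size_t countOccurrences(const std::string& text, const std::string& pattern) {
    std::size_t count = 0;
    for (std::size_t p = text.find(pattern); p != std::string::npos;
         p = text.find(pattern, p + 1)) {
        ++count;
    }
    return count;
}

// Brute-force search for MUMs of two sequences (far slower than an index-based search).
std::vector<Mum> findMumsNaive(const std::string& a, const std::string& b, std::size_t minLen) {
    std::vector<Mum> result;
    for (std::size_t i = 0; i < a.size(); ++i) {
        for (std::size_t j = 0; j < b.size(); ++j) {
            if (a[i] != b[j]) continue;
            // Skip starts that could be extended to the left (not left-maximal).
            if (i > 0 && j > 0 && a[i - 1] == b[j - 1]) continue;
            // Extend to the right as far as both sequences agree (right-maximal).
            std::size_t len = 0;
            while (i + len < a.size() && j + len < b.size() && a[i + len] == b[j + len]) {
                ++len;
            }
            if (len < minLen) continue;
            // Keep the match only if it is unique in both sequences.
            const std::string match = a.substr(i, len);
            if (countOccurrences(a, match) == 1 && countOccurrences(b, match) == 1) {
                result.push_back(Mum{i, j, len});
            }
        }
    }
    return result;
}
```

For instance, calling `findMumsNaive("GGACGTCC", "TTACGTAA", 3)` reports a single match, ACGT, of length 4 at position 2 in both strings. With `"ACGTACGT"` and `"ACGT"` nothing is reported, because ACGT occurs twice in the first string.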
File "index_mummy.cpp"
Output
If you run the tool on 2 sequences, it outputs exactly the same matches as MUMmer (called with the -mum option); only the order
of the reported matches differs. To list the matches by increasing position in the first sequence, we
piped the output to sort.
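The same ordering can also be produced in code. Note that `sort` without options compares lines as text, so whether its order coincides with numeric order depends on how the positions are formatted; `sort -n` compares numerically. The sketch below orders match records numerically by their start positions, first sequence first. The struct `MatchRecord` and the function `sortByPositions` are hypothetical names and are not part of index_mummy.cpp.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical record for one match: one start position per sequence, plus the length.
struct MatchRecord {
    std::vector<std::size_t> positions;
    std::size_t length;
};

// Orders matches by increasing position in the first sequence;
// ties are broken by the positions in the following sequences.
void sortByPositions(std::vector<MatchRecord>& matches) {
    std::stable_sort(matches.begin(), matches.end(),
                     [](const MatchRecord& x, const MatchRecord& y) {
                         // std::vector compares lexicographically, element by element.
                         return x.positions < y.positions;
                     });
}
```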
As an example data set we used three Chlamydia strains (NC_002620.fna, NC_000117.fna, NC_007429.fna) and
saved the FASTA files to the demos directory.
weese@tanne:~/seqan/demos$ make index_mummy
weese@tanne:~/seqan/demos$ ./index_mummy -h
***************************************
*** Simple MUM finder ***
*** written by David Weese (c) 2007 ***
***************************************
Usage: mummy [OPTION]... <SEQUENCE FILE> ... <SEQUENCE FILE>
Options:
-e, --extern use external memory (for large datasets)
-l, --minlen set minimum MUM length
if not set, default value is 20
-h, --help print this help
weese@tanne:~/seqan/demos$ ./index_mummy NC*.fna |sort > mums.txt
3159928 bps sequence imported.
weese@tanne:~/seqan/demos$ head mums.txt
1565 323805 2159 48
1646 323886 2240 27
1722 323962 2316 37
1774 324014 2368 26
1941 324181 2535 23
2061 324301 2655 35
2102 324342 2696 29
2132 324372 2726 20
2183 324423 2777 24
weese@tanne:~/seqan/demos$
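Each line of mums.txt above holds four numbers for the three input files. A plausible reading, which is an assumption and not stated in the text, is one start position per sequence followed by the match length. Under that assumption, a line could be parsed as sketched below; `MumLine` and `parseMumLine` are hypothetical names.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical representation of one output line.
// Assumption: all numbers but the last are start positions, the last is the length.
struct MumLine {
    std::vector<std::size_t> positions;
    std::size_t length = 0;
};

// Parses a whitespace-separated line of non-negative integers.
// Returns false if the line contains other tokens or fewer than three numbers
// (two positions plus a length).
bool parseMumLine(const std::string& line, MumLine& out) {
    std::istringstream in(line);
    std::vector<std::size_t> numbers;
    std::size_t value = 0;
    while (in >> value) {
        numbers.push_back(value);
    }
    // The loop must have stopped at the end of the line, not at a bad token.
    if (!in.eof() || numbers.size() < 3) {
        return false;
    }
    out.length = numbers.back();
    numbers.pop_back();
    out.positions = numbers;
    return true;
}
```

Applied to the first line of the listing, `1565 323805 2159 48`, this yields the positions 1565, 323805 and 2159 and a length of 48.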
SeqAn - Sequence Analysis Library - www.seqan.de