Class
BlastTabularLL
Low-level support for Blast Tabular file formats
Defined in | <seqan/blast.h> |
---|---|
Signature | typedef Tag<BlastTabularLL_> BlastTabularLL; |
Interface Function Overview
- bool onMatch(stream, blastTabularLL)
  Returns whether the iterator is on the beginning of a match line.
- void readMatch(stream, blastTabularLL, args ...);
  Low-level BlastTabular file reading.
- void skipUntilMatch(stream, blastTabularLL);
  Skip arbitrary number of comment lines until the beginning of a match is reached.
- void writeMatch(stream, blastTabularLL, columns...)
  Low-level file-writing for blast tabular formats.
Detailed Description
There are three blast format related tags in SeqAn:
This is the third tag, it offers low-level support for reading and writing NCBI Blast compatible tabular files, without comment lines -- although files with comment lines can be read if the comment lines are skipped. These are the formats that are available in legacy Blast (blastall executable) with the parameters -m 8 and -m 9 (with comment lines) and in BLAST+ (blastx, blastn...) with the parameters -outfmt 6 and -outfmt 7 respectively.
For most situations BlastTabular is the better choice. Use this tag's interface only for quick parsing of matches in a file, e.g. for counting or filtering. This interface offers neither a FormattedFile abstraction nor convenience data structures, and it performs no transformations on the data.
The reference Blast implementation used for developing the SeqAn support is NCBI Blast+ 2.2.26 and NCBI Blast 2.2.26 for the legacy support.
Input example
The following example program extracts the list of matching query-subject pairs from a blast tabular file and prints it to std::cout:
#include <algorithm>
#include <fstream>
#include <iostream>
#include <string>
#include <utility>
#include <vector>
#include <seqan/basic.h>
#ifndef STDLIB_VS
#include <seqan/blast.h>
using namespace seqan2;
int main()
{
    std::string inPath = getAbsolutePath("/tests/blast/plus_comments_defaults.m9");
    std::ifstream fin(toCString(inPath), std::ios_base::in | std::ios_base::binary);
    auto fit = directionIterator(fin, Input());
    typedef std::pair<std::string, std::string> THsp;
    std::vector<THsp> hsps;
    while (!atEnd(fit))
    {
        // skip any comment lines
        if (!onMatch(fit, BlastTabularLL()))
        {
            skipUntilMatch(fit, BlastTabularLL());
            // the file may end with comment lines
            if (atEnd(fit))
                break;
        }
        // resize output list
        resize(hsps, length(hsps)+1);
        // read only the first two fields into our variables
        readMatch(fit, BlastTabularLL(), back(hsps).first, back(hsps).second);
    }
    // sort the pairs and drop duplicates
    std::sort(std::begin(hsps), std::end(hsps));
    auto last = std::unique(std::begin(hsps), std::end(hsps));
    hsps.erase(last, hsps.end());
    for (THsp const & hsp : hsps)
        std::cout << '(' << hsp.first << ", " << hsp.second << ")\n";
    return 0;
}
#else
int main()
{
    std::cerr << "Demo not run, because of a bug in Microsoft Visual Studio 2015.\n";
    return 0;
}
#endif
The output looks like this:
(SHAA004TF, sp|P03831|INBD_SHIDY)
(SHAA004TF, sp|P0A915|OMPW_ECOLI)
(SHAA004TF, sp|P0A916|OMPW_SHIFL)
(SHAA004TF, sp|P0CF25|INSB1_ECOLI)
(SHAA004TF, sp|P0CF26|INSB2_ECOLI)
(SHAA004TF, sp|P0CF27|INSB3_ECOLI)
(SHAA004TF, sp|P0CF28|INSB5_ECOLI)
(SHAA004TF, sp|P0CF29|INSB6_ECOLI)
(SHAA004TF, sp|P0CF30|INSB8_ECOLI)
(SHAA004TF, sp|P0CF31|INSB_ECOLX)
(SHAA004TF, sp|P17266|OMPW_VIBCH)
(SHAA004TF, sp|P19765|INSB_SHIFL)
(SHAA004TF, sp|P19766|INSB_SHISO)
(SHAA004TF, sp|P57998|INSB4_ECOLI)
(SHAA004TF, sp|P59843|INSB_HAEDU)
(SHAA004TF, sp|Q8Z7E2|OMPW_SALTI)
(SHAA004TF, sp|Q8ZP50|OMPW_SALTY)
(SHAA004TR, sp|Q0HGZ8|META_SHESM)
(SHAA004TR, sp|Q0HTA5|META_SHESR)
Interface Functions Detail
bool onMatch(stream, blastTabularLL)
Parameters
stream | An input iterator over a stream or any fwd-iterator over a string. |
---|---|
blastTabularLL | The BlastTabularLL tag. |
Returns
bool | true if the iterator is on the beginning of a match line, false otherwise. |
---|---|
Thrown Exceptions
IOError | On low-level I/O errors. |
---|---|
Data Races
void readMatch(stream, blastTabularLL, args ...);
Parameters
stream | An input iterator over a stream or any fwd-iterator over a string. |
---|---|
blastTabularLL | The BlastTabularLL tag. |
args | Arbitrary typed variables able to hold the fields. |
Remarks
Use this signature only if you do not or cannot use BlastMatches. You can pass any number of arguments; each must be able to hold the value of the column it receives. For example, if you pass a double and the value in that column cannot be converted to double, an exception is thrown. To be on the safe side, pass CharStrings and evaluate them yourself.
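The "safe side" route above means reading a column as text and converting it in a separate step. The following is a minimal sketch of such a conversion step in standard C++17 only; `parseDoubleField` is a hypothetical helper, not part of SeqAn, and it assumes the column has already been read into a std::string (or copied there from a CharString).

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of SeqAn: converts one column that was read
// as text into a double. Returns std::nullopt instead of throwing when the
// text is empty, is not a number, or has trailing non-numeric characters.
inline std::optional<double> parseDoubleField(std::string const & field)
{
    if (field.empty())
        return std::nullopt;

    try
    {
        std::size_t consumed = 0;
        double value = std::stod(field, &consumed);
        // reject input that is only partly numeric, such as "1.5x"
        if (consumed != field.size())
            return std::nullopt;
        return value;
    }
    catch (std::logic_error const &)
    {
        // std::stod throws std::invalid_argument or std::out_of_range,
        // both of which derive from std::logic_error
        return std::nullopt;
    }
}
```

With a helper like this, a malformed value in one line can be skipped or logged by the caller rather than aborting the whole parse with an exception.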
You may specify fewer columns than are available in the file. If you pass n arguments, only the first n columns are read and the remaining columns are discarded.
No transformations are made on the data, e.g. the positions are still one-indexed and flipped for reverse strand matches.
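Since the positions arrive untransformed, code that wants zero-based coordinates has to convert them itself. A minimal sketch of one common convention (zero-based, half-open intervals plus a strand flag) could look like this. `Interval` and `toHalfOpen` are hypothetical names, not SeqAn API, and the sketch assumes that both positions are one-indexed and inclusive and that begin > end marks a reverse strand match.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical result type: zero-based, half-open interval [begin, end).
struct Interval
{
    std::size_t begin;
    std::size_t end;
    bool reverse;   // true if the positions were flipped in the file
};

// Assumes start >= 1 and end >= 1 (one-indexed, inclusive positions as
// they appear in the tabular file). A single-position match (start == end)
// cannot be told apart by position alone and is treated as forward here.
inline Interval toHalfOpen(std::size_t start, std::size_t end)
{
    bool const reverse = start > end;
    if (reverse)
        std::swap(start, end);
    // shift only the lower bound: [start, end] becomes [start - 1, end)
    return Interval{start - 1, end, reverse};
}
```

For example, the positions 10 and 20 become the interval [9, 20) on the forward strand, and the flipped positions 20 and 10 become the same interval with `reverse` set.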
See BlastTabularLL for an example of low-level IO.
Thrown Exceptions
IOError | On low-level I/O errors. |
---|---|
ParseError | On high-level file format errors. |
Data Races
void skipUntilMatch(stream, blastTabularLL);
Parameters
stream
stream | An input iterator over a stream or any fwd-iterator over a string. |
---|---|
blastTabularLL | The BlastTabularLL tag. |
Remarks
This is also part of the low-level IO and not required if you use readRecord. Call this function whenever the iterator is not on a match line (onMatch returns false) but needs to be, e.g. before calling readMatch.
Since it is legal for files to end with comment lines, this function does not throw if end-of-file is reached. You need to check for end-of-file yourself after calling it.
Thrown Exceptions
IOError | On low-level I/O errors. |
---|---|
ParseError | On high-level file format errors. |
Data Races
void writeMatch(stream, blastTabularLL, columns...)
Parameters
stream | The file to write to (FILE, fstream, OutputStreamConcept ...). |
---|---|
blastTabularLL | The BlastTabularLL tag. |
columns | Any number of printable parameters. |
Remarks
This is a very lightweight alternative to writeRecord. It doesn't require BlastMatches, BlastRecords or the use of FormattedFile. It supports printing an arbitrary number of columns of arbitrary type.
Use this only if you do not require comment lines and you are prepared to do all transformations on the data yourself, i.e. this function does none of the match adjustments mentioned in writeRecord.
Thrown Exceptions
IOError | On low-level I/O errors. |
---|---|