Class BlastTabularLL
Low-Level support for Blast Tabular file formats

Defined in <seqan/blast.h>
Signature typedef Tag<BlastTabularLL_> BlastTabularLL;

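Interface Function Overview

  • bool onMatch(stream, blastTabularLL); Returns whether the iterator is on the beginning of a match line.
  • void readMatch(stream, blastTabularLL, args ...); Low-level BlastTabular file reading.
  • void skipUntilMatch(stream, blastTabularLL); Skip arbitrary number of comment lines until the beginning of a match is reached.
  • void writeMatch(stream, blastTabularLL, columns...); Low-level file-writing for blast tabular formats.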

Detailed Description

There are three blast format related tags in SeqAn:

  • BlastReport with the FormattedFile output specialization BlastReportFileOut
  • BlastTabular with the FormattedFile output and input specializations BlastTabularFileOut and BlastTabularFileIn
  • BlastTabularLL which provides light-weight, but very basic tabular IO

    BlastTabularLL, the third tag, offers low-level support for reading and writing NCBI Blast compatible tabular files without comment lines; files with comment lines can still be read if the comment lines are skipped. These formats are produced by legacy Blast (blastall executable) with the parameters -m 8 and -m 9 (with comment lines), and by BLAST+ (blastx, blastn...) with the parameters -outfmt 6 and -outfmt 7, respectively.

    For most situations BlastTabular is more adequate. Use this tag's interface only for quick parsing of matches in a file, e.g. for counting or filtering. This interface offers neither a FormattedFile abstraction nor convenience data structures, and it performs no transformations on the data.

    The reference Blast implementation used for developing the SeqAn support is NCBI Blast+ 2.2.26 and NCBI Blast 2.2.26 for the legacy support.

    Input example

    The following example program extracts the list of matching query-subject pairs from a blast tabular file and prints it to std::cout:

    #include <algorithm>
    #include <fstream>
    #include <iostream>
    #include <string>
    #include <utility>
    #include <vector>
    #include <seqan/basic.h>
    #ifndef STDLIB_VS
    #include <seqan/blast.h>
    
    using namespace seqan;
    
    int main()
    {
        std::string inPath = std::string(SEQAN_PATH_TO_ROOT()) + "/tests/blast/plus_comments_defaults.m9";
    
        std::ifstream fin(toCString(inPath), std::ios_base::in | std::ios_base::binary);
        auto fit = directionIterator(fin, Input());
    
        typedef std::pair<std::string, std::string> THsp;
        std::vector<THsp> hsps;
    
        while (!atEnd(fit))
        {
            // skip any comment lines
            if (!onMatch(fit, BlastTabularLL()))
            {
                skipUntilMatch(fit, BlastTabularLL());
                if (atEnd(fit))
                    break;
            }
    
            // resize output list
            resize(hsps, length(hsps)+1);
    
            // read only the first two fields into our variables
            readMatch(fit, BlastTabularLL(), back(hsps).first, back(hsps).second);
        }
    
        // sort, then erase the duplicates that std::unique moves to the end
        std::sort(std::begin(hsps), std::end(hsps));
        hsps.erase(std::unique(std::begin(hsps), std::end(hsps)), std::end(hsps));
    
        for (THsp const & hsp : hsps)
            std::cout << '(' << hsp.first << ", " << hsp.second << ")\n";
    
        return 0;
    }
    #else
    int main()
    {
        std::cerr << "Demo not run, because of a bug in Microsoft Visual Studio 2015.\n";
        return 0;
    }
    #endif
    

    The output looks like this:

    (SHAA004TF, sp|P03831|INBD_SHIDY)
    (SHAA004TF, sp|P0A915|OMPW_ECOLI)
    (SHAA004TF, sp|P0A916|OMPW_SHIFL)
    (SHAA004TF, sp|P0CF25|INSB1_ECOLI)
    (SHAA004TF, sp|P0CF26|INSB2_ECOLI)
    (SHAA004TF, sp|P0CF27|INSB3_ECOLI)
    (SHAA004TF, sp|P0CF28|INSB5_ECOLI)
    (SHAA004TF, sp|P0CF29|INSB6_ECOLI)
    (SHAA004TF, sp|P0CF30|INSB8_ECOLI)
    (SHAA004TF, sp|P0CF31|INSB_ECOLX)
    (SHAA004TF, sp|P17266|OMPW_VIBCH)
    (SHAA004TF, sp|P19765|INSB_SHIFL)
    (SHAA004TF, sp|P19766|INSB_SHISO)
    (SHAA004TF, sp|P57998|INSB4_ECOLI)
    (SHAA004TF, sp|P59843|INSB_HAEDU)
    (SHAA004TF, sp|Q8Z7E2|OMPW_SALTI)
    (SHAA004TF, sp|Q8ZP50|OMPW_SALTY)
    (SHAA004TR, sp|Q0HGZ8|META_SHESM)
    (SHAA004TR, sp|Q0HTA5|META_SHESR)
    

    Interface Functions Detail

    bool onMatch(stream, blastTabularLL);

    Returns whether the iterator is on the beginning of a match line.

    Parameters

    stream An input iterator over a stream or any fwd-iterator over a string.
    blastTabularLL The BlastTabularLL tag.

    Returns

    bool true or false

    Thrown Exceptions

    IOError On low-level I/O errors.

    Data Races

    Thread safety unknown!

    void readMatch(stream, blastTabularLL, args ...);

    Low-level BlastTabular file reading.

    Parameters

    stream An input iterator over a stream or any fwd-iterator over a string.
    blastTabularLL The BlastTabularLL tag.
    args Arbitrary typed variables able to hold the fields.

    Remarks

    Use this signature only if you do not or cannot use BlastMatches. You can pass any number of arguments; each must be able to hold the value of the corresponding column. For example, if you pass a double and the value in that column cannot be converted to double, an exception is thrown. To be on the safe side, you can pass CharStrings and evaluate them yourself.

    You may specify fewer arguments than there are columns in the file. With n arguments, only the first n columns are read and the remaining columns of the line are discarded.
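
    The behaviour described above (one argument per column, an exception on a failed conversion, surplus columns ignored) can be illustrated with the standard library alone. This is only a sketch of the idea, not SeqAn's implementation: readColumns and assignField are hypothetical names, the line is assumed to be tab-separated, and std::runtime_error stands in for SeqAn's ParseError.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helpers for illustration; none of this is part of SeqAn.

// Strings take the field verbatim, so they can hold any column.
inline void assignField(std::string & target, std::string const & field)
{
    target = field;
}

// Any other type is parsed with operator>>. A failed extraction or
// leftover characters in the field are reported as an error.
template <typename T>
void assignField(T & target, std::string const & field)
{
    std::istringstream parser(field);
    if (!(parser >> target) || parser.peek() != std::char_traits<char>::eof())
        throw std::runtime_error("cannot convert field '" + field + "'");
}

// Base case: no targets left, so the remaining columns are ignored.
inline void readColumns(std::istringstream &) {}

// Reads one tab-separated field per target, in order.
template <typename TFirst, typename... TRest>
void readColumns(std::istringstream & line, TFirst & first, TRest &... rest)
{
    std::string field;
    if (!std::getline(line, field, '\t'))
        throw std::runtime_error("line has fewer columns than requested");
    assignField(first, field);
    readColumns(line, rest...);
}
```

    Passing std::string targets corresponds to the "safe side" advice above: every field fits, and any numeric interpretation is left to the caller.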

    No transformations are made on the data, e.g. the positions are still one-indexed and flipped for reverse strand matches.
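
    Because the positions arrive untransformed, callers who want forward-strand, zero-based intervals have to convert them. The following sketch shows one common normalization; it assumes that "flipped" means the file lists start > end for a reverse strand match. Interval and normalizeInterval are hypothetical names, and the result is not claimed to match what the BlastTabular interface stores.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical type and helper for illustration; not part of SeqAn.
struct Interval
{
    std::uint64_t begin;   // zero-based, inclusive
    std::uint64_t end;     // zero-based, exclusive
    bool reverse;          // true if the file listed start > end
};

// Converts a one-based, inclusive [start, end] pair as found in the file
// into a zero-based, half-open interval. Assumes both values are >= 1.
inline Interval normalizeInterval(std::uint64_t start, std::uint64_t end)
{
    bool const reverse = start > end;
    if (reverse)
        std::swap(start, end);   // undo the flip so that start <= end
    return Interval{start - 1, end, reverse};
}
```

    With this convention the length of a match is simply end - begin, regardless of strand.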

    See BlastTabularLL for an example of low-level IO.

    Thrown Exceptions

    IOError On low-level I/O errors.
    ParseError On high-level file format errors.

    Data Races

    Thread safety unknown!

    void skipUntilMatch(stream, blastTabularLL);

    Skip arbitrary number of comment lines until the beginning of a match is reached.

    Parameters

    stream An input iterator over a stream or any fwd-iterator over a string.
    blastTabularLL The BlastTabularLL tag.

    Remarks

    This is also part of the low-level IO and not required if you use readRecord. Call this function whenever the iterator is not on a match (see onMatch) but needs to be, e.g. before calling readMatch.

    Since it is legal for files to end with comment lines, this function does not throw if end-of-file is reached. Check atEnd on the iterator after calling it.
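
    The contract above (skip comments, do not treat end-of-file as an error, let the caller check) can be sketched on a plain std::istream. This is a standard-library stand-in, not SeqAn code: skipCommentLines is a hypothetical helper, and it assumes that comment lines start with '#', as they do in -m 9 and -outfmt 7 output.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper for illustration; not part of SeqAn.
// Consumes lines that start with '#' and stops in front of the first line
// that does not. Returns true if the stream is positioned on such a line
// and false if the input ended first; reaching the end is not an error.
inline bool skipCommentLines(std::istream & in)
{
    while (in.peek() == '#')
    {
        std::string discarded;
        std::getline(in, discarded);
    }
    return in.peek() != std::char_traits<char>::eof();
}
```

    The boolean return value plays the role of the end-of-file check that the caller has to perform after skipUntilMatch.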

    Thrown Exceptions

    IOError On low-level I/O errors.
    ParseError On high-level file format errors.

    Data Races

    Thread safety unknown!

    void writeMatch(stream, blastTabularLL, columns...);

    Low-level file-writing for blast tabular formats.

    Parameters

    stream The file to write to (FILE, fstream, OutputStreamConcept ...).
    blastTabularLL The BlastTabularLL tag.
    columns ... Any number of printable parameters.

    Remarks

    This is a very light-weight alternative to writeRecord. It does not require BlastMatches, BlastRecords or the use of FormattedFile. It supports printing an arbitrary number of columns of arbitrary type.

    Use this only if you do not require comment lines and you are prepared to do all transformations on the data yourself, i.e. this function does none of the match adjustments mentioned in writeRecord.
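
    To make "arbitrary number of columns of arbitrary type" concrete, here is a minimal standard-library sketch of a variadic writer that emits one tab-separated line. It is not SeqAn's implementation: writeColumns is a hypothetical name, and it assumes only that every column type has an operator<< for std::ostream. Like writeMatch, it applies no adjustments to the values.

```cpp
#include <ostream>
#include <sstream>

// Hypothetical helpers for illustration; not part of SeqAn.

// Last column: print it and terminate the line.
template <typename TLast>
void writeColumns(std::ostream & out, TLast const & last)
{
    out << last << '\n';
}

// Print the first column followed by a tab, then recurse on the rest.
template <typename TFirst, typename... TRest>
void writeColumns(std::ostream & out, TFirst const & first, TRest const &... rest)
{
    out << first << '\t';
    writeColumns(out, rest...);
}
```

    A call such as writeColumns(std::cout, queryId, subjectId, bitScore) writes the three values separated by tabs, whatever their types.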

    Thrown Exceptions

    IOError On low-level I/O errors.

    Data Races

    Thread safety unknown!