Spec BlastTabularFileIn
FormattedFileIn abstraction for BlastTabular

Extends FormattedFileIn
All Extended FormattedFile, FormattedFileIn
Defined in <seqan/blast.h>
Signature template <typename TBlastIOContext> using BlastTabularFileIn = FormattedFile<BlastTabular, Input, TBlastIOContext>;

Member Function Overview

Member Functions Inherited From FormattedFile

Interface Function Overview

Interface Functions Inherited From FormattedFile

Interface Functions Inherited From FormattedFileIn

Interface Metafunction Overview

Interface Metafunctions Inherited From FormattedFile

Detailed Description

This is a FormattedFile specialization for reading BlastTabular formats. For details on how to influence the reading of files and how to differentiate between the tabular format without comment lines and the one with comment lines, see BlastIOContext. Please note that you have to specify the type of the context as a template parameter to BlastTabularFileIn.

Overview

For a detailed example have a look at the Blast IO tutorial.

See Also

Interface Functions Detail

bool onRecord(blastTabularIn);

Returns whether the currently buffered line looks like the start of a record.
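SeqAn's exact test is not described here. As a rough illustration only, the following standalone C++ sketch shows one plausible heuristic. It assumes BLAST+ style comment blocks that begin with a program/version line such as "# BLASTP 2.2.31+" and a footer that begins with "# BLAST processed". The enum TabularVariant and the function looksLikeRecordStart are hypothetical names, not part of SeqAn.

```cpp
#include <string>

// Hypothetical stand-in for the two tabular variants (in SeqAn the
// distinction is configured through BlastIOContext).
enum class TabularVariant { NoComments, Comments };

// Returns true if `line` plausibly starts a new record (assumed heuristic).
bool looksLikeRecordStart(const std::string & line, TabularVariant variant)
{
    if (line.empty())
        return false;
    if (variant == TabularVariant::NoComments)
        return line[0] != '#';  // any match line may open a new record
    // With comments, a record is assumed to open with "# BLAST...",
    // but the footer "# BLAST processed ..." does not start a record.
    return line.rfind("# BLAST", 0) == 0 &&
           line.rfind("# BLAST processed", 0) != 0;
}
```

Without comment lines, this check alone cannot tell where one query's matches end; that boundary has to come from comparing the first field of consecutive lines.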

Parameters

blastTabularIn A BlastTabularFileIn formattedFile.

Returns

bool true if the buffered line looks like the start of a record, false otherwise.

Thrown Exceptions

IOError On low-level I/O errors.
ParseError On high-level file format errors.

Data Races

Thread safety unknown!

void readFooter(blastTabularIn);

Read the footer (bottom-most section) of a BlastTabular file.
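As an assumption for illustration: in BLAST+ tabular output with comments, the footer is typically a single line of the form "# BLAST processed N queries". The standalone sketch below extracts N from such a line. The function parseFooterQueryCount is hypothetical and is not how SeqAn's readFooter is implemented.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: returns N from "# BLAST processed N queries",
// or std::nullopt if the line does not have that (assumed) shape.
std::optional<unsigned long> parseFooterQueryCount(const std::string & line)
{
    const std::string prefix = "# BLAST processed ";
    if (line.rfind(prefix, 0) != 0)
        return std::nullopt;
    std::istringstream rest(line.substr(prefix.size()));
    unsigned long count = 0;
    std::string word;
    // Accept both "query" and "queries" after the number.
    if (!(rest >> count >> word) || word.rfind("quer", 0) != 0)
        return std::nullopt;
    return count;
}
```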

Parameters

blastTabularIn A BlastTabularFileIn formattedFile.

Thrown Exceptions

IOError On low-level I/O errors.
ParseError On high-level file format errors.

Data Races

Thread safety unknown!

void readHeader(blastTabularIn);

Read the header (top-most section) of a BlastTabular file.

Parameters

blastTabularIn A BlastTabularFileIn formattedFile.

Thrown Exceptions

IOError On low-level I/O errors.
ParseError On high-level file format errors.

Data Races

Thread safety unknown!

void readRecord(blastRecord, blastTabularIn);

Read a record from a file in BlastTabular format.

Parameters

blastRecord A BlastRecord to hold all information related to one query sequence.
blastTabularIn A BlastTabularFileIn formattedFile.

Remarks

This function reads an entire record from a BlastTabular file, i.e. the comment lines (if the format is COMMENTS) and the 0 to n BlastMatches belonging to one query.

Please note that if there are no comment lines in the file, the boundary between records is inferred from the identity of the first field; consequently, non-standard field configurations must also have Q_SEQ_ID as their first BlastMatchField.
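The boundary rule can be illustrated with a small standalone sketch: consecutive match lines that share their first tab-separated column (assumed to be the query ID) are collected into one record, and a change in that column starts the next one. The names Record, firstField and groupByQuery are hypothetical; this is not SeqAn code.

```cpp
#include <string>
#include <utility>
#include <vector>

// (query ID, match lines belonging to that query)
using Record = std::pair<std::string, std::vector<std::string>>;

// First tab-separated column of a match line (whole line if no tab).
std::string firstField(const std::string & line)
{
    return line.substr(0, line.find('\t'));
}

std::vector<Record> groupByQuery(const std::vector<std::string> & lines)
{
    std::vector<Record> records;
    for (const std::string & line : lines)
    {
        if (line.empty() || line[0] == '#')
            continue;  // skip blank lines and stray comments
        std::string qId = firstField(line);
        if (records.empty() || records.back().first != qId)
            records.push_back({qId, {}});  // first field changed: new record
        records.back().second.push_back(line);
    }
    return records;
}
```

This also shows why the first field matters: if it were not the query ID, or if lines of different queries were interleaved, the inferred records would be wrong.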

Comment lines

The qId member of the record is read from the comment lines and the matches are resized to the expected number of matches succeeding the comments.
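To make the previous paragraph concrete, here is a standalone sketch that pulls the query ID and the announced number of matches out of a comment block. It assumes BLAST+ style lines such as "# Query: q1 some protein" and "# 2 hits found"; the struct CommentSummary and the function summarizeComments are hypothetical and only approximate what readRecord does.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct CommentSummary
{
    std::string qId;                  // text after "# Query: "
    std::size_t expectedMatches = 0;  // number announced by "# N hits found"
};

CommentSummary summarizeComments(const std::vector<std::string> & comments)
{
    const std::string queryPrefix = "# Query: ";
    CommentSummary summary;
    for (const std::string & line : comments)
    {
        if (line.rfind(queryPrefix, 0) == 0)
            summary.qId = line.substr(queryPrefix.size());
        else if (line.rfind("# ", 0) == 0 && line.find(" hits found") != std::string::npos)
            summary.expectedMatches = std::stoul(line.substr(2));  // leading number after "# "
    }
    return summary;
}
```

A caller could then resize a container of matches to summary.expectedMatches before reading the match lines, which mirrors the resizing described above.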

This function also sets many properties of blastTabularIn's BlastIOContext, including these members:

  • versionString: version string of the program.
  • dbName: name of the database.
  • fields: descriptors for the columns.
  • fieldsAsStrings: labels of the columns as they appear in the file.
  • conformancyErrors: if this StringSet is not empty, then there are issues in the comments.
  • otherLines: any lines that cannot be interpreted; these always also imply conformancyErrors.
  • legacyFormat: whether the record (and likely the entire file) is in legacyFormat.
It also sets the blast program run-time parameter of the context based on the information found in the comments. If the compile-time parameter was set on the context and the two differ, this results in a critical error.
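The following standalone sketch shows how comment lines might map onto a few of the members listed above. ContextSketch only mirrors the member names from this page; it is not SeqAn's BlastIOContext. The parsing rules ("# BLAST..." for the version, "# Database: ", "# Fields: " with comma-separated labels) are assumptions based on BLAST+ output. Anything unrecognised goes to otherLines, which in SeqAn would also be reflected in conformancyErrors.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in mirroring some BlastIOContext member names.
struct ContextSketch
{
    std::string versionString;
    std::string dbName;
    std::vector<std::string> fieldsAsStrings;
    std::vector<std::string> otherLines;
};

static bool hasPrefix(const std::string & s, const std::string & p)
{
    return s.rfind(p, 0) == 0;
}

void readCommentLine(ContextSketch & ctx, const std::string & line)
{
    if (hasPrefix(line, "# BLAST"))
        ctx.versionString = line.substr(2);  // e.g. "BLASTP 2.2.31+"
    else if (hasPrefix(line, "# Database: "))
        ctx.dbName = line.substr(12);
    else if (hasPrefix(line, "# Fields: "))
    {
        ctx.fieldsAsStrings.clear();
        std::istringstream labels(line.substr(10));
        std::string label;
        while (std::getline(labels, label, ','))
        {
            if (!label.empty() && label[0] == ' ')
                label.erase(0, 1);  // labels are assumed to be separated by ", "
            ctx.fieldsAsStrings.push_back(label);
        }
    }
    else if (!hasPrefix(line, "# Query: ") && line.find(" hits found") == std::string::npos)
        ctx.otherLines.push_back(line);  // could not be interpreted
}
```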

Please note that for legacyFormat the fields member is always ignored; fieldsAsStrings, however, is still read from the comments in case you want to process it.

If you do not want the fields to be read from the comments, you can set context.ignoreFieldsInComments to true. This prevents them from being read and allows you to specify them manually, which might be relevant for reading the match lines.

If the format is NO_COMMENTS, none of the above happens and qId is derived from the first match.

Matches

A match line contains 1 to n columns or fields, 12 by default. The fields member of the context is considered when reading these fields. It is usually extracted from the comment lines, but you can also set it yourself if there are no comments or if you want to override the information in the comments (see above). You may specify fewer fields than are actually present; in this case the additional fields are discarded. The parameter is ignored if legacyFormat is set.
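As a minimal standalone sketch of reading a match line according to a field list: columns are assigned to the configured fields in order, and columns beyond the configured fields are ignored. The field names used here are the BLAST+ -outfmt specifiers of the 12 default columns, not SeqAn's BlastMatchField enum values, and readMatchColumns is a hypothetical helper.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Assign tab-separated columns to field names in order; extra columns are dropped.
std::map<std::string, std::string>
readMatchColumns(const std::string & line, const std::vector<std::string> & fields)
{
    std::map<std::string, std::string> values;
    std::istringstream columns(line);
    std::string column;
    for (const std::string & field : fields)
    {
        if (!std::getline(columns, column, '\t'))
            break;  // this sketch simply stops; a real parser would likely report an error
        values[field] = column;
    }
    return values;  // any remaining columns are never read
}

// The 12 default columns of BLAST's tabular output.
const std::vector<std::string> defaultFields = {
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore"};
```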

To differentiate between members of a BlastMatch that were read from the file and those that have not been set (e.g. both could be 0), the latter are initialized to their respective maximum values.
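The sentinel convention can be sketched with std::numeric_limits. MatchSketch and isSet are hypothetical names, and the member types are assumptions; SeqAn's BlastMatch has its own members and types.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical match struct: members default to their type's maximum,
// so "never read" can be told apart from "read as 0".
struct MatchSketch
{
    std::uint32_t alignmentLength = std::numeric_limits<std::uint32_t>::max();
    std::uint32_t mismatches      = std::numeric_limits<std::uint32_t>::max();
    std::uint32_t gapOpenings     = std::numeric_limits<std::uint32_t>::max();
};

// True if the value differs from the "unset" sentinel.
template <typename T>
bool isSet(T value)
{
    return value != std::numeric_limits<T>::max();
}
```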

Please note that the only transformations made to the data are the following:

  • computation of the number of identities (from the percentage) [default]
  • computation of the number of positives (from the percentage) [if given]
  • number of gaps computed from other values [default]

In contrast to writeRecord, no other transformations are made, e.g. the positions are still one-indexed and flipped for reverse strand matches. This is because the fields required for the retransformation (sequence lengths, frames) are not available in the default columns.
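The arithmetic behind these transformations can be sketched as follows. This is an assumption about the computation, not SeqAn's code. A count is recovered from a percentage by rounding percent × length / 100, which works for identities and positives alike. Gaps follow because every alignment column is an identity, a mismatch or a gap. The last function shows a conversion that readRecord does not perform: turning one-based, possibly flipped coordinates into a zero-based half-open interval. All function and struct names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Count from a percentage, e.g. identities from pident or positives from ppos.
std::uint32_t countFromPercent(double percent, std::uint32_t length)
{
    return static_cast<std::uint32_t>(std::lround(percent * length / 100.0));
}

// Every column is an identity, a mismatch or a gap.
std::uint32_t gapsFromCounts(std::uint32_t length, std::uint32_t identities, std::uint32_t mismatches)
{
    return length - identities - mismatches;
}

// Zero-based, half-open interval [begin, end) plus strand flag.
struct Interval
{
    std::uint32_t begin;
    std::uint32_t end;
    bool reverse;
};

// Input: one-based, closed coordinates; start > end marks the reverse strand.
Interval toZeroBasedHalfOpen(std::uint32_t start, std::uint32_t end)
{
    return {std::min(start, end) - 1, std::max(start, end), start > end};
}
```

For example, pident 98.5 over an alignment length of 200 gives 197 identities; with 3 mismatches, that leaves 0 gap columns.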

Thrown Exceptions

IOError On low-level I/O errors.
ParseError On high-level file format errors.

Data Races

Thread safety unknown!