Class SequenceStream
High-level reading and writing of sequences.

Defined in <seqan/seq_io.h>
Signature class SequenceStream;

Detailed Description

Building on SeqAn's lower-level sequence I/O functionality, this class provides easier-to-use I/O facilities. In particular, it hides the underlying stream layer and the use of RecordReaders from the user. This is achieved through dynamic polymorphism, which comes at some performance cost.
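The idea of hiding format-specific readers behind one type can be illustrated with a small standalone sketch. It does not use SeqAn and does not show SeqAn's internals: `RecordReaderBase`, `FastaReader`, `FastqReader`, and `SimpleSequenceStream` are hypothetical names, and the readers only handle single-line records. The reader is chosen at run time and every read goes through a virtual call, which is where the performance cost mentioned above would come from.

```cpp
#include <istream>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical interface for this sketch; SeqAn's internal classes differ.
struct RecordReaderBase
{
    virtual ~RecordReaderBase() = default;
    // Returns 0 on success and a non-zero value on error.
    virtual int read(std::string & id, std::string & seq, std::istream & in) = 0;
};

// Reads one single-line FASTA record: ">id" followed by one sequence line.
struct FastaReader : RecordReaderBase
{
    int read(std::string & id, std::string & seq, std::istream & in) override
    {
        if (!std::getline(in, id) || id.empty() || id[0] != '>')
            return 1;
        id.erase(0, 1);
        return std::getline(in, seq) ? 0 : 1;
    }
};

// Reads one four-line FASTQ record and discards the qualities.
struct FastqReader : RecordReaderBase
{
    int read(std::string & id, std::string & seq, std::istream & in) override
    {
        std::string plus, quals;
        if (!std::getline(in, id) || id.empty() || id[0] != '@')
            return 1;
        id.erase(0, 1);
        if (!std::getline(in, seq) || !std::getline(in, plus) || !std::getline(in, quals))
            return 1;
        return 0;
    }
};

// Facade: the concrete reader is picked at run time, callers see one type.
class SimpleSequenceStream
{
public:
    explicit SimpleSequenceStream(std::string const & text) : in_(text)
    {
        if (in_.peek() == '@')
            reader_.reset(new FastqReader);
        else
            reader_.reset(new FastaReader);
    }

    int readRecord(std::string & id, std::string & seq)
    {
        return reader_->read(id, seq, in_);  // virtual dispatch
    }

private:
    std::istringstream in_;
    std::unique_ptr<RecordReaderBase> reader_;
};
```

Calling code only ever names `SimpleSequenceStream`; whether the input is FASTA or FASTQ is decided once, in the constructor.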

Also see the Simple Sequence I/O Tutorial.

Operation Mode

When reading, there are two operation modes: normal reading and reading of "persistent" records. In "persistent" mode, SequenceStream scans each record twice: once to determine its size and once to actually read the sequence. After the first pass, a buffer of exactly the required size can be allocated. This can reduce memory use by up to a factor of two, at the cost of scanning each record twice. Note that this is only possible when reading uncompressed files.
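The two-pass idea can be sketched with the standard library alone. This is an illustration under stated assumptions, not SeqAn's implementation: `readRecordTwoPass` is a hypothetical helper that reads FASTA-like text from a seekable `std::istream`. The rewind between the passes needs a seekable input, which is one plausible reason for the restriction to uncompressed files.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper, not part of SeqAn: read the FASTA record that starts
// at the current stream position, scanning its sequence lines twice.
bool readRecordTwoPass(std::string & id, std::string & seq, std::istream & in)
{
    std::string line;
    if (!std::getline(in, line) || line.empty() || line[0] != '>')
        return false;
    id = line.substr(1);

    // Pass 1: remember where the sequence starts and count its characters.
    std::istream::pos_type const start = in.tellg();
    std::size_t length = 0;
    while (in.peek() != std::char_traits<char>::eof() && in.peek() != '>')
    {
        std::getline(in, line);
        length += line.size();
    }

    // Rewind; clear() is needed in case pass 1 reached the end of the input.
    in.clear();
    in.seekg(start);

    // Pass 2: allocate once with the known size, then fill the buffer.
    seq.clear();
    seq.reserve(length);
    while (in.peek() != std::char_traits<char>::eof() && in.peek() != '>')
    {
        std::getline(in, line);
        seq += line;
    }
    assert(seq.size() == length);
    return true;
}
```

Without the first pass, a growing string typically over-allocates while it is being appended to; reserving the exact size up front avoids that at the price of reading the data twice.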

File Format and File Type

The file type determines whether a file is stored as raw text or is compressed. Examples of file types are text files and gzip-compressed files (FILE.gz). The file format describes the contents of the raw/decompressed file. Examples of file formats are FASTA, FASTQ, and EMBL.

When reading, the file type and format are guessed from the file itself. You do not have to specify either, but you can force the SequenceStream to use the ones you provide. When writing, you should specify a file type and format when constructing the SequenceStream object; otherwise, it defaults to writing raw-text FASTA files.
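How such guessing can work in principle is sketched below. This is a standalone illustration, not SeqAn's detection logic: the enums `GuessedType` and `GuessedFormat` and the functions `guessType` and `guessFormat` are hypothetical. The sketch relies on two well-known conventions: gzip streams begin with the bytes 0x1f 0x8b, and FASTA and FASTQ records begin with '>' and '@' respectively.

```cpp
#include <string>

// Hypothetical enums for this sketch; they are not SeqAn's FileType/FileFormat.
enum class GuessedType { PlainText, Gzip, Unknown };
enum class GuessedFormat { Fasta, Fastq, Unknown };

// Guess the file type from the leading bytes of the file.
GuessedType guessType(std::string const & head)
{
    if (head.size() >= 2 &&
        static_cast<unsigned char>(head[0]) == 0x1f &&
        static_cast<unsigned char>(head[1]) == 0x8b)
        return GuessedType::Gzip;
    return head.empty() ? GuessedType::Unknown : GuessedType::PlainText;
}

// Guess the file format from the first character of the raw/decompressed text.
GuessedFormat guessFormat(std::string const & text)
{
    if (text.empty())
        return GuessedFormat::Unknown;
    if (text[0] == '>')
        return GuessedFormat::Fasta;
    if (text[0] == '@')
        return GuessedFormat::Fastq;
    return GuessedFormat::Unknown;
}
```

The two checks are independent, matching the distinction above: the type is decided on the bytes as stored, the format on the text after any decompression.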

Examples

Read a sequence file "example.fa" record by record. See the documentation of readRecord, readBatch, and readAll for more examples, including record-wise reading, reading in batches, and reading all records in a file.

Reading from SequenceStream

#include <iostream>

#include <seqan/basic.h>
#include <seqan/seq_io.h>
#include <seqan/sequence.h>

using namespace seqan;

int main(int argc, char ** argv)
{
    CharString path = SEQAN_PATH_TO_ROOT();
    append(path, "/core/demos/seq_io/example.fa");

    // Open file and check for errors.
    SequenceStream seqStream(toCString(path));
    if (!isGood(seqStream))
    {
        std::cerr << "ERROR: Could not open " << path << " for reading.\n";
        return 1;
    }

    // Read from file and print the result to stdout.
    seqan::CharString id, seq;
    while (!atEnd(seqStream))
    {
        if (readRecord(id, seq, seqStream) != 0)
        {
            std::cerr << "Problem reading from " << path << "\n";
            return 1;
        }
        std::cout << id << "\t" << seq << "\n";
    }

    return 0;
}

The output is as follows:

chr	CCTATCTAATAATATACCTTATACTGGACTAGTGCCAATATTAAAATGAAGTGGGCGTAGTGTGTAATTTGATTGGGTGGAGGTGTGGCTTTGGCGTGCTTGTAAGTTTGGGCGGATGAGGAAGTGGGGCGCGGCGTGGGAGCCGGGCGCGCCGGATGTGACGTTTTAGACGCCATTTTACACGGAAATGATGTTTTTTGGGCGTTGTTTGTGCAAATTTTGTGTTTTAGGCGCGAAAACTGAAATGCGGAAGTGAAAATTGATGACGGCAATTTTATTATAGGCGCGGAATATTTACCGAGGGCAGAGTGAACTCTGAGCCTCTACGTGTGGGTTTCGATACGTGAGCGACGGGGAAACTCCACGTTGGCGCTCAAAGGGCGCGTTTATTGTTCTGTCAGCTGATCGTTTGGGTATTTAATGCCGCCGTGTTCGTCAAGAGGCCACTCTTGAGTGCCAGCGAGAAGAGTTTTCTCTGCCAGCTCATTTTCACGGCGCCATTATGAGAACTGAAATGACTCCCTTGGTCCTGTCGTATCAGGAAGCTGACGACATATTGGAGCATTTGGTGGACAACTTTTTTAACGAGGTACCCAGTGATGATGATCTTTATGTTCCGTCTCTTTACGAACTGTATGATCTTGATGTGGAGTCTGCCGGTGAAGATAATAATGAACAGGCGGTGAATGAGTTTTTTCCCGAATCGCTTATTTTAGCTGCCAGTGAGGGGTTGTTTTTACCGGAGCCTCCTGTACTTTCTCCTGTCTGTGAGCCTATTGGGGGCGAATGTATGCCACAACTGCACCCTGAAGATATGGATTTATTGTGCTACGAGATGGGCTTTCCCTGTAGCGATTCGGAAGACGAGCAAGACGAGAACGGAATGGCGCATGTTTCTGCATCCGCAGCTGCTGCTGCCGCTGATAGGGAACGTGAGGAGTTTCAGTTAGACCATCCAGAGTTGCCCGGACACAATTGTAAGTCCTGTGAGCACCACCGGAATAGTACTGGAAATACTGACTTAATGTGCTCTTTGTGCTATCTGCGAGCCTACAACATGTTCATTTACAGTAAGTGTGCTATGGGAGGTGGGAGGTGATTTTTTTTTCTTAAGCAGTGAAAAATAATATTTTGTTGTTTTTAGGTCCTGTTTCCGATAATGAGCCTGAACCTAATAGCACTTTGGATGGCGATGAGCGACCCTCACCCCCGAAACTAGGAAGTGCGGTTCCAGAAGGAGTAATAAAACCTGTGCCTCAGCGGGTGACTGGGAGGCGTAGATGTGCTGTGGAAAGCATTTTGGATTTGATTCAAGAGGAAGAAAGAGAACAAACAGTGCCTGTTGATCTGTCAGTGAAACGCCCTAGATGTAATTAATGGACTTTGAGCACCTGGGCAATAAAATAGGGGTAATGTGGTTTTTGTGAGTCATGTATAATAAAACTGGTTTCGGTTGAAGTGTCTTGTTAATGTTTGTTTGGGCGTGGTTAAACAGGGATATAAAGCTGGGTTGGTGTTGCTTTGAATAGTTCATCTTAGT

Writing to SequenceStream

#include <iostream>

#include <seqan/basic.h>
#include <seqan/seq_io.h>
#include <seqan/sequence.h>

using namespace seqan;

int main(int argc, char ** argv)
{
    CharString path = SEQAN_TEMP_FILENAME();
    append(path, ".fa");

    // Open file and check for errors.
    SequenceStream seqStream(toCString(path), SequenceStream::WRITE);
    if (!isGood(seqStream))
    {
        std::cerr << "ERROR: Could not open " << path << " for writing.\n";
        return 1;
    }

    // Write two sequences to the file.
    if (writeRecord(seqStream, "one", "CGAT") != 0 ||
        writeRecord(seqStream, "two", "ASDF") != 0)
    {
        std::cerr << "ERROR: Problem writing to " << path << "\n";
        return 1;
    }

    return 0;
}

Member Functions Detail

SequenceStream::SequenceStream(); SequenceStream::SequenceStream(fileName[, operationMode[, format[, fileType]]]);

Constructor

Parameters

fileName Path to the file to open. Type: char const *
operationMode Mode to open the file in. Optional. Type: SequenceStream::OperationMode. Default: READ
format File format to use. Optional. Type: SequenceStream::FileFormat. Default: AUTO_FORMAT.
fileType File type to use. Optional. Type: SequenceStream::FileType. Default: AUTO_TYPE.

Interface Functions Detail

bool atEnd(seqStream);

Check whether a SequenceStream is at the end of the file.

Parameters

seqStream The SequenceStream object to query. Type: SequenceStream

Returns

bool true if the SequenceStream is at the end of the file and false otherwise.

void close(seqStream);

Close a SequenceStream.

Parameters

seqStream The SequenceStream to close.

void flush(seqStream);

Write all remaining data from a SequenceStream to disk.

Parameters

seqStream The SequenceStream to flush.

bool isGood(seqStream);

Check whether a SequenceStream is ready for reading or writing.

Parameters

seqStream The SequenceStream to query.

Returns

bool true if the stream is ready and false otherwise.

void open(seqStream, fileName[, operationMode[, format[, fileType]]]);

Open or re-open a file using a SequenceStream.

Parameters

seqStream The SequenceStream object to open. Type: SequenceStream.
fileName Path to the file to open. Type: char const *
operationMode Mode to open the file in. Optional. Type: SequenceStream::OperationMode. Default: READ.
format File format to use. Optional. Type: SequenceStream::FileFormat. Default: AUTO_FORMAT.
fileType File type to use. Optional. Type: SequenceStream::FileType. Default: AUTO_TYPE.

int readAll(ids, seqs[, quals], seqStream);

Read all sequence records from a SequenceStream object.

Parameters

ids The identifiers of the sequences are written here. Type: StringSet of CharString.
seqs The sequences of the records are written here. Type: StringSet.
quals The qualities of the sequences are written here. Optional. If the sequences have no qualities, as in FASTA files, the StringSet will contain empty strings. Type: StringSet of CharString
seqStream The SequenceStream object to read from. Type: SequenceStream

Returns

int 0 on success, non-0 value on errors.

Examples

Read the sequences of a FASTA file.

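#include <iostream>

#include <seqan/seq_io.h>
#include <seqan/sequence.h>

int main()
{
    // READ is the documented default operation mode.
    seqan::SequenceStream seqIO("in.fasta", seqan::SequenceStream::READ);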
    seqan::StringSet<seqan::CharString> ids;
    seqan::StringSet<seqan::Dna5String> seqs;
 
    int res = readAll(ids, seqs, seqIO);
    if (res != 0)
    {
        std::cerr << "ERROR: Could not read records!\n";
        return 1;
    }
 
    return 0;
}

int readBatch(ids, seqs[, quals], seqStream, num);

Read a given number of sequence records from SequenceStream.

Parameters

ids The identifiers of the sequences are written here. Type: StringSet of CharString.
seqs The sequences of the records are written here. Type: StringSet.
quals The qualities of the sequences are written here. Optional. If the sequences have no qualities, as in FASTA files, the StringSet will contain empty strings. Type: StringSet of CharString
seqStream The SequenceStream object to read from. Type: SequenceStream
num The maximum number of records to read.

Returns

int 0 on success, non-0 value on errors.

Examples

Read the first sequences of a FASTA file, up to ten.

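#include <iostream>

#include <seqan/seq_io.h>
#include <seqan/sequence.h>

int main()
{
    // READ is the documented default operation mode.
    seqan::SequenceStream seqIO("in.fasta", seqan::SequenceStream::READ);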
    seqan::StringSet<seqan::CharString> ids;
    seqan::StringSet<seqan::Dna5String> seqs;
 
    int res = readBatch(ids, seqs, seqIO, 10);
    if (res != 0)
    {
        std::cerr << "ERROR: Could not read records!\n";
        return 1;
    }
 
    return 0;
}

int readRecord(id, seq[, quals], seqStream);

Read the next sequence record from SequenceStream.

Parameters

id The identifier of the sequence is written here. Type: CharString
seq The sequence of the record is written here. Type: String
quals The qualities of the sequence are written here. Optional. If the sequence has no qualities, clear is called on quals to indicate this. Type: CharString
seqStream The SequenceStream object to read from. Type: SequenceStream

Returns

int 0 on success, non-0 on error.

Examples

Read the first sequence of a FASTA file.

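#include <iostream>

#include <seqan/seq_io.h>
#include <seqan/sequence.h>

int main()
{
    // READ is the documented default operation mode.
    seqan::SequenceStream seqIO("in.fasta", seqan::SequenceStream::READ);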
    seqan::CharString id;
    seqan::Dna5String seq;
 
    if (atEnd(seqIO))
    {
        std::cerr << "ERROR: File does not contain any sequences!\n";
        return 1;
    }
    int res = readRecord(id, seq, seqIO);
    if (res != 0)
    {
        std::cerr << "ERROR: Could not read first record!\n";
        return 1;
    }
 
    return 0;
}

int writeAll(seqStream, ids, seqs[, options]); int writeAll(seqStream, ids, seqs, quals[, options]);

Write sequence records to a SequenceStream object.

Parameters

seqStream The SequenceStream object to write to. Type: SequenceStream
ids Identifiers to write out. Type: StringSet of CharString.
seqs Sequences to write out. Type: StringSet.
quals Qualities to write out. Optional. Qualities are ignored if the file format does not support them. If none are given for FASTQ, score 40 is written out for all. Type: StringSet of CharString.
options The configuration for writing FASTA and FASTQ files. Type: SequenceOutputOptions

Returns

int 0 on success, non-0 value on errors.

The records are appended to the file if you have written out any previously. When writing out Dna5Q, qualities are automatically taken from the sequence characters.

Examples

Write out all sequences.

#include <iostream>

#include <seqan/seq_io.h>
#include <seqan/sequence.h>

int main()
{
    seqan::SequenceStream seqIO("out.fasta", seqan::SequenceStream::WRITE);
    seqan::StringSet<seqan::CharString> ids;
    appendValue(ids, "seq1");
    appendValue(ids, "seq2");
    seqan::StringSet<seqan::Dna5String> seqs;
    appendValue(seqs, "CGAT");
    appendValue(seqs, "TTTT");
 
    int res = writeAll(seqIO, ids, seqs);
    if (res != 0)
    {
        std::cerr << "ERROR: Could not write records!\n";
        return 1;
    }
 
    return 0;
}

int writeRecord(seqStream, id, seq[, options]); int writeRecord(seqStream, id, seq, quals[, options]);

Write one sequence record to a SequenceStream object.

Parameters

seqStream The SequenceStream object to write to. Type: SequenceStream
id The identifier to write. Type: CharString
seq The sequence to write. Type: String
quals The qualities to write out.
options The configuration for writing FASTA and FASTQ files. Type: SequenceOutputOptions

Returns

int 0 on success, non-0 value on errors.

The record is appended to the file if you have written out any previously. When writing out Dna5Q, qualities are automatically taken from the sequence characters.

Examples

Write out two sequences to a FASTA file, one record at a time.

#include <iostream>

#include <seqan/seq_io.h>
#include <seqan/sequence.h>

int main()
{
    seqan::SequenceStream seqIO("out.fasta", seqan::SequenceStream::WRITE);
    seqan::StringSet<seqan::CharString> ids;
    appendValue(ids, "seq1");
    appendValue(ids, "seq2");
    seqan::StringSet<seqan::Dna5String> seqs;
    appendValue(seqs, "CGAT");
    appendValue(seqs, "TTTT");
 
    for (unsigned i = 0; i < length(ids); ++i)
    {
        // Write the i-th record, not the first one on every iteration.
        int res = writeRecord(seqIO, ids[i], seqs[i]);
        if (res != 0)
        {
            std::cerr << "ERROR: Could not write records!\n";
            return 1;
        }
    }
 
    return 0;
}