SeqAn supports input and output of files in several file formats. The simplest file format is
Raw, which loads a file "as is" into a string or writes a string unchanged to a file, e.g.:
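fstream fstrm;
fstrm.open("input.txt", ios_base::in | ios_base::binary);
String<char> str;
read(fstrm, str, Raw()); //loads the whole file into str
FILE * cstrm = fopen("output.txt", "wb"); //opened in binary mode
write(cstrm, str, Raw()); //writes str unchanged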
In this example, the tag Raw() can also be omitted, since Raw is the default file format.
Instead of calling read and write, one can also use the operators << and >> for raw data.
A file can be an instance of a standard stream class, a C-style stream (i.e. FILE *), or a SeqAn File object (see below).
Note that files should always be opened in binary mode.
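To illustrate what loading a file "as is" in binary mode means, here is a minimal sketch that uses only the C++ standard library. The helpers readRaw and writeRaw are hypothetical names chosen for this illustration; they are not SeqAn functions and do not show how SeqAn implements Raw.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper (not a SeqAn function): load a file byte for byte.
// Binary mode prevents the runtime from translating line endings.
std::string readRaw(const std::string& path) {
    std::ifstream in(path, std::ios_base::in | std::ios_base::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: write a string unchanged; returns false on failure.
bool writeRaw(const std::string& path, const std::string& data) {
    std::ofstream out(path, std::ios_base::out | std::ios_base::binary);
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    return static_cast<bool>(out);
}
```

A string written with writeRaw and read back with readRaw should compare equal to the original, including any "\r\n" sequences, which is the property that text mode does not guarantee on every platform.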
Besides Raw, SeqAn offers other file formats designed for bioinformatics, like Fasta, EMBL, or Genbank.
These file formats consist of one or more data records.
To load all records, call read repeatedly, for example:
fstrm.open("ests.fa", ios_base::in | ios_base::binary);
while (! fstrm.eof())
{
    read(fstrm, est, Fasta());
    //use sequence data in est
}
The function goNext skips the current record and proceeds to the next record.
Each record contains a piece of data (i.e. a sequence or an alignment) and, optionally, some additional metadata. The metadata can be loaded with readMeta, but only before (not after) the actual data of the record is loaded. The function fills a string with the unparsed metadata.
goNext(cstrm, Embl()); //skip first data record
read(cstrm, dna_sequence, Embl()); //reads second record
readMeta(cstrm, meta_data, Embl()); //reads meta data of third record
read(cstrm, dna_sequence, Embl()); //reads third record
write is used to write a record into a file. Depending on the file format, a suitable metadata string must be passed to write.
For example, the following program:
write(cstrm, "acgt", "the metadata", Fasta());
creates the following file "genomic_data.fa":
The easiest way to get read-only access to sequence data stored in a file is a file reader string. A file reader string implements the container concept, i.e. it implements common functions like length or begin. It has minimal memory consumption, because each part of the sequence data is loaded only when it is needed.
cout << length(fr); //prints length of the sequence
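The idea of loading each part only when it is needed can be sketched as follows. LazyFileString is a hypothetical class written for this illustration with the C++ standard library; it works on raw files only, keeps a single fixed-size block in memory, and is not how SeqAn's FileReader is implemented.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical class (not SeqAn's FileReader): a read-only view of a file
// that holds at most one block of blockSize characters in memory.
class LazyFileString {
public:
    explicit LazyFileString(const std::string& path, std::size_t blockSize = 4096)
        : in_(path, std::ios_base::in | std::ios_base::binary),
          blockSize_(blockSize == 0 ? 1 : blockSize),
          blockStart_(0), length_(0) {
        if (in_) {
            // Determine the length without reading any content.
            in_.seekg(0, std::ios_base::end);
            length_ = static_cast<std::size_t>(in_.tellg());
        }
    }

    std::size_t length() const { return length_; }

    // Returns the character at position pos; requires pos < length().
    char operator[](std::size_t pos) {
        bool cached = !block_.empty() && pos >= blockStart_ &&
                      pos < blockStart_ + block_.size();
        if (!cached) loadBlock(pos);
        return block_[pos - blockStart_];
    }

private:
    // Loads the block that contains pos, replacing the cached block.
    void loadBlock(std::size_t pos) {
        blockStart_ = pos - pos % blockSize_;
        block_.resize(std::min(blockSize_, length_ - blockStart_));
        in_.clear();
        in_.seekg(static_cast<std::streamoff>(blockStart_));
        in_.read(&block_[0], static_cast<std::streamsize>(block_.size()));
    }

    std::ifstream in_;
    std::size_t blockSize_;
    std::size_t blockStart_;
    std::size_t length_;
    std::string block_;
};
```

Asking for length() touches no file content at all, and each call to operator[] reads from disk only when the requested position lies outside the cached block. The trade-off in this sketch is that random access across block boundaries triggers a reload every time.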
The constructor of the file reader string can also take a file from which the sequences will be loaded. For example, the following code will read the second sequence in the file:
readMeta(cstrm, meta_data, Embl()); //reads meta data of second record
String<Dna, FileReader<Embl> > fr(cstrm); //reads sequence data of second record