Class FaiIndex
Data structure for access to FAI indices.

Defined in <seqan/seq_io.h>
Signature class FaiIndex;

Member Function Overview

Interface Function Overview

Detailed Description

FAI indices allow fast random access to sequences or parts of sequences in a FASTA file. They were originally introduced in the samtools program.

Also see the Indexed FASTA I/O Tutorial.
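
To see why an index makes random access cheap, recall the FAI layout used by samtools faidx: each record stores the sequence name, its length, the byte offset of its first base, the number of bases per line, and the number of bytes per line (including the line terminator). The following standalone sketch does not use SeqAn; `FaiRecord` and `byteOffset` are hypothetical names chosen for illustration. It shows how the file position of a base can be computed from these numbers without scanning the file.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// One record of a .fai file, following the samtools faidx column layout.
// FaiRecord is an illustrative name, not a SeqAn type.
struct FaiRecord
{
    std::string name;         // sequence name
    std::uint64_t length;     // number of bases in the sequence
    std::uint64_t offset;     // byte offset of the first base in the FASTA file
    std::uint64_t lineBases;  // bases per line
    std::uint64_t lineWidth;  // bytes per line, including the line terminator
};

// Byte offset in the FASTA file of the base at 0-based position pos.
inline std::uint64_t byteOffset(FaiRecord const & rec, std::uint64_t pos)
{
    assert(rec.lineBases > 0 && pos < rec.length);
    // Skip the full lines before pos, then step into the line that holds it.
    return rec.offset + (pos / rec.lineBases) * rec.lineWidth + pos % rec.lineBases;
}
```

For a record with offset 5, 60 bases per line and 61 bytes per line, position 60 is the first base of the second line and maps to byte 66.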

Example

The following example demonstrates the usage of the FaiIndex class.

#include <iostream>

#include <seqan/basic.h>
#include <seqan/seq_io.h>
#include <seqan/sequence.h>

using namespace seqan;

int main()
{
    CharString path = SEQAN_PATH_TO_ROOT();
    append(path, "/core/demos/seq_io/example.fa");
    
    FaiIndex faiIndex;

    // Try to read the FAI index.
    bool readSuccess = (read(faiIndex, toCString(path)) == 0);
    if (!readSuccess)
        std::cerr << "Could not read the FAI index.  Not fatal, we can just build it.\n";

    // Try to build the FAI index (in memory) if reading was unsuccessful.  If
    // building into memory succeeded, we try to write it out.
    if (!readSuccess)
    {
        if (build(faiIndex, toCString(path)) != 0)
        {
            std::cerr << "FATAL: Could not build FAI index.\n";
            return 1;
        }

        if (write(faiIndex) != 0)
        {
            std::cerr << "FATAL: Could not write out FAI index after building.\n";
            return 1;
        }
    }

    // Now, read the first 100 characters of chr.
    unsigned idx = 0;
    if (!getIdByName(faiIndex, "chr", idx))
    {
        std::cerr << "FATAL: chr not found in FAI index.\n";
        return 1;
    }
    CharString seq;
    if (readRegion(seq, faiIndex, idx, 0, 100) != 0)
    {
        std::cerr << "FATAL: Problem reading FASTA file through FAI index.\n";
        return 1;
    }

    // Now print the first 100 characters we just read.
    std::cout << "chr:1-100 = " << seq << "\n";
    
    return 0;
}

The output is as follows:

chr:1-100 = CCTATCTAATAATATACCTTATACTGGACTAGTGCCAATATTAAAATGAAGTGGGCGTAGTGTGTAATTTGATTGGGTGGAGGTGTGGCTTTGGCGTGCT

Member Functions Detail

FaiIndex::FaiIndex();

Constructor.

Interface Functions Detail

int build(faiIndex, seqFileName[, faiFileName]);

Create a FaiIndex from a FASTA file.

Parameters

faiIndex The FaiIndex to build into.
seqFileName Path to the FASTA file to build an index for. Type: char const *.
faiFileName Path to the FAI file to use as the index file. Type: char const *. Default: "${seqFileName}.fai".

Returns

int 0 on success, non-0 on errors.
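
As a rough picture of what building an index involves, the sketch below scans a FASTA stream once and collects one record per sequence. It is a standalone illustration under stated assumptions, not SeqAn's implementation: `FaiRecord` and `buildRecords` are hypothetical names, every line is assumed to end with '\n', and all lines of a sequence except the last are assumed to have the same length.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Illustrative record type mirroring the samtools faidx columns (not a SeqAn type).
struct FaiRecord
{
    std::string name;
    std::uint64_t length;     // number of bases
    std::uint64_t offset;     // byte offset of the first base
    std::uint64_t lineBases;  // bases per line
    std::uint64_t lineWidth;  // bytes per line, including '\n'
};

// Collect one record per sequence in a single pass over the stream.
inline std::vector<FaiRecord> buildRecords(std::istream & in)
{
    std::vector<FaiRecord> records;
    std::string line;
    std::uint64_t pos = 0;  // byte offset of the start of the current line
    while (std::getline(in, line))
    {
        std::uint64_t lineBytes = line.size() + 1;  // +1 for the assumed '\n'
        if (!line.empty() && line[0] == '>')
        {
            // The name is the header text up to the first blank or tab.
            std::size_t end = line.find_first_of(" \t");
            std::string name =
                line.substr(1, end == std::string::npos ? std::string::npos : end - 1);
            // The sequence data starts right after the header line.
            FaiRecord rec{name, 0, pos + lineBytes, 0, 0};
            records.push_back(rec);
        }
        else if (!records.empty() && !line.empty())
        {
            FaiRecord & rec = records.back();
            if (rec.lineBases == 0)  // first data line fixes the line geometry
            {
                rec.lineBases = line.size();
                rec.lineWidth = lineBytes;
            }
            assert(line.size() <= rec.lineBases);  // uniform-line assumption
            rec.length += line.size();
        }
        pos += lineBytes;
    }
    return records;
}
```

Feeding it a `std::istringstream` holding `">chr desc\nACGT\nACGT\nAC\n"` yields a single record named `chr` with length 10, offset 10, 4 bases per line and 5 bytes per line.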

void clear(faiIndex);

Reset a FaiIndex object to the state after default construction.

Parameters

faiIndex The FaiIndex to clear.

bool getIdByName(faiIndex, name, id);

Return id (numeric index in the file) of a sequence in a FAI file.

Parameters

faiIndex The FaiIndex to query.
name The name of the sequence whose id to look up. Type: SequenceConcept.
id The id of the sequence is written here.

Returns

bool true if a reference with the given name is known in the index.

__uint64 numSeqs(faiIndex);

Return the number of sequences known to a FaiIndex.

Parameters

faiIndex The FaiIndex to query.

Returns

__uint64 The number of sequences in the index.

int read(faiIndex, fastaFileName[, faiFileName]);

Read a FAI index from file.

Parameters

faiIndex The FaiIndex to read into.
fastaFileName Path to the FASTA file to read. Type: char const *.
faiFileName Path to the FAI file to read. Type: char const *. Defaults to "${fastaFileName}.fai".

Returns

int 0 on success, non-0 on errors.

int readRegion(str, faiIndex, refId, beginPos, endPos);
int readRegion(str, faiIndex, region);

Read a region through an FaiIndex.

Parameters

str The String to read the sequence into.
faiIndex The FaiIndex to read from.
refId The id of the reference to read (Type: unsigned).
beginPos The begin position of the region to read (Type: unsigned).
endPos The end position of the region to read (Type: unsigned).
region The GenomicRegion to read.

Returns

int 0 on success, non-0 on errors.
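
The example at the top of this page passes 0 and 100 and receives 100 characters printed as positions 1-100, which suggests that beginPos and endPos form a 0-based, half-open interval. The sketch below illustrates how a region read can work on top of index data under that assumption. It is standalone code, not SeqAn's implementation: `FaiRecord` and `readRegionSketch` are hypothetical names, and rejecting out-of-range positions is a choice made for this sketch only.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Illustrative record type mirroring the samtools faidx columns (not a SeqAn type).
struct FaiRecord
{
    std::string name;
    std::uint64_t length;     // number of bases
    std::uint64_t offset;     // byte offset of the first base
    std::uint64_t lineBases;  // bases per line
    std::uint64_t lineWidth;  // bytes per line, including the line terminator
};

// Read bases [beginPos, endPos) of one sequence into out.
// Returns 0 on success and 1 on errors, mirroring the int convention used on this page.
inline int readRegionSketch(std::string & out, std::istream & fasta, FaiRecord const & rec,
                            std::uint64_t beginPos, std::uint64_t endPos)
{
    out.clear();
    if (rec.lineBases == 0 || beginPos > endPos || endPos > rec.length)
        return 1;
    assert(rec.lineWidth >= rec.lineBases);

    // Jump directly to the first requested base instead of scanning the file.
    std::uint64_t start = rec.offset + (beginPos / rec.lineBases) * rec.lineWidth
                          + beginPos % rec.lineBases;
    fasta.clear();
    fasta.seekg(static_cast<std::streamoff>(start));

    // Copy bases and skip line terminators until the region is complete.
    char c;
    while (out.size() < endPos - beginPos && fasta.get(c))
        if (c != '\n' && c != '\r')
            out.push_back(c);
    return out.size() == endPos - beginPos ? 0 : 1;
}
```

With a stream holding `">chr\nACGT\nACGT\nAC\n"` and the record `{"chr", 10, 5, 4, 5}`, requesting positions 2 to 9 yields `GTACGTA`.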

int readSequence(str, faiIndex, refId);

Load a whole sequence from a FaiIndex.

Parameters

str The String to read into.
faiIndex The FaiIndex to read from.
refId The index of the sequence in the file.

Returns

int 0 on success, non-0 on errors.

__uint64 sequenceLength(faiIndex, refId);

Return the length of the sequence with the given id in the FaiIndex.

Parameters

faiIndex The FaiIndex to query.
refId The id of the sequence to get the length of.

Returns

__uint64 The length of the sequence with index refId in faiIndex.

CharString sequenceName(faiIndex, refId);

Return the name of the sequence with the given id in the FaiIndex.

Parameters

faiIndex The FaiIndex to query.
refId The index of the sequence.

Returns

CharString The name of the sequence with the given id.

int write(faiIndex[, faiFileName]);

Write out an FaiIndex object.

Parameters

faiIndex The FaiIndex to write out.
faiFileName The name of the FAI file to write to. Type: char const *. Optional only if the index already knows the FAI file name from a previous call to build; in that case, that name is used by default.

Returns

int 0 on success, 1 on errors.
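
For orientation, a .fai file as produced by samtools faidx is plain text with one tab-separated line per sequence. The standalone sketch below formats one such line. `FaiRecord` and `formatFaiLine` are hypothetical names, and the code is not SeqAn's implementation of write.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Illustrative record type mirroring the samtools faidx columns (not a SeqAn type).
struct FaiRecord
{
    std::string name;
    std::uint64_t length;     // number of bases
    std::uint64_t offset;     // byte offset of the first base
    std::uint64_t lineBases;  // bases per line
    std::uint64_t lineWidth;  // bytes per line, including the line terminator
};

// Format one record as a .fai line: name, length, offset, line bases, line width.
inline std::string formatFaiLine(FaiRecord const & rec)
{
    assert(rec.lineWidth >= rec.lineBases);  // a line holds its bases plus a terminator
    std::ostringstream out;
    out << rec.name << '\t' << rec.length << '\t' << rec.offset << '\t'
        << rec.lineBases << '\t' << rec.lineWidth << '\n';
    return out.str();
}
```

Writing a whole index then amounts to emitting one such line per sequence, in file order.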