Class Index
Indices are data structures which contain preprocessing data of a fixed text (or set of texts). In combination with a Finder or an VSTreeIterator it allows fast dictionary look-up and advanced computations.

All Subcl's FMIndex, IndexDfi, IndexEsa, IndexQGram, IndexSa, IndexWotd, OpenAddressingQGramIndex, PizzaChiliIndex
Defined in <seqan/index.h>
Signature class Index<TText[, TSpec]>;

Template Parameters

TSpec The index type.
TText The text type. Types: String, StringSet

Interface Function Overview

Interface Metafunction Overview

Detailed Description

An index contains various arrays or objects, also called fibres (see Fibre). These fibres are created on demand depending on the requirements of an algorithm. To force the fibre creation you can use the function indexCreate. The list of fibres is available in Fibre

Examples

The following code shows how to search for exact matches between the reference "tobeornottobe" and the pattern "to" with the means of a Finder.

#include <seqan/index.h>

using namespace seqan;

int main()
{
    typedef Index<CharString> TIndex;

    TIndex index("tobeornottobe");
    Finder<TIndex> finder(index);

    // The Finder object has a pointer to the first, current and last hit
    // Each consecutive call sets the current pointer to the appropriate hit
    while (find(finder, "to"))
        std::cout << "Hit at position: " << position(finder) << std::endl;

    return 0;
}

The result is as follows

Hit at position: 9
Hit at position: 0

This code shows how an index can be used with iterators to achieve a pre-order tree like traversal in DFS of the text "tobeornottobe". In order to do so a Top-Down History iterator is used.

#include <seqan/index.h>

using namespace seqan;

int main()
{
    typedef Index<CharString> TIndex;

    TIndex index("tobeornottobe");
    Iterator<TIndex, TopDown<ParentLinks<> > >::Type it(index);

    do
    {
        // Print the letters from the root to the current node
        std::cout << representative(it) << std::endl;

        if (!goDown(it) && !goRight(it))
            while (goUp(it) && !goRight(it))
                ;
    }
    while (!isRoot(it));

    return 0;
}

The result is as follows

be
beornottobe
e
eornottobe
nottobe
o
obe
obeornottobe
ornottobe
ottobe
rnottobe
t
tobe
tobeornottobe
ttobe

Note that you can also use specialized iterators such as:

Iterator<TIndex, TopDown<ParentLinks<PreOrder> > >::Type

or

Iterator<TIndex, TopDown<ParentLinks<PostOrder> > >::Type

You can achieve a post-order traversal like this:

    Iterator<TIndex, TopDown<ParentLinks<Postorder> > >::Type it(index);

    goBegin(it);
    while (!atEnd(it))
    {
        std::cout << representative(it) << std::endl;
        ++it;
    }

The result is as follows

beornottobe
be
eornottobe
e
nottobe
obeornottobe
obe
ornottobe
ottobe
o
rnottobe
tobeornottobe
tobe
ttobe
t

Interface Functions Detail

void clear(index);

Resets all fibres of an index.

Parameters

index The index to be cleared.

Data Races

Thread safety unknown!

TSize countSequences(index);

Return the number of sequences in an index' underlying text.

Parameters

index The index to return the number of sequences of.

Returns

TSize The number of sequences in the index' underlying text with TSize being the result Size.

Examples

The following example shows how to count characters of an index, determine the number of sequences involved and how to search for a pattern.

#include <seqan/index.h>

using namespace seqan;

int main()
{
    typedef StringSet<String<char> >    TText;
    typedef Index<TText>                TIndex;

    TText text;
    appendValue(text, "MISSISSIPPI");
    appendValue(text, "MYMISSISAHAPPY");

    TIndex index(text);
    Finder<TIndex> finder(index);

    std::cout << "The text has " << length(index) << " characters and consists of " << countSequences(index) <<
    " sequences." << std::endl;

    // The Finder object has a pointer to the first, current and last hit
    // Each consecutive call sets the current pointer to the appropriate hit
    while (find(finder, "MISS"))
        std::cout << "Hit at position: " << position(finder) << std::endl;

    return 0;
}

The output is as follows:

The text has 25 characters and consists of 2 sequences.
Hit at position: < 1 , 2 >
Hit at position: < 0 , 0 >

Data Races

Thread safety unknown!

TFibre getFibre(index, fibreTag);

Returns a specific fibre of a container.

Parameters

index The container holding the fibre.
fibreTag A tag that identifies the Fibre. Types: Index Esa Fibres, FM Index Fibres, Index Wotd Fibres, and Index QGram Fibres.

Returns

TFibre A reference to the Fibre object.

Examples

The following example shows how to search for a pattern in a string.

#include <seqan/index.h>

using namespace seqan;

int main()
{
    typedef StringSet<String<char> >    TText;
    typedef Index<TText>                TIndex;
    typedef SAValue<TIndex>::Type       TSAValue;

    TText text;
    appendValue(text, "MISSISSIPPI");
    appendValue(text, "MYMISSISAHAPPY");

    TIndex index(text);
    Iterator<TIndex, TopDown<> >::Type it(index);

    goDown(it, "SSI");

    std::cout << "SSI occurs in " << getFrequency(it) << " sequences." << std::endl;

    // 1 alternative to print the location of hits
    String<TSAValue> saPositions = getOccurrences(it);
    for (unsigned i = 0; i < length(saPositions); ++i)
        std::cout << "Hit in sequence " << saPositions[i].i1 << " at position " << saPositions[i].i2 << std::endl;

    std::cout << "----------------------------" << std::endl;

    // 2 alternative to print the location of hits
    Pair<unsigned> hitInterval = range(it);
    for (; hitInterval.i1 < hitInterval.i2; ++hitInterval.i1)
        std::cout << "Hit in sequence " << getFibre(index, FibreSA())[hitInterval.i1].i1 <<
        " at position " << getFibre(index, FibreSA())[hitInterval.i1].i2 << std::endl;

    return 0;
}

The output is as follows:

SSI occurs in 2 sequences.
Hit in sequence 0 at position 5
Hit in sequence 1 at position 4
Hit in sequence 0 at position 2
----------------------------
Hit in sequence 0 at position 5
Hit in sequence 1 at position 4
Hit in sequence 0 at position 2

Data Races

Thread safety unknown!

See Also

bool indexCreate(index, fibreTag[, algoTag]);

Creates a specific Fibre.

Parameters

fibreTag A tag that identifies the Fibre
algoTag A tag that identifies the algorithm which is used to create the fibre. Default: The result of DefaultIndexCreator.
index The Index object holding the fibre.

Returns

bool true on a success and false otherwise

indexCreate calls the fibre corresponding createXXX(..) function (e.g. createSuffixArray).

Data Races

Thread safety unknown!

TFibre indexRawText(position, index);

Shortcut for $getFibre(.., FibreRawText).

Parameters

position A position in the array on which the value should be accessed.
index The Index object.

Returns

TFibre A reference or proxy to the value.

Data Races

Thread safety unknown!

bool indexRequire(index, fibreTag);

On-demand creation of a specific Fibre.

Parameters

index The Index object holding the fibre.
fibreTag A tag that identifies the Fibre

Returns

bool true on a successful creation.

If the fibre already exists (indexSupplied is true) then indexRequire does nothing. If the fibre doesn't exist then indexCreate is called to create it.

Examples

The following code shows how the BWT of an text can be computed.

#include <seqan/index.h>

using namespace seqan;

int main()
{
    Index<String<char> > index("MISSISSIPPI");

    // Because indices are build on demand we force the index creation here.
    indexRequire(index, FibreSA());

    std::cout << "BWT\tSuffices" << std::endl;

    for (unsigned i = 0; i < length(indexSA(index)); ++i)
    {
        unsigned textPos = (saAt(i, index) == 0) ? length(index) - 1 : saAt(i, index) - 1;
        std::cout << textAt(textPos, index) << "\t" << suffix(indexText(index), textPos) << std::endl;
    }
    return 0;
}

The output is as follows:

BWT	Suffices
P	PI
S	SIPPI
S	SISSIPPI
M	MISSISSIPPI
I	I
P	PPI
I	IPPI
S	SSIPPI
S	SSISSIPPI
I	ISSIPPI
I	ISSISSIPPI

Data Races

Thread safety unknown!

bool indexSupplied(index, fibreTag);

Returns whether a specific Fibre is present.

Parameters

index The Index object holding the fibre.
fibreTag A tag that identifies the Fibre Index Fibres.

Returns

bool true, iff the fibre is present.

Data Races

Thread safety unknown!

TFibre indexText(index);

Note.

The result of this function when used on an Index<TText, FMIndex<TOccSpec, CompressText> > is not defined.

Shortcut for getFibre(.., EsaText).

Parameters

index The Index object holding the fibre.

Returns

TFibre A reference to the text of the index.

Examples

The following code shows how the BWT of a text can be computed.

#include <seqan/index.h>

using namespace seqan;

int main()
{
    Index<String<char> > index("MISSISSIPPI");

    // Because indices are build on demand we force the index creation here.
    indexRequire(index, FibreSA());

    std::cout << "BWT\tSuffices" << std::endl;

    for (unsigned i = 0; i < length(indexSA(index)); ++i)
    {
        unsigned textPos = (saAt(i, index) == 0) ? length(index) - 1 : saAt(i, index) - 1;
        std::cout << textAt(textPos, index) << "\t" << suffix(indexText(index), textPos) << std::endl;
    }
    return 0;
}

The output is as follows:

BWT	Suffices
P	PI
S	SIPPI
S	SISSIPPI
M	MISSISSIPPI
I	I
P	PPI
I	IPPI
S	SSIPPI
S	SSISSIPPI
I	ISSIPPI
I	ISSISSIPPI

Data Races

Thread safety unknown!

TSize length(index);

Returns the number of characters in the underlying text of the index.

Parameters

index An index of a text.

Returns

TSize Returns the number of characters in the raw underlying text of the index with TSize being the result of the Size meta-function of Index.

Examples

The following example shows how to count characters of an index, determine the number of sequences involved and how to search for a pattern.

#include <seqan/index.h>

using namespace seqan;

int main()
{
    typedef StringSet<String<char> >    TText;
    typedef Index<TText>                TIndex;

    TText text;
    appendValue(text, "MISSISSIPPI");
    appendValue(text, "MYMISSISAHAPPY");

    TIndex index(text);
    Finder<TIndex> finder(index);

    std::cout << "The text has " << length(index) << " characters and consists of " << countSequences(index) <<
    " sequences." << std::endl;

    // The Finder object has a pointer to the first, current and last hit
    // Each consecutive call sets the current pointer to the appropriate hit
    while (find(finder, "MISS"))
        std::cout << "Hit at position: " << position(finder) << std::endl;

    return 0;
}

The output is as follows:

The text has 25 characters and consists of 2 sequences.
Hit at position: < 1 , 2 >
Hit at position: < 0 , 0 >

Data Races

Thread safety unknown!

bool open(index, fileName[, mode]);

This functions opens an index from disk.

Parameters

index The index to be opened.
fileName C-style character string containing the file name.
mode The combination of flags defining how the file should be opened.To open a file read-only, write-only or to read and write use OPEN_RDONLY, OPEN_WRONLY, or OPEN_RDWR.To create or overwrite a file add OPEN_CREATE.To append a file if existing add OPEN_APPEND.To circumvent problems, files are always opened in binary mode. Default: OPEN_RDWR | OPEN_CREATE | OPEN_APPEND.

Returns

bool true on success and false otherwise.

Examples

The following code shows how the function open is used with indices.

#include <seqan/index.h>

using namespace seqan;

int main()
{
    typedef StringSet<String<char> >    TText;
    typedef Index<TText>                TIndex;

    TText text;
    appendValue(text, "MISSISSIPPI");
    appendValue(text, "MYMISSISAHAPPY");

    TIndex saveIndex(text);

    // Because indices are build on demand we fore the index creation here.
    indexCreate(saveIndex, FibreSA());

    const char * tempFileName = SEQAN_TEMP_FILENAME();
    std::cout << save(saveIndex, tempFileName) << std::endl;

    // In a different program
    TIndex openIndex;
    std::cout << open(openIndex, tempFileName) << std::endl;

    return 0;
}

The output is as follows:

1
1

Data Races

Thread safety unknown!

TValue rawtextAt(position, index);

Note.

The result of this function when used on an Index<TText, FMIndex<TOccSpec, Compress> > is not defined.

Shortcut for value(indexRawText(..), ..).

Parameters

index The Index object. Types: Index
position A position in the array on which the value should be accessed.

Returns

TValue A reference or proxy to the value. The type is the result if the meta-function Reference.

Data Races

Thread safety unknown!

bool save(index, fileName[, mode]);

This functions saves an index to disk.

Parameters

index The index to be opened.
fileName C-style character string containing the file name.
mode The combination of flags defining how the file should be opened.To open a file read-only, write-only or to read and write use OPEN_RDONLY, OPEN_WRONLY, or OPEN_RDWR.To create or overwrite a file add OPEN_CREATE.To append a file if existing add OPEN_APPEND.To circumvent problems, files are always opened in binary mode. Default: OPEN_RDWR | OPEN_CREATE | OPEN_APPEND.

Returns

bool true on success and false otherwise.

Examples

The following code shows how the function save is used with indices.

#include <seqan/index.h>

using namespace seqan;

int main()
{
    typedef StringSet<String<char> >    TText;
    typedef Index<TText>                TIndex;

    TText text;
    appendValue(text, "MISSISSIPPI");
    appendValue(text, "MYMISSISAHAPPY");

    TIndex saveIndex(text);

    // Because indices are build on demand we fore the index creation here.
    indexCreate(saveIndex, FibreSA());

    const char * tempFileName = SEQAN_TEMP_FILENAME();
    std::cout << save(saveIndex, tempFileName) << std::endl;

    // In a different program
    TIndex openIndex;
    std::cout << open(openIndex, tempFileName) << std::endl;

    return 0;
}

The output is as follows:

1
1

Data Races

Thread safety unknown!

Interface Metafunctions Detail

DefaultIndexCreator<TIndex, TFibre>::Type;

Note.

Advanced functionality, not commonly used.

Default algorithm to create a demanded and not yet existing Fibre.

Template Parameters

TIndex An Index Type.
TFibre A tag specifying the fibre (e.g. EsaSA).

Returns

Type A tag specifying the default algorithm to create the fibre with.

DefaultIndexSpec<TText>::Type;

Default Index specialization type.

Template Parameters

TText The given text type.

Returns

TReturn Currently the return type is IndexEsa.

Examples

The following will define TIndex to be of type IndexEsa, because the default index type for String or StringSet is IndexEsa.

typedef DefaultIndexSpec<String<Dna> >::Type TIndex;

DefaultIndexStringSpec<TIndex>::Type;

Default String specialization type of the Fibre of an Index.

Template Parameters

TIndex An Index Type.

Returns

TReturn If the underlying text is a String or a set of Strings (see StringSet) the String's spec. type is returned.

Remarks

Most of the Index fibres are strings. The String specialization type is chosen by this meta-function.

GetVSTreeIteratorTraits<TIterator>::Type

Default behaviour of goNext when no second parameter is given.

Template Parameters

TIterator A VSTreeIterator.

Returns

TReturn Postorder by default and Preorder if TIterator is VSTree<TopDown<ParentLinks<> > > or VSTree<TopDown<ParentLinks<Preorder> > >.

Size<TIndex>::Type;

Returns the size type of an Index.

Template Parameters

TIndex The Index specialization.

Returns

Type The resulting size type of the index.