SeqAn3 3.4.0-rc.3
The Modern C++ library for sequence analysis.
Stores the header information of SAM/BAM files.
#include <seqan3/io/sam_file/header.hpp>
Public Types | |
using | program_info_t = sam_file_program_info_t |
Stores information of the program/tool that was used to create the file. | |
Public Member Functions | |
ref_ids_type & | ref_ids () |
The range of reference ids. | |
Constructors, destructor and assignment | |
sam_file_header ()=default | |
Defaulted. | |
sam_file_header (sam_file_header const &)=default | |
Defaulted. | |
sam_file_header & | operator= (sam_file_header const &)=default |
Defaulted. | |
sam_file_header (sam_file_header &&)=default | |
Defaulted. | |
sam_file_header & | operator= (sam_file_header &&)=default |
Defaulted. | |
~sam_file_header ()=default | |
Defaulted. | |
sam_file_header (ref_ids_type ref_ids) | |
Construct from a range of reference ids. | |
Public Attributes | |
std::vector< std::string > | comments |
The list of comments. | |
std::string | format_version |
The file format version. Note: this is overwritten by our formats on output. | |
std::string | grouping |
The grouping of the file. SAM: [none, query, reference]. | |
std::vector< program_info_t > | program_infos |
The list of program information. | |
std::vector< std::pair< std::string, std::string > > | read_groups |
The Read Group Dictionary (used by the SAM/BAM format). | |
std::unordered_map< key_type, int32_t, key_hasher, detail::view_equality_fn > | ref_dict {} |
The mapping of reference id to position in the ref_ids() range and the ref_id_info range. | |
std::vector< std::tuple< int32_t, std::string > > | ref_id_info {} |
The reference information. (used by the SAM/BAM format) | |
std::string | sorting |
The sorting of the file. SAM: [unknown, unsorted, queryname, coordinate]. | |
std::string | subsorting |
The sub-sorting of the file. SAM: [unknown, unsorted, queryname, coordinate](:[A-Za-z0-9_-]+)+. | |
std::string | user_tags |
Additional user-defined tags. | |
Stores the header information of SAM/BAM files.
seqan3::sam_file_header< ref_ids_type >::sam_file_header (ref_ids_type ref_ids) |
inline |
Construct from a range of reference ids.
[in] | ref_ids | The range over reference ids. |
ref_ids_type & seqan3::sam_file_header< ref_ids_type >::ref_ids () |
inline |
The range of reference ids.
This member function gives you access to the range of reference ids.
When reading a file, there are three scenarios:
1) Reference id information is provided on construction. In this case, no copy is made; this function returns a reference to the provided range. When reading the header or the records, their reference information is checked against the given input.
2) No reference information is provided on construction, but @SQ tags are present in the header. In this case, the reference id information is extracted from the header and this member function provides access to it. When reading the records, their reference id information is checked against the header information.
3) No reference information is provided on construction and no @SQ tags are present in the header. In this case, the reference information is parsed from each record's field::ref_id and stored in the header. This member function then provides access to the unique list of reference ids encountered in the records.
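The third scenario (collecting unique reference ids while iterating over records) can be illustrated with a small standard-library sketch. This is not SeqAn3's implementation: `ref_id_collector` and `observe` are hypothetical names, and the sketch assumes that ids are kept in order of first appearance.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the header state described above; not the SeqAn3 type.
struct ref_id_collector
{
    std::vector<std::string> ref_ids;                       // unique ids, in order of first appearance
    std::unordered_map<std::string, std::int32_t> ref_dict; // id -> position in ref_ids

    // Returns the position of `id`, registering it first if it has not been seen yet.
    std::int32_t observe(std::string const & id)
    {
        auto it = ref_dict.find(id);
        if (it != ref_dict.end())
            return it->second; // already known: reuse the stored position

        auto const pos = static_cast<std::int32_t>(ref_ids.size());
        ref_ids.push_back(id);
        ref_dict.emplace(id, pos);
        return pos;
    }
};
```

Calling `observe` once per record yields a stable index for each reference name, and `ref_ids` ends up holding each name exactly once.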
std::vector<std::pair<std::string, std::string> > seqan3::sam_file_header< ref_ids_type >::read_groups |
The Read Group Dictionary (used by the SAM/BAM format).
The read group dictionary stores the group id and additional information of each read group in the file. A record may store an RG tag that references one of the stored ids. The id information is required if the header is provided.
The additional information (2nd tuple entry) for the SAM format must follow these formatting rules: The information is given in a tab-separated TAG:VALUE format, where TAG must be one of the @RG tags defined by the SAM specification, [BC, CN, DS, DT, FO, KS, LB, PG, PI, PL, PM, PU, SM]. The meaning of each tag and the rules for its value are given in the SAM specification.
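To make the tab-separated TAG:VALUE layout concrete, the following sketch splits such a string into (tag, value) pairs using only the standard library. `split_tag_values` is a hypothetical helper, not a SeqAn3 function, and it does not check that a tag is permitted; it only shows the shape of the data held in the second pair entry.

```cpp
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper (not part of SeqAn3): splits "TAG:VALUE\tTAG:VALUE..." into pairs.
std::vector<std::pair<std::string, std::string>> split_tag_values(std::string_view info)
{
    std::vector<std::pair<std::string, std::string>> result;

    while (!info.empty())
    {
        // Take everything up to the next tab (or the rest of the string).
        auto const tab = info.find('\t');
        std::string_view const field = info.substr(0, tab);
        info = (tab == std::string_view::npos) ? std::string_view{} : info.substr(tab + 1);

        // Split the field at the first ':'; the value itself may contain further ':'.
        auto const colon = field.find(':');
        if (colon == std::string_view::npos)
            continue; // this sketch silently skips malformed fields

        result.emplace_back(std::string{field.substr(0, colon)}, std::string{field.substr(colon + 1)});
    }

    return result;
}
```

Splitting at the first colon only is a deliberate choice here, since values such as descriptions or timestamps can contain colons themselves.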
std::vector<std::tuple<int32_t, std::string> > seqan3::sam_file_header< ref_ids_type >::ref_id_info {} |
The reference information. (used by the SAM/BAM format)
The reference information stores the length (the LN tag of the @SQ line) and additional information of each reference sequence in the file. A record then only needs to store the index of the reference. The name and length information are required if the header is provided, and each reference sequence that is referred to in any of the records must be present in the dictionary; otherwise a seqan3::format_error is thrown upon reading or writing a file.
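The relationship between the reference names, ref_id_info and ref_dict can be sketched as three containers kept in step, so that a record only has to carry an index. The `reference_table` type below is a hypothetical stand-in written against the standard library, not the SeqAn3 class, and it throws std::runtime_error where the library is documented to throw seqan3::format_error.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <tuple>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in mirroring the members described in this section.
struct reference_table
{
    std::vector<std::string> ref_ids;                               // reference names
    std::vector<std::tuple<std::int32_t, std::string>> ref_id_info; // (length, extra TAG:VALUE info)
    std::unordered_map<std::string, std::int32_t> ref_dict;         // name -> index into both vectors

    // Registers a reference; a name that is already known is left unchanged.
    void add(std::string name, std::int32_t length, std::string extra = {})
    {
        if (ref_dict.count(name) != 0)
            return;

        ref_dict.emplace(name, static_cast<std::int32_t>(ref_ids.size()));
        ref_ids.push_back(std::move(name));
        ref_id_info.emplace_back(length, std::move(extra));
    }

    // Returns the index a record would store; unknown names are an error.
    std::int32_t index_of(std::string const & name) const
    {
        auto it = ref_dict.find(name);
        if (it == ref_dict.end())
            throw std::runtime_error{"Unknown reference id: " + name};
        return it->second;
    }
};
```

Because all three containers share the same index, the length of a reference can be read with `std::get<0>(table.ref_id_info[table.index_of(name)])`.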
The additional information (2nd tuple entry) must follow these formatting rules: The information is given in a tab-separated TAG:VALUE format, where TAG must be one of [AH, AN, AS, m5, SP, UR]. The following information and rules apply for each tag (taken from the SAM specs):