The seqan3::sam_file_input class comes with four constructors: One for construction from a file name, one for construction from an existing stream and a known format and both of the former with or without additional reference information. Constructing from a file name automatically picks the format based on the extension of the file name. Constructing from a stream can be used if you have a non-file stream, like std::cin or std::istringstream. It also comes in handy, if you cannot use file-extension based detection, but know that your input file has a certain format.
Passing reference information, e.g.
ref_ids: The name of the references, e.g. "chr1", "chr2", ...
ref_sequences: The reference sequence information in the same order as the ref_ids.
Note that this is not the same as writing sam_file_input<> (with angle brackets). In the latter case they are explicitly set to their default values, in the former case automatic deduction happens which chooses different parameters depending on the constructor arguments. For opening from file, sam_file_input<> would have also worked, but for opening from stream it would not have.
You can define your own traits type to further customise the types used by and returned by this class, see seqan3::sam_file_input_default_traits for more details. As mentioned above, specifying at least one template parameter yourself means that you loose automatic deduction. The following is equivalent to the automatic type deduction example with a stream from above:
In the above example, rec has the type seqan3::sam_file_input::record_type which is a specialisation of seqan3::record and behaves like a std::tuple (that's why we can access it via get). Instead of using the seqan3::field based interface on the record, you could also use std::get<0> or even std::get<dna4_vector> to retrieve the sequence, but it is not recommended, because it is more error-prone.
Note
It is important to write auto & and not just auto, otherwise you will copy the record on every iteration. Since the buffer gets "refilled" on every iteration, you can also move the data out of the record if you want to store it somewhere without copying:
If you want to skip specific fields from the record you can pass a non-empty fields trait object to the seqan3::sam_file_input constructor to select the fields that should be read from the input. For example, you may only be interested in the mapping flag and mapping quality of your SAM data to get some statistics. The following snippets demonstrate the usage of such a fields trait object.
When reading a file, all fields not present in the file (but requested implicitly or via the selected_field_ids parameter) are ignored and the respective value in the record stays empty.
Reading record-wise (decomposed records)
Instead of using get on the record, you can also use structured bindings to decompose the record into its elements. Considering the example of reading only the flag and mapping quality like before you can also write:
The conversion from a CIGAR string to an alignment can be done with the function seqan3::alignment_from_cigar. You need to pass the reference sequence with the position the read was aligned to and the read sequence. All of it is already in the record when reading a SAM file:
Since SeqAn files are ranges, you can also create views over files. A useful example is to filter the records based on certain criteria, e.g. minimum length of the sequence field: