SeqAn3
The Modern C++ library for sequence analysis.
Parsing command line arguments with SeqAn3

Learning Objective:
You will learn how to use the seqan3::argument_parser class to parse command line arguments. This tutorial is a walkthrough with links to the API documentation and is also meant as a source for copy-and-paste code.

Difficulty: Easy
Duration: 30-60 min
Prerequisite tutorials: Quick Setup (using CMake)
Recommended reading: POSIX conventions



Introduction

An easy and very flexible interface to a program is through the command line. This tutorial explains how to parse the command line using the SeqAn3 library’s seqan3::argument_parser class.

This class will give you the following functionality:

  • Robust parsing of command line arguments.
  • Simple validation of arguments (e.g. within a range or one of a list of allowed values).
  • Automatically generated and nicely formatted help screens when your program is called with --help. You can also export this help to HTML and man pages.
  • In the future, you will also be able to automatically generate nodes for workflow engines such as KNIME or Galaxy.

Command line argument terminology

Before we start, let's agree on some terminology. Consider the following command line call

mycomputer$ ./program1 -f -i 4 --long-id 6 file1.txt

The binary program1 is called with several command line arguments. We call every single input an argument but differentiate their purpose into options, positional_options, flags or simply a value corresponding to one of the former. In our example above, -f is a flag which is never followed by a value, -i is an option with a short identifier (id) followed by its value 4, --long-id is also an option with a long identifier followed by its value 6, and file1.txt is a positional_option, because it is an option identified by its position instead of an identifier.

Name               Purpose                                        Example
option             identify an argument by name (id-value pair)   -i 5 or --long-id 5
flag               boolean on/off flag (id)                       -f
positional option  identify an argument by position (value)       file1.txt

Have a look at the POSIX conventions for command line arguments if you want a detailed description of the requirements for the above. (Note: the linked article uses different terms: our value is its "argument", our option is its "option", our flag is its "option that does not require arguments", and our positional option is its "non-option".)
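To make the terminology concrete, here is a minimal sketch in plain C++ that labels each token of the example call. It does not use the SeqAn3 API: the names arg_kind and classify are hypothetical, and the caller has to say which identifiers are flags and which are options.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical labels for the roles described above; not part of SeqAn3.
enum class arg_kind { flag, option, value, positional_option };

// Label each token of a command line, given which identifiers are flags and
// which are options (an option consumes the next token as its value).
std::vector<arg_kind> classify(std::vector<std::string> const & tokens,
                               std::set<std::string> const & flag_ids,
                               std::set<std::string> const & option_ids)
{
    std::vector<arg_kind> kinds;
    for (std::size_t i = 0; i < tokens.size(); ++i)
    {
        if (flag_ids.count(tokens[i]) != 0)
        {
            kinds.push_back(arg_kind::flag);
        }
        else if (option_ids.count(tokens[i]) != 0)
        {
            kinds.push_back(arg_kind::option);
            if (i + 1 < tokens.size()) // the following token is the option's value
            {
                kinds.push_back(arg_kind::value);
                ++i;
            }
        }
        else // neither a flag nor an option id: identified by position only
        {
            kinds.push_back(arg_kind::positional_option);
        }
    }
    return kinds;
}
```

For the call above, with -f as the only flag and -i and --long-id as options, the tokens are labelled flag, option, value, option, value, positional_option.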

A continuous example

We will get to know the wide functionality of the argument parser by writing a little application and extending it step by step. Let's say we have a tab separated file data.tsv with information on the Game of Thrones Seasons (by Wikipedia):

Season Month Day Year Avg. U.S. viewers (millions)
1 April 17 2011 2.52
2 April 1 2012 3.80
3 March 31 2013 4.97
4 April 6 2014 6.84
5 April 12 2015 6.88
6 April 24 2016 7.69
7 July 16 2017 10.26

We want to build an application that is able to read the file with or without a header line, select certain seasons and compute the average or median from the "Avg. U.S. viewers (millions)" of the selected seasons.

The SeqAn3 argument parser class

Before we add any of the options, flags, and positional options, we will take a look at the seqan3::argument_parser class itself. It is constructed by giving a program's name and passing the parameters argc and argv from main. Note that no command line arguments have been parsed so far, but we can now add more information to the parser. After adding all desired information, the parsing is triggered by calling the seqan3::argument_parser::parse member function. Since the function throws in case any errors occur, we need to wrap it into a try-catch block. Here is a first working example:

#include <seqan3/argument_parser/all.hpp> // includes all necessary headers
#include <seqan3/core/debug_stream.hpp> // our custom output stream
using namespace seqan3;
int main(int argc, char ** argv)
{
    argument_parser myparser{"Game-of-Parsing", argc, argv}; // initialise myparser
    // ... add information, options, flags and positional options
    try
    {
        myparser.parse(); // trigger command line parsing
    }
    catch (parser_invalid_argument const & ext) // catch user errors
    {
        debug_stream << "[Winter has come] " << ext.what() << "\n"; // customise your error message
        return -1;
    }
}

There are two types of exceptions: seqan3::parser_design_error, which indicates that the parser setup was wrong (directed to the developer of the program, not the user!), and seqan3::parser_invalid_argument, which signals invalid user input. Additionally, there are special user requests that the argument parser handles by exiting the program via std::exit, e.g. --help, which prints the help page.

Design restrictions (seqan3::parser_design_error)

The argument parser checks the following restrictions and throws a seqan3::parser_design_error if they are not satisfied:

  • Long identifiers: must be unique, more than one character long, may only contain alphanumeric characters, as well as _, -, or @, but never start with -.
  • Short identifiers: must be unique and consist of a single character that is alphanumeric, _ or @.
  • Either the short or the long id may be empty, but not both at the same time.
  • Only the last positional option may be a list (see lists).
  • The flag identifiers -h, --help, -hh, --advanced-help, --export-help, --version, --copyright are predefined and cannot be specified manually or used otherwise.
  • The seqan3::argument_parser::parse function may only be called once (per parser).
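The character rules for identifiers can be written down as two small predicates. This is only a sketch of the rules as listed above, not the parser's own implementation. The function names are hypothetical, and uniqueness, empty identifiers and the predefined identifiers are not checked.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Characters allowed in both kinds of identifier: alphanumeric, '_' or '@'.
bool is_id_char(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_' || c == '@';
}

// Short identifier rule: a single allowed character.
bool is_valid_short_id(char id)
{
    return is_id_char(id);
}

// Long identifier rule: more than one character, only allowed characters
// or '-', and never a leading '-'.
bool is_valid_long_id(std::string const & id)
{
    if (id.size() < 2 || id.front() == '-')
        return false;
    for (char c : id)
        if (!is_id_char(c) && c != '-')
            return false;
    return true;
}
```

Under these rules "long-id" and "my_flag" are accepted as long identifiers, while "-bad", "x" and "my flag" are rejected.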

Input restrictions (seqan3::parser_invalid_argument)

When calling the seqan3::argument_parser::parse function, the following potential user errors are caught (and handled by throwing a corresponding exception):

seqan3::unknown_option           The option/flag identifier is not known to the parser.
seqan3::too_many_arguments       More command line arguments than expected are given.
seqan3::too_few_arguments        Fewer command line arguments than expected are given.
seqan3::required_option_missing  A required option is not given (see Required options).
seqan3::type_conversion_failed   The given value cannot be converted to the expected type.
seqan3::validation_failed        (Positional) option validation failed (see Validators).

Special Requests (std::exit)

We use the term "special requests" for command line input that does not aim to execute your program but rather to display information about it. Because such a request never means that the program is intended to run, we exit the program at the end of the seqan3::argument_parser::parse call via std::exit.

Currently we support the following special requests:

-h/--help            Prints the help page to the command line (std::cout)
-hh/--advanced-help  Prints the advanced help page to the command line (std::cout)
--export-help        Exports the help page in a different format (std::cout)
--version            Prints the version information to the command line (std::cout)
--copyright          Prints the copyright information to the command line (std::cout)

Assignment 1

Copy the minimal working example into a cpp file in your working directory and compile it. Play around with the binary, e.g. requesting special behaviour like printing the help page.

Meta data

Of course there is not much information to display yet, since we did not provide any. Let's improve this by modifying the seqan3::argument_parser::info member of our parser. The seqan3::argument_parser::info member is a struct of type seqan3::argument_parser_meta_data; its members, such as author, short_description and version, can be customised. See the API documentation of seqan3::argument_parser_meta_data for the complete list.

Assignment 2

  1. Extend the minimal example from assignment 1 by a function void initialize_argument_parser(seqan3::argument_parser & parser).
  2. Within this function, customise the parser with the following information:
    • Set the author to your favourite Game of Thrones character (Don't have one? Really? Take "Cersei").
    • Set the short description to "Aggregate average US. Game of Thrones viewers by season.".
    • Set the version to 1.0.0 .
    • Set some more, if you want to. Hint: Check out the API documentation for seqan3::argument_parser_meta_data and seqan3::argument_parser::info.
  3. Try calling --help again and see the results.

Solution

#include <seqan3/argument_parser/all.hpp> // includes all necessary headers
#include <seqan3/core/debug_stream.hpp> // our custom output stream
using namespace seqan3;
void initialize_argument_parser(argument_parser & parser)
{
    parser.info.author = "Cersei";
    parser.info.short_description = "Aggregate average Game of Thrones viewers by season.";
    parser.info.version = "1.0.0";
}
int main(int argc, char ** argv)
{
    argument_parser myparser{"Game-of-Parsing", argc, argv};
    initialize_argument_parser(myparser); // set the meta data before parsing
    // code from assignment 1
}

Adding options, flags and positional_options

Now that we're done with the meta information, we will learn how to add the actual functionality of options, flags and positional options. For each of these three there is a respective member function: seqan3::argument_parser::add_option, seqan3::argument_parser::add_flag and seqan3::argument_parser::add_positional_option.

Each of the functions above takes a variable by reference as the first parameter, which will directly store the corresponding parsed value from the command line. This has two advantages compared to other command line parsers: (1) There is no need for a getter function after parsing and (2) the type is automatically deduced (e.g. with boost::program_options you would need to access parser["file_path"].as<std::filesystem::path>() afterwards).

The seqan3::argument_parser::add_flag only allows boolean variables while seqan3::argument_parser::add_option and seqan3::argument_parser::add_positional_option allow any type that is convertible from a std::string via std::from_chars or a container of the former (see List options). Besides accepting generic types, the parser will automatically check if the given command line argument can be converted into the desired type and otherwise throw a seqan3::type_conversion_failed exception.
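To illustrate what such a conversion check involves, here is a minimal sketch built directly on std::from_chars. It is not the parser's implementation: the helper to_int is hypothetical, and rejecting trailing characters is a choice made for this sketch.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Convert the whole of `text` to an int. Returns std::nullopt when the text
// is not a number, is out of range for int, or has trailing characters.
std::optional<int> to_int(std::string_view text)
{
    int value{};
    char const * const last = text.data() + text.size();
    auto const result = std::from_chars(text.data(), last, value);
    if (result.ec != std::errc{} || result.ptr != last)
        return std::nullopt;
    return value;
}
```

Here to_int("42") yields 42, whereas to_int("abc") and to_int("4x") yield std::nullopt. A parser would report that situation to the user, which is the role seqan3::type_conversion_failed plays in SeqAn3.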

So what does this look like? The following code snippet adds a positional option to parser.

size_t variable{};
parser.add_positional_option(variable, "This is a description.");

In addition to the variable that will store the value, you need to pass a description. This description will help users of your application to understand how the option affects your program.

Note
As the name suggests, positional options are identified by their position. In SeqAn3, the first add_positional_option() call is linked to the first command line argument that is neither an option-value pair nor a flag. So the order in which you add positional options to your parser determines the order in which command line arguments are assigned to the respective variables. We personally recommend always using regular options (id-value pairs) because they are more expressive and make it easier to spot errors.

You can add an option like this:

size_t variable{};
parser.add_option(variable, 'n', "my-number", "This is a description.");

In addition to the variable that will store the value and the description, you need to specify a short and a long identifier. The example above will recognize an option -n or --my-number given on the command line and expect it to be followed by a value, separated by =, by a space, or by nothing at all.

Finally, you can add a flag with the following call:

bool variable{};
parser.add_flag(variable, 'f', "my_flag", "This is a description.");

Note that you can omit either the short identifier by passing '\0' or the long identifier by passing "" but you can never omit both at the same time.

Default values

With the current design, every option/flag/positional automatically has a default value which simply is the value with which you initialise the corresponding variable that is passed as the first parameter. Yes it is that easy, just make sure to always initialise your variables properly.

Assignment 3

Getting back to our example application, let's extend our code from Assignment 2 by the following:

As a best practice recommendation for handling multiple options/flags/positionals, you should store the variables in a struct and pass this struct to your parser initialisation function. You can use the following code that introduces a cmd_arguments struct storing all relevant command line arguments. Furthermore, it provides you with a small function run_program() that reads in the data file and aggregates the data by the given information. You don't need to look at the code of run_program(), it is only so that we have a working program.

Copy and paste this code into the beginning of your application:

#include <algorithm> // includes std::sort
#include <fstream>
#include <numeric>
#include <string>
#include <vector>
#include <range/v3/view/split.hpp>
#include <seqan3/argument_parser/all.hpp> // includes all necessary headers
#include <seqan3/core/debug_stream.hpp> // our custom output stream
#include <seqan3/std/charconv> // includes std::from_chars
#include <seqan3/std/filesystem> // use std::filesystem::path
using namespace seqan3;
// This is the program!
// Take a look at it if you are interested in an example of parsing a data file.
// -----------------------------------------------------------------------------
template <typename number_type, typename range_type>
number_type to_number(range_type && range)
{
    std::string str;
    for (auto c : range) // copy the characters into a contiguous string for std::from_chars
        str.push_back(c);
    number_type num;
    auto res = std::from_chars(&str[0], &str[0] + str.size(), num);
    if (res.ec != std::errc{})
    {
        debug_stream << "Could not cast '" << range << "' to a valid number\n";
        throw std::invalid_argument{"CAST ERROR"};
    }
    return num;
}
void run_program(std::filesystem::path & path, uint32_t yr, std::string & aggr_by, bool hd_is_set)
{
    std::ifstream file{path.string()};
    if (file.is_open())
    {
        std::vector<double> v; // the viewer numbers of all selected seasons
        std::string line; // the current line of the file
        if (hd_is_set)
            std::getline(file, line); // ignore first line
        while (std::getline(file, line))
        {
            auto splitted_line = line | std::views::split('\t');
            auto it = std::next(splitted_line.begin(), 3); // move to 4th column
            if (to_number<uint32_t>(*it) >= yr)
                v.push_back(to_number<double>(*std::next(it)));
        }
        if (aggr_by == "median")
            debug_stream << ([&v] () { std::sort(v.begin(), v.end()); return v[v.size()/2]; })() << std::endl;
        else if (aggr_by == "mean")
            debug_stream << ([&v] () { double sum{}; for (auto i : v) sum += i; return sum / v.size(); })() << std::endl;
        else
            debug_stream << "I do not know the aggregation method " << aggr_by << std::endl;
    }
    else
    {
        debug_stream << "Error: Cannot open file for reading.\n";
    }
}
// -----------------------------------------------------------------------------
struct cmd_arguments
{
    std::filesystem::path file_path{};
    uint32_t year{};
    std::string aggregate_by{"mean"};
    bool header_is_set{};
};

Your task is now to extend the initialisation function by the following:

  1. Extend your initialize_argument_parser function by a parameter that takes a cmd_arguments object and adapt the function call in your main function to pass on args.
  2. Set the default value of aggregate_by to "mean".

You can now use the variables from args to add the following inside of the initialize_argument_parser function:

  1. Add a positional option to the parser that sets the variable file_path so our program knows the location of the data file to read in.
  2. Add an option -y/--year that sets the variable year, which will enable our program to filter the data by only including a season if it was released in or after the given year.
  3. Add an option -a/--aggregate-by that sets the variable aggregate_by, which will enable our program to choose between aggregating by mean or median.
  4. Add a flag -H/--header-is-set that sets the variable header_is_set, which lets the program know whether it should ignore the first line in the file.

Take a look at the help page again after you've done all of the above. You will notice that your options have been automatically included. Copy and paste the example data file from the introduction and check if your options are set correctly by trying the following few calls:

./game_of_parsing -H -y 2014 data.tsv
7.9175
./game_of_parsing -H -y 2010 --aggregate-by median data.tsv
6.84
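The two expected results can be checked by hand. With -y 2014 the program keeps seasons 4 to 7, and (6.84 + 6.88 + 7.69 + 10.26) / 4 = 7.9175. With -y 2010 all seven seasons are kept, and the sorted element at index 7 / 2 = 3 is 6.84. The sketch below repeats the two aggregation steps of run_program() as standalone functions. The names mean and median are chosen for this example only.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Arithmetic mean of the selected viewer numbers.
double mean(std::vector<double> const & v)
{
    assert(!v.empty());
    double sum{};
    for (double x : v)
        sum += x;
    return sum / v.size();
}

// Median as computed in run_program(): sort, then take the element at index
// size / 2. For an even number of values this is the upper of the two middle
// elements. The vector is taken by value so the caller's data stays unsorted.
double median(std::vector<double> v)
{
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    return v[v.size() / 2];
}
```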

Solution

void initialize_argument_parser(argument_parser & parser, cmd_arguments & args)
{
    parser.info.author = "Cersei";
    parser.info.short_description = "Aggregate average Game of Thrones viewers by season.";
    parser.info.version = "1.0.0";
    parser.add_positional_option(args.file_path, "Please provide a tab separated data file.");
    parser.add_option(args.year, 'y', "year", "Only data entries that are newer than `year` are considered.");
    parser.add_option(args.aggregate_by, 'a', "aggregate-by", "Choose your method of aggregation: mean or median.");
    parser.add_flag(args.header_is_set, 'H', "header-is-set", "Let us know whether your data file contains a "
                                                              "header to ensure correct parsing.");
}

In case you are stuck, the complete program now looks like this:

#include <algorithm> // includes std::sort
#include <fstream>
#include <numeric>
#include <string>
#include <vector>
#include <range/v3/view/split.hpp>
#include <seqan3/argument_parser/all.hpp> // includes all necessary headers
#include <seqan3/core/debug_stream.hpp> // our custom output stream
#include <seqan3/std/charconv> // includes std::from_chars
#include <seqan3/std/filesystem> // use std::filesystem::path
using namespace seqan3;
// This is the program!
// Take a look at it if you are interested in an example of parsing a data file.
// -----------------------------------------------------------------------------
template <typename number_type, typename range_type>
number_type to_number(range_type && range)
{
    std::string str;
    for (auto c : range) // copy the characters into a contiguous string for std::from_chars
        str.push_back(c);
    number_type num;
    auto res = std::from_chars(&str[0], &str[0] + str.size(), num);
    if (res.ec != std::errc{})
    {
        debug_stream << "Could not cast '" << range << "' to a valid number\n";
        throw std::invalid_argument{"CAST ERROR"};
    }
    return num;
}
void run_program(std::filesystem::path & path, uint32_t yr, std::string & aggr_by, bool hd_is_set)
{
    std::ifstream file{path.string()};
    if (file.is_open())
    {
        std::vector<double> v; // the viewer numbers of all selected seasons
        std::string line; // the current line of the file
        if (hd_is_set)
            std::getline(file, line); // ignore first line
        while (std::getline(file, line))
        {
            auto splitted_line = line | std::views::split('\t');
            auto it = std::next(splitted_line.begin(), 3); // move to 4th column
            if (to_number<uint32_t>(*it) >= yr)
                v.push_back(to_number<double>(*std::next(it)));
        }
        if (aggr_by == "median")
            debug_stream << ([&v] () { std::sort(v.begin(), v.end()); return v[v.size()/2]; })() << std::endl;
        else if (aggr_by == "mean")
            debug_stream << ([&v] () { double sum{}; for (auto i : v) sum += i; return sum / v.size(); })() << std::endl;
        else
            debug_stream << "I do not know the aggregation method " << aggr_by << std::endl;
    }
    else
    {
        debug_stream << "Error: Cannot open file for reading.\n";
    }
}
// -----------------------------------------------------------------------------
struct cmd_arguments
{
    std::filesystem::path file_path{};
    uint32_t year{};
    std::string aggregate_by{"mean"};
    bool header_is_set{};
};
void initialize_argument_parser(argument_parser & parser, cmd_arguments & args)
{
    parser.info.author = "Cersei";
    parser.info.short_description = "Aggregate average Game of Thrones viewers by season.";
    parser.info.version = "1.0.0";
    parser.add_positional_option(args.file_path, "Please provide a tab separated data file.");
    parser.add_option(args.year, 'y', "year", "Only data entries that are newer than `year` are considered.");
    parser.add_option(args.aggregate_by, 'a', "aggregate-by", "Choose your method of aggregation: mean or median.");
    parser.add_flag(args.header_is_set, 'H', "header-is-set", "Let us know whether your data file contains a "
                                                              "header to ensure correct parsing.");
}
int main(int argc, char ** argv)
{
    argument_parser myparser{"Game-of-Parsing", argc, argv}; // initialise myparser
    cmd_arguments args{};
    initialize_argument_parser(myparser, args);
    try
    {
        myparser.parse(); // trigger command line parsing
    }
    catch (parser_invalid_argument const & ext) // catch user errors
    {
        debug_stream << "[Winter has come] " << ext.what() << "\n"; // customise your error message
        return -1;
    }
    // parsing was successful !
    // we can start running our program
    run_program(args.file_path, args.year, args.aggregate_by, args.header_is_set);
    return 0;
}

List options

In some use cases you may want to allow the user to specify an option multiple times and store the values in a list. With the seqan3::argument_parser this behaviour can be achieved simply by choosing your input variable to be of a container type (e.g. std::vector). The parser recognises the container type through seqan3::container and will adapt the parsing of command line arguments accordingly.

Example:

std::vector<std::string> list_variable{};
parser.add_option(list_variable, 'n', "names", "Give me some names.");

Adding this option to a parser will allow you to call the program like this:

./some_program -n Jon -n Arya -n Ned

The vector list_variable will then contain all three names ["Jon", "Arya", "Ned"].
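Conceptually, a list option appends one value per occurrence of its identifier. The following plain C++ sketch shows that accumulation for the call above. collect_values is a hypothetical helper, not a SeqAn3 function, and it only handles the space-separated form.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Collect the value that follows every occurrence of `id` among the tokens.
std::vector<std::string> collect_values(std::vector<std::string> const & tokens, std::string const & id)
{
    std::vector<std::string> values;
    for (std::size_t i = 0; i + 1 < tokens.size(); ++i)
    {
        if (tokens[i] == id)
        {
            values.push_back(tokens[i + 1]);
            ++i; // skip the value that was just consumed
        }
    }
    return values;
}
```

If the identifier never occurs, the returned list is simply empty.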

List positional options?

An arbitrary positional option cannot be a list because of the ambiguity of which value belongs to which positional option. We do allow the very last positional option to be a list for convenience though. Note that if you try to add a positional list option which is not the last positional option, a seqan3::parser_design_error will be thrown.

Example:

std::string variable{};
std::vector<std::string> list_variable{};
parser.add_positional_option(variable, "Give me a single variable.");
parser.add_positional_option(list_variable, "Give me one or more variables!");

Adding these positional options to a parser will allow you to call the program like this:

./some_program Stark Jon Arya Ned

The first variable will be filled with the value Stark while the vector list_variable will then contain the three names ["Jon", "Arya", "Ned"].
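The reason only the last positional option may be a list becomes clear when the assignment is written out: values are handed out in order, and a list takes everything that is left. A minimal sketch in plain C++, where the positionals struct and assign_positionals are hypothetical names for this example:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical result type: one single positional value plus a trailing list.
struct positionals
{
    std::string first;
    std::vector<std::string> rest;
};

// Assign positional values in order: the first value fills the single
// variable, all remaining values go to the list.
positionals assign_positionals(std::vector<std::string> const & values)
{
    assert(!values.empty()); // this sketch assumes the first value was given
    positionals result;
    result.first = values.front();
    result.rest.assign(values.begin() + 1, values.end());
    return result;
}
```

A list anywhere but at the end would leave no rule for deciding where it stops and the next positional option begins.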

Assignment 4

We extend the solution from assignment 3:

  1. Remove the option -y/--year, since we want to keep it simple and only aggregate by season now.
  2. Add a variable seasons of type std::vector<uint8_t> to the struct cmd_arguments.
  3. Add a list option -s/--season that will fill the variable seasons which lets the user specify which seasons to aggregate instead of the year.
  4. [BONUS] If you have some spare time, try to adjust the program code to aggregate by season. Hint: Use std::find.
    Otherwise just replace the while loop with the following:
    while (std::getline(file, line))
    {
        auto splitted_line = line | std::views::split('\t');
        auto it = splitted_line.begin(); // move to 1st column
        if (std::find(sn.begin(), sn.end(), to_number<uint8_t>(*it)) != sn.end())
            v.push_back(to_number<double>(*std::next(it, 4)));
    }

Take a look at the help page again after you've done all of the above. You will notice that your option -s/--season even tells you that it is of type List of unsigned 8 bit integer's. Check if your options are set correctly by trying the following few calls:

./game_of_parsing -H -s 2 -s 4 data.tsv
5.32
./game_of_parsing -H -s 1 --season 3 -s 7 --aggregate-by median data.tsv
4.97

Solution

struct cmd_arguments
{
    std::filesystem::path file_path{};
    std::vector<uint8_t> seasons{};
    std::string aggregate_by{"mean"};
    bool header_is_set{};
};
void initialize_argument_parser(argument_parser & parser, cmd_arguments & args)
{
    parser.info.author = "Cersei";
    parser.info.short_description = "Aggregate average Game of Thrones viewers by season.";
    parser.info.version = "1.0.0";
    parser.add_positional_option(args.file_path, "Please provide a tab separated data file.");
    parser.add_option(args.seasons, 's', "season", "Choose the seasons to aggregate.");
    parser.add_option(args.aggregate_by, 'a', "aggregate-by", "Choose your method of aggregation: mean or median.");
    parser.add_flag(args.header_is_set, 'H', "header-is-set", "Let us know whether your data file contains a "
                                                              "header to ensure correct parsing.");
}

In case you are stuck, the complete program now looks like this:

#include <algorithm> // includes std::sort and std::find
#include <fstream>
#include <numeric>
#include <string>
#include <vector>
#include <range/v3/view/split.hpp>
#include <seqan3/argument_parser/all.hpp> // includes all necessary headers
#include <seqan3/core/debug_stream.hpp> // our custom output stream
#include <seqan3/std/charconv> // includes std::from_chars
#include <seqan3/std/filesystem> // use std::filesystem::path
using namespace seqan3;
// This is the program!
// Take a look at it if you are interested in an example of parsing a data file.
// -----------------------------------------------------------------------------
template <typename number_type, typename range_type>
number_type to_number(range_type && range)
{
    std::string str;
    for (auto c : range) // copy the characters into a contiguous string for std::from_chars
        str.push_back(c);
    number_type num;
    auto res = std::from_chars(&str[0], &str[0] + str.size(), num);
    if (res.ec != std::errc{})
    {
        debug_stream << "Could not cast '" << range << "' to a valid number\n";
        throw std::invalid_argument{"CAST ERROR"};
    }
    return num;
}
void run_program(std::filesystem::path & path, std::vector<uint8_t> sn, std::string & aggr_by, bool hd_is_set)
{
    std::ifstream file{path.string()};
    if (file.is_open())
    {
        std::vector<double> v; // the viewer numbers of all selected seasons
        std::string line; // the current line of the file
        if (hd_is_set)
            std::getline(file, line); // ignore first line
        while (std::getline(file, line))
        {
            auto splitted_line = line | std::views::split('\t');
            auto it = splitted_line.begin(); // move to 1st column
            if (std::find(sn.begin(), sn.end(), to_number<uint8_t>(*it)) != sn.end())
                v.push_back(to_number<double>(*std::next(it, 4)));
        }
        if (aggr_by == "median")
            debug_stream << ([&v] () { std::sort(v.begin(), v.end()); return v[v.size()/2]; })() << std::endl;
        else if (aggr_by == "mean")
            debug_stream << ([&v] () { double sum{}; for (auto i : v) sum += i; return sum / v.size(); })() << std::endl;
        else
            debug_stream << "I do not know the aggregation method " << aggr_by << std::endl;
    }
    else
    {
        debug_stream << "Error: Cannot open file for reading.\n";
    }
}
// -----------------------------------------------------------------------------
struct cmd_arguments
{
    std::filesystem::path file_path{};
    std::vector<uint8_t> seasons{};
    std::string aggregate_by{"mean"};
    bool header_is_set{};
};
void initialize_argument_parser(argument_parser & parser, cmd_arguments & args)
{
    parser.info.author = "Cersei";
    parser.info.short_description = "Aggregate average Game of Thrones viewers by season.";
    parser.info.version = "1.0.0";
    parser.add_positional_option(args.file_path, "Please provide a tab separated data file.");
    parser.add_option(args.seasons, 's', "season", "Choose the seasons to aggregate.");
    parser.add_option(args.aggregate_by, 'a', "aggregate-by", "Choose your method of aggregation: mean or median.");
    parser.add_flag(args.header_is_set, 'H', "header-is-set", "Let us know whether your data file contains a "
                                                              "header to ensure correct parsing.");
}
int main(int argc, char ** argv)
{
    argument_parser myparser{"Game-of-Parsing", argc, argv}; // initialise myparser
    cmd_arguments args{};
    initialize_argument_parser(myparser, args);
    try
    {
        myparser.parse(); // trigger command line parsing
    }
    catch (parser_invalid_argument const & ext) // catch user errors
    {
        debug_stream << "[Winter has come] " << ext.what() << "\n"; // customise your error message
        return -1;
    }
    // parsing was successful !
    // we can start running our program
    run_program(args.file_path, args.seasons, args.aggregate_by, args.header_is_set);
    return 0;
}

Setting options as required, advanced or hidden

Required options

There is a flaw in the example application we have programmed in assignment 4, did you notice? You can make it misbehave by not giving it any option -s (which is technically correct for the seqan3::argument_parser because a list may be empty). You could of course handle this in the program itself by checking whether the vector seasons is empty, but since supplying no season is not expected we can force the user to supply the option at least once by declaring an option as required.

For this purpose we need to use the seqan3::option_spec enum interface that is accepted as an additional argument by all of the add_[positional_option/option/flag] calls:

std::string required_variable{};
parser.add_option(required_variable, 'n', "name", "I really need a name.", option_spec::REQUIRED);

If the user does not supply the required option via the command line, they will now get the following error:

./example_program --some-other-option
Option -n/--name is required but not set.

Note
Positional options are always required!
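The check behind a required option amounts to comparing the declared options with the ones seen on the command line. Below is a minimal sketch in plain C++ that reproduces the error message shown above. option_decl and check_required are hypothetical names, and std::invalid_argument stands in for the parser's own exception type.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical description of a declared option.
struct option_decl
{
    char short_id;
    std::string long_id;
    bool required;
};

// Throw if any option declared as required is missing from the set of long
// identifiers that were seen on the command line.
void check_required(std::vector<option_decl> const & decls, std::set<std::string> const & seen_long_ids)
{
    for (auto const & decl : decls)
    {
        if (decl.required && seen_long_ids.count(decl.long_id) == 0)
            throw std::invalid_argument{"Option -" + std::string(1, decl.short_id) + "/--" + decl.long_id +
                                        " is required but not set."};
    }
}
```

Running the check once after all arguments have been read means an empty list option is reported just like a missing single-value option.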

Advanced and hidden options

In addition to the required tag, there is also the possibility of declaring an option as advanced or hidden.

Set an option/flag to advanced if you do not want it to be displayed in the normal help page (-h/--help). Instead, the advanced options are only displayed when calling -hh/--advanced-help. This can be helpful if you want to avoid bloating your help page with too much information for inexperienced users of your application, but still provide thorough information on demand.

Set an option/flag to hidden, if you want to completely hide it from the user. It will neither appear on the help page nor in any export format. For example, this might be useful for debugging reasons.

Summary:

Tag       Description
DEFAULT   The default tag with no special behaviour.
REQUIRED  Required options will cause an error if not provided.
ADVANCED  Advanced options are only displayed with -hh/--advanced-help.
HIDDEN    Hidden options are never displayed, neither on the help page nor when exported.
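The effect of the tags on the help pages can be summarised in one small function. This is a sketch only: the enum spec is a stand-in for seqan3::option_spec, shown_in_help is a hypothetical helper, and it assumes that the advanced help page also lists all regular options.

```cpp
#include <cassert>

// Stand-in for the four tags from the table above; not the SeqAn3 enum itself.
enum class spec { DEFAULT, REQUIRED, ADVANCED, HIDDEN };

// Decide whether an option appears on a help page. `advanced_help` is true
// for -hh/--advanced-help and false for -h/--help.
bool shown_in_help(spec s, bool advanced_help)
{
    switch (s)
    {
        case spec::HIDDEN:   return false;         // never displayed
        case spec::ADVANCED: return advanced_help; // only on the advanced page
        default:             return true;          // DEFAULT and REQUIRED
    }
}
```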

Assignment 5

Extend the solution from assignment 4 by declaring the -s/--season option as required.

Check if your options are set correctly by trying the following call:

./game_of_parsing -H --aggregate-by median data.tsv
[Winter has come] Option -s/--season is required but not set.

Solution

parser.add_option(args.seasons, 's', "season", "Choose the seasons to aggregate.", option_spec::REQUIRED);

In case you are stuck, the complete program now looks like this:

#include <algorithm> // includes std::sort and std::find
#include <fstream>
#include <numeric>
#include <string>
#include <vector>
#include <range/v3/view/split.hpp>
#include <seqan3/argument_parser/all.hpp> // includes all necessary headers
#include <seqan3/core/debug_stream.hpp> // our custom output stream
#include <seqan3/std/charconv> // includes std::from_chars
#include <seqan3/std/filesystem> // use std::filesystem::path
using namespace seqan3;
// This is the program!
// Take a look at it if you are interested in an example of parsing a data file.
// -----------------------------------------------------------------------------
template <typename number_type, typename range_type>
number_type to_number(range_type && range)
{
    std::string str;
    for (auto c : range) // copy the characters into a contiguous string for std::from_chars
        str.push_back(c);
    number_type num;
    auto res = std::from_chars(&str[0], &str[0] + str.size(), num);
    if (res.ec != std::errc{})
    {
        debug_stream << "Could not cast '" << range << "' to a valid number\n";
        throw std::invalid_argument{"CAST ERROR"};
    }
    return num;
}
void run_program(std::filesystem::path & path, std::vector<uint8_t> sn, std::string & aggr_by, bool hd_is_set)
{
    std::ifstream file{path.string()};
    if (file.is_open())
    {
        std::vector<double> v; // the viewer numbers of all selected seasons
        std::string line; // the current line of the file
        if (hd_is_set)
            std::getline(file, line); // ignore first line
        while (std::getline(file, line))
        {
            auto splitted_line = line | std::views::split('\t');
            auto it = splitted_line.begin(); // move to 1st column
            if (std::find(sn.begin(), sn.end(), to_number<uint8_t>(*it)) != sn.end())
                v.push_back(to_number<double>(*std::next(it, 4)));
        }
        if (aggr_by == "median")
            debug_stream << ([&v] () { std::sort(v.begin(), v.end()); return v[v.size()/2]; })() << std::endl;
        else if (aggr_by == "mean")
            debug_stream << ([&v] () { double sum{}; for (auto i : v) sum += i; return sum / v.size(); })() << std::endl;
        else
            debug_stream << "I do not know the aggregation method " << aggr_by << std::endl;
    }
    else
    {
        debug_stream << "Error: Cannot open file for reading.\n";
    }
}
// -----------------------------------------------------------------------------
struct cmd_arguments
{
    std::filesystem::path file_path{};
    std::vector<uint8_t> seasons{};
    std::string aggregate_by{"mean"};
    bool header_is_set{};
};
void initialize_argument_parser(argument_parser & parser, cmd_arguments & args)
{
    parser.info.author = "Cersei";
    parser.info.short_description = "Aggregate average Game of Thrones viewers by season.";
    parser.info.version = "1.0.0";
    parser.add_positional_option(args.file_path, "Please provide a tab separated data file.");
    parser.add_option(args.seasons, 's', "season", "Choose the seasons to aggregate.", option_spec::REQUIRED);
    parser.add_option(args.aggregate_by, 'a', "aggregate-by", "Choose your method of aggregation: mean or median.");
    parser.add_flag(args.header_is_set, 'H', "header-is-set", "Let us know whether your data file contains a "
                                                              "header to ensure correct parsing.");
}
int main(int argc, char ** argv)
{
    argument_parser myparser{"Game-of-Parsing", argc, argv}; // initialise myparser
    cmd_arguments args{};
    initialize_argument_parser(myparser, args);
    try
    {
        myparser.parse(); // trigger command line parsing
    }
    catch (parser_invalid_argument const & ext) // catch user errors
    {
        debug_stream << "[Winter has come] " << ext.what() << "\n"; // customise your error message
        return -1;
    }
    // parsing was successful !
    // we can start running our program
    run_program(args.file_path, args.seasons, args.aggregate_by, args.header_is_set);
    return 0;
}

Validation of (positional) option values

Our applications often do not allow just any value to be passed as input arguments and if we do not check for them, the program may run into undefined behaviour. The best way to carefully restrict user input is to directly check the input when parsing the command line. The seqan3::argument_parser provides validators for a given (positional) option.

A validator is a functor that is called within the argument parser after retrieving and converting a command line argument. We provide several validators, which we hope cover most of the use cases, but you can always create your own validator (see section Create your own validator).

Attention
You can pass a validator to the seqan3::argument_parser::add_option function only after passing the seqan3::option_spec parameter. Pass the seqan3::option_spec::DEFAULT tag, if there are no further restrictions on your option.

SeqAn3 validators

The following validators are provided in the SeqAn3 library and can be included with the following header:

#include <seqan3/argument_parser/validators.hpp>

All the validators below work on single values or a container of values. In case the variable is a container, the validator is called on each element separately.

Note
If the validators below do not suit your needs, you can always create your own validator. See the concept tutorial for an example of how to create your own validator.

The seqan3::arithmetic_range_validator

On construction, this validator receives a minimum and a maximum number. The validator throws a seqan3::parser_invalid_argument exception whenever a given value does not lie inside the range [min, max].

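int myint;
seqan3::arithmetic_range_validator my_validator{2, 10}; // illustrative bounds: accept values in [2, 10]
myparser.add_option(myint,'i',"integer","Give me a number.",
seqan3::option_spec::DEFAULT, my_validator);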
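To make the behaviour concrete, here is a minimal standalone sketch of such a range check in plain C++. It is not SeqAn3's implementation: the names range_check and passes are hypothetical, and std::invalid_argument stands in for seqan3::parser_invalid_argument. The second overload illustrates the element-wise handling of containers described above.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a range validator; not part of SeqAn3.
struct range_check
{
    double min;
    double max;

    // Validate a single value: throw if it lies outside [min, max].
    void operator()(double value) const
    {
        if (value < min || value > max)
            throw std::invalid_argument{"Value " + std::to_string(value) + " is not in range [" +
                                        std::to_string(min) + "," + std::to_string(max) + "]."};
    }

    // Validate a container by checking each element separately.
    template <typename number_type>
    void operator()(std::vector<number_type> const & values) const
    {
        for (auto const & value : values)
            (*this)(static_cast<double>(value));
    }
};

// Convenience helper: true if the check passes, false if it throws.
template <typename value_type>
bool passes(range_check const & check, value_type const & value)
{
    try
    {
        check(value);
        return true;
    }
    catch (std::invalid_argument const &)
    {
        return false;
    }
}
```

With this sketch, passes(range_check{1, 7}, 3) is true, while passes(range_check{1, 7}, std::vector<int>{1, 9}) is false because the second element is out of range.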

Our application has another flaw that you might have noticed by now: If you supply a season that is not in the data file, the program will again misbehave. Instead of fixing the program, let's restrict the user input accordingly.

Assignment 6

Add a seqan3::arithmetic_range_validator to the -s/--season option that sets the range to [1,7].
Solution

parser.add_option(args.seasons, 's', "season", "Choose the seasons to aggregate.",
option_spec::REQUIRED, arithmetic_range_validator{1, 7});

The seqan3::value_list_validator

On construction, the validator receives a list (vector) of valid values. The validator throws a seqan3::parser_invalid_argument exception whenever a given value is not in the given list.

int myint;
seqan3::value_list_validator my_validator{{2, 4, 6, 8, 10}};
myparser.add_option(myint,'i',"integer","Give me a number.",
seqan3::option_spec::DEFAULT, my_validator);
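The idea behind a value list check can be sketched in a few lines of plain C++. This is only an illustration under stated assumptions: value_list_check and is_allowed are hypothetical names, the sketch is restricted to strings, and std::invalid_argument stands in for seqan3::parser_invalid_argument.

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a value list validator; not part of SeqAn3.
struct value_list_check
{
    std::vector<std::string> allowed;

    // Throw if `value` is not one of the allowed values.
    void operator()(std::string const & value) const
    {
        if (std::find(allowed.begin(), allowed.end(), value) == allowed.end())
            throw std::invalid_argument{"Value " + value + " is not one of the allowed values."};
    }
};

// True if `value` is accepted, false if the check throws.
bool is_allowed(value_list_check const & check, std::string const & value)
{
    try
    {
        check(value);
        return true;
    }
    catch (std::invalid_argument const &)
    {
        return false;
    }
}
```

Given value_list_check check{{"median", "mean"}}, the call is_allowed(check, "mean") is true and is_allowed(check, "mode") is false.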

Assignment 7

Add a seqan3::value_list_validator to the -a/--aggregate-by option that sets the list of valid values to ["median", "mean"].
Solution

parser.add_option(args.aggregate_by, 'a', "aggregate-by", "Choose your method of aggregation.",
option_spec::DEFAULT, value_list_validator{{"median", "mean"}});

The file validator

SeqAn3 offers two file validator types: the seqan3::input_file_validator and the seqan3::output_file_validator. On construction, the validator receives a list (vector) of valid file extensions that are tested against the extension of the parsed option value. The validator throws a seqan3::parser_invalid_argument exception whenever a given filename's extension is not in the given list of valid extensions. In addition, the seqan3::input_file_validator checks if the file exists, is a regular file and is readable. The seqan3::output_file_validator on the other hand ensures that the output does not already exist (in order to prevent overwriting an already existing file) and that it can be created.

Note
If you want to allow any extension just use a default constructed file validator.

Using the seqan3::input_file_validator:

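std::filesystem::path myfile;
myparser.add_option(myfile,'f',"file","The input file containing the sequences.",
seqan3::option_spec::DEFAULT, seqan3::input_file_validator{{"fa","fasta"}}); // illustrative extensions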

Using the seqan3::output_file_validator:

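std::filesystem::path myfile;
myparser.add_option(myfile,'f',"file","Output file containing the processed sequences.",
seqan3::option_spec::DEFAULT, seqan3::output_file_validator{{"fa","fasta"}}); // illustrative extensions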
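The checks described for the seqan3::input_file_validator can be approximated with std::filesystem. The following is a rough sketch, not the library's code: input_file_check and accepts are hypothetical names, std::invalid_argument stands in for seqan3::parser_invalid_argument, and the readability test is omitted for brevity.

```cpp
#include <algorithm>
#include <filesystem>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for an input file validator; not part of SeqAn3.
struct input_file_check
{
    std::vector<std::string> extensions; // an empty list accepts any extension

    void operator()(std::filesystem::path const & file) const
    {
        if (!extensions.empty())
        {
            std::string ext = file.extension().string(); // e.g. ".tsv"
            if (!ext.empty())
                ext.erase(0, 1);                          // drop the leading dot
            if (std::find(extensions.begin(), extensions.end(), ext) == extensions.end())
                throw std::invalid_argument{"Extension of " + file.string() + " is not allowed."};
        }
        if (!std::filesystem::exists(file))
            throw std::invalid_argument{"The file " + file.string() + " does not exist."};
        if (!std::filesystem::is_regular_file(file))
            throw std::invalid_argument{file.string() + " is not a regular file."};
    }
};

// True if `file` is accepted, false if the check throws.
bool accepts(input_file_check const & check, std::filesystem::path const & file)
{
    try
    {
        check(file);
        return true;
    }
    catch (std::invalid_argument const &)
    {
        return false;
    }
}
```

In this sketch the extension is tested first, so a file named data.csv is rejected by input_file_check{{"tsv"}} regardless of whether it exists.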

The directory validator

In addition to the file validator types, SeqAn3 offers directory validator types. These are useful if one needs to provide an input directory (using the seqan3::input_directory_validator) or output directory (using the seqan3::output_directory_validator) where multiple files need to be read from or written to. The seqan3::input_directory_validator checks whether the specified path is a directory and is readable. Similarly, the seqan3::output_directory_validator checks whether the specified directory is writable and can be created if it does not already exist. If the tests fail, a seqan3::parser_invalid_argument exception will be thrown. Also, if something unexpected happens with the filesystem, a std::filesystem::filesystem_error will be thrown.

Using the seqan3::input_directory_validator:

std::filesystem::path mydir;
myparser.add_option(mydir, 'd', "dir", "The directory containing the input files.",
seqan3::option_spec::DEFAULT, seqan3::input_directory_validator{});

Using the seqan3::output_directory_validator:

std::filesystem::path mydir;
myparser.add_option(mydir, 'd', "dir", "The output directory for storing the files.",
seqan3::option_spec::DEFAULT, seqan3::output_directory_validator{});

Assignment 8

Add a validator to the first positional option that expects a file formatted with tab separated values. Store the result in file_path.
Solution

parser.add_positional_option(args.file_path, "Please provide a tab separated data file.",
input_file_validator{{"tsv"}});

The seqan3::regex_validator

On construction, the validator receives a pattern for a regular expression. The pattern variable will be used for constructing an std::regex and the validator will call std::regex_match on the command line argument.

Note that a regex_match will only return true if the string matches the pattern completely (in contrast to regex_search which also matches substrings). The validator throws a seqan3::parser_invalid_argument exception whenever a given parameter does not match the given regular expression.

std::string my_string;
seqan3::regex_validator my_validator{"[a-zA-Z]+@[a-zA-Z]+\\.com"};
myparser.add_option(my_string,'s',"str","Give me a string.",
seqan3::option_spec::DEFAULT, my_validator);
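The difference between a full match and a substring match is easy to see with the standard library alone. The two helper functions below are hypothetical names used only for this illustration; they wrap std::regex_match and std::regex_search.

```cpp
#include <regex>
#include <string>

// Full match: the whole string must match the pattern (std::regex_match).
bool matches_completely(std::string const & text, std::string const & pattern)
{
    return std::regex_match(text, std::regex{pattern});
}

// Partial match: some substring must match the pattern (std::regex_search).
bool matches_somewhere(std::string const & text, std::string const & pattern)
{
    return std::regex_search(text, std::regex{pattern});
}
```

With the pattern "[a-zA-Z]+@[a-zA-Z]+\\.com", the string "ballet@dancer.com" matches completely, whereas "write to ballet@dancer.com today" only matches somewhere and would therefore be rejected by a validator based on std::regex_match.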

Chaining validators

You can also chain validators using the pipe operator (|). The pipe operator is the AND operation for two validators, which means that a value must pass both validators in order to be accepted by the combined validator.

For example, you may want an option that only accepts file names given as absolute paths and that also requires one of a given list of file extensions. For this purpose you can chain a seqan3::regex_validator to a seqan3::input_file_validator:

std::string file_name;
seqan3::regex_validator absolute_path_validator{"(/[^/]+)+/.*\\.[^/\\.]+$"};
seqan3::input_file_validator my_file_ext_validator{{"sa", "so"}};
myparser.add_option(file_name, 'f', "file","Give me a file name with an absolute path.",
seqan3::option_spec::DEFAULT, absolute_path_validator | my_file_ext_validator);

You can chain as many validators as you want; they are evaluated one after the other from left to right (first to last).
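One way to picture the AND semantics and the left-to-right evaluation is the following standalone sketch. It does not reproduce how SeqAn3 implements chaining: check, is_absolute, ends_with_tsv and accepts are hypothetical names, and std::invalid_argument stands in for seqan3::parser_invalid_argument.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical validator wrapper for this sketch; not part of SeqAn3.
struct check
{
    std::function<void(std::string const &)> fn;

    void operator()(std::string const & value) const { fn(value); }
};

// AND-combination: the left check runs first, then the right one.
// Whichever check throws first rejects the value.
check operator|(check lhs, check rhs)
{
    return check{[lhs = std::move(lhs), rhs = std::move(rhs)](std::string const & value)
                 {
                     lhs(value);
                     rhs(value);
                 }};
}

// Two simple example checks.
check const is_absolute{[](std::string const & s)
                        {
                            if (s.empty() || s.front() != '/')
                                throw std::invalid_argument{"not an absolute path: " + s};
                        }};

check const ends_with_tsv{[](std::string const & s)
                          {
                              if (s.size() < 4 || s.compare(s.size() - 4, 4, ".tsv") != 0)
                                  throw std::invalid_argument{"not a .tsv file: " + s};
                          }};

// True if `c` accepts `value`, false if it throws.
bool accepts(check const & c, std::string const & value)
{
    try
    {
        c(value);
        return true;
    }
    catch (std::invalid_argument const &)
    {
        return false;
    }
}
```

Here is_absolute | ends_with_tsv accepts "/data/seasons.tsv" but rejects both "data/seasons.tsv" and "/data/seasons.csv", since a value has to pass every check in the chain.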

Assignment 9

Add a seqan3::regex_validator to the first positional option (the one stored in file_path) by chaining it to the seqan3::input_file_validator that is already present. The file name, without its extension, should end in seasons.
Solution

parser.add_positional_option(args.file_path, "Please provide a tab separated seasons file.",
regex_validator{".*seasons\\..+$"} | input_file_validator{{"tsv"}} );

Full solution

The following solution shows the complete code including all the little assignments of this tutorial that can serve as a copy'n'paste source for your application.

Solution

#include <algorithm>                        // std::find, std::sort
#include <fstream>
#include <numeric>
#include <string>
#include <vector>
#include <range/v3/view/split.hpp>
#include <seqan3/argument_parser/all.hpp>   // includes all necessary headers
#include <seqan3/core/debug_stream.hpp>     // our custom output stream
#include <seqan3/std/charconv>              // includes std::from_chars
#include <seqan3/std/filesystem>            // use std::filesystem::path
using namespace seqan3;
// This is the program!
// Take a look at it if you are interested in an example of parsing a data file.
// -----------------------------------------------------------------------------
template <typename number_type, typename range_type>
number_type to_number(range_type && range)
{
    std::string str;
    for (char c : range)                    // copy the column into a contiguous string
        str.push_back(c);
    number_type num;
    auto res = std::from_chars(&str[0], &str[0] + str.size(), num);
    if (res.ec != std::errc{})
    {
        debug_stream << "Could not cast '" << range << "' to a valid number\n";
        throw std::invalid_argument{"CAST ERROR"};
    }
    return num;
}
void run_program(std::filesystem::path & path, std::vector<uint8_t> sn, std::string & aggr_by, bool hd_is_set)
{
    std::ifstream file{path.string()};
    if (file.is_open())
    {
        std::vector<double> v;              // viewer numbers of the selected seasons
        std::string line;                   // current line of the data file
        if (hd_is_set)
            std::getline(file, line);       // ignore first line
        while (std::getline(file, line))
        {
            auto splitted_line = line | std::views::split('\t');
            auto it = splitted_line.begin(); // move to 1st column
            if (std::find(sn.begin(), sn.end(), to_number<uint8_t>(*it)) != sn.end())
                v.push_back(to_number<double>(*std::next(it, 4)));
        }
        if (aggr_by == "median")
            debug_stream << ([&v] () { std::sort(v.begin(), v.end()); return v[v.size()/2]; })() << std::endl;
        else if (aggr_by == "mean")
            debug_stream << ([&v] () { double sum{}; for (auto i : v) sum += i; return sum / v.size(); })() << std::endl;
        else
            debug_stream << "I do not know the aggregation method " << aggr_by << std::endl;
    }
    else
    {
        debug_stream << "Error: Cannot open file for reading.\n";
    }
}
// -----------------------------------------------------------------------------
struct cmd_arguments
{
    std::filesystem::path file_path{};
    std::vector<uint8_t> seasons{};
    std::string aggregate_by{"mean"};
    bool header_is_set{};
};
void initialize_argument_parser(argument_parser & parser, cmd_arguments & args)
{
    parser.info.author = "Cercei";
    parser.info.short_description = "Aggregate average Game of Thrones viewers by season.";
    parser.info.version = "1.0.0";
    parser.add_positional_option(args.file_path, "Please provide a tab separated seasons file.",
                                 regex_validator{".*seasons\\..+$"} | input_file_validator{{"tsv"}} );
    parser.add_option(args.seasons, 's', "season", "Choose the seasons to aggregate.",
                      option_spec::REQUIRED, arithmetic_range_validator{1, 7});
    parser.add_option(args.aggregate_by, 'a', "aggregate-by", "Choose your method of aggregation.",
                      option_spec::DEFAULT, value_list_validator{{"median", "mean"}});
    parser.add_flag(args.header_is_set, 'H', "header-is-set", "Let us know whether your data file contains a "
                    "header to ensure correct parsing.");
}
int main(int argc, char ** argv)
{
    argument_parser myparser{"Game-of-Parsing", argc, argv}; // initialise myparser
    cmd_arguments args{};
    initialize_argument_parser(myparser, args);
    try
    {
        myparser.parse(); // trigger command line parsing
    }
    catch (parser_invalid_argument const & ext) // catch user errors
    {
        debug_stream << "[Winter has come] " << ext.what() << "\n"; // customise your error message
        return -1;
    }
    // parsing was successful !
    // we can start running our program
    run_program(args.file_path, args.seasons, args.aggregate_by, args.header_is_set);
    return 0;
}

Subcommand argument parsing

Many applications provide several sub programs, e.g. git comes with many functionalities like git push, git pull, git checkout, etc. each having their own help page. If you are interested in how this subcommand parsing can be done with the seqan3::argument_parser, take a look at our HowTo.

Update Notifications

When you run a SeqAn-based application for the first time, you will likely be asked about "update notifications". This is a feature that helps inform users about updates and helps the SeqAn project get a rough estimate on which SeqAn-based apps are popular.

See the API documentation of seqan3::argument_parser for information on how to configure (or turn off) this feature. See our wiki entry for more information on how it works and our privacy policy.