SeqAn3 3.4.0-rc.1
The Modern C++ library for sequence analysis.
Builtin Character Operations

Provides various operations on character types.

Classes

interface  char_predicate
 An internal concept to check if an object fulfills the requirements of a seqan3::detail::char_predicate.
 
struct  seqan3::detail::char_predicate_base< derived_t >
 An abstract CRTP base class for parse conditions that adds the logical disjunction and negation operators.
 
struct  seqan3::detail::char_predicate_disjunction< condition_ts >
 Logical disjunction operator for parse conditions.
 
struct  seqan3::detail::char_predicate_negator< condition_t >
 Logical not operator for a parse condition.
 
class  seqan3::detail::constexpr_pseudo_bitset< N >
 A data structure that implements a subset of std::bitset as constexpr.
 
struct  seqan3::detail::is_char_type< char_v >
 Parse condition that checks if a given value is equal to char_v.
 
struct  seqan3::detail::is_in_interval_type< interval_first, interval_last >
 Parse condition that checks if a given value is in the range of interval_first and interval_last.
 

Functions

std::string seqan3::detail::make_printable (char const c)
 Returns a printable value for the given character c.
 

Variables

template<char op, typename condition_head_t , typename... condition_ts>
const std::string condition_message_v
 Defines a compound std::string consisting of all given conditions separated by the operator-name op.
 
template<typename char_type >
constexpr std::array< char_type, detail::size_in_values_v< char_type > > seqan3::detail::to_lower_table
 Auxiliary table for seqan3::to_lower.
 
template<typename char_type >
constexpr std::array< char_type, detail::size_in_values_v< char_type > > seqan3::detail::to_upper_table
 Auxiliary table for seqan3::to_upper.
 

Char predicates

Char predicates are function-like objects that check whether a character c fulfills certain constraints. SeqAn implements every predicate that is available in the standard library, plus a few more.

Disjunction and Negation

In contrast to the standard library (where the checks are implemented as functions), the functors in SeqAn can be joined efficiently, maintaining constant-time evaluation independent of the number of checks. Functors can be combined with the ||-operator or negated via the !-operator:

// SPDX-FileCopyrightText: 2006-2024 Knut Reinert & Freie Universität Berlin
// SPDX-FileCopyrightText: 2016-2024 Knut Reinert & MPI für molekulare Genetik
// SPDX-License-Identifier: CC0-1.0
#include <iostream>

#include <seqan3/utility/char_operations/predicate.hpp>

int main()
{
    char chr{'1'};
    constexpr auto my_cond = seqan3::is_char<'%'> || seqan3::is_digit;
    bool is_percent_or_digit = my_cond(chr);
    std::cout << std::boolalpha << is_percent_or_digit << '\n'; // true, because '1' is a digit
}

Defining a complex combination once and using it, for example in input/output, can be significantly faster than calling several check functions: we measured speed-ups of 10x for a single check and speed-ups of over 20x for complex combinations.
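One way to see why a combined predicate can stay constant-time is to picture each predicate as a lookup table that is merged at compile time. The following is a minimal sketch of that idea in plain C++17, not SeqAn's implementation: table_predicate, in_interval, digit_or_percent and not_digit are hypothetical names, and a plain std::array<bool, 256> stands in for whatever bitset-like structure the library uses internally.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a table-backed predicate (not SeqAn's actual type):
// one flag per possible byte value.
struct table_predicate
{
    std::array<bool, 256> table{};

    // Evaluation is one array lookup, no matter how the table was composed.
    constexpr bool operator()(char const c) const
    {
        return table[static_cast<unsigned char>(c)];
    }
};

// True for every value in the closed interval [first, last].
constexpr table_predicate in_interval(unsigned char const first, unsigned char const last)
{
    table_predicate result{};
    for (std::size_t i = first; i <= last; ++i)
        result.table[i] = true;
    return result;
}

// Disjunction: merge both tables entry by entry.
constexpr table_predicate operator||(table_predicate const & lhs, table_predicate const & rhs)
{
    table_predicate result{};
    for (std::size_t i = 0; i < result.table.size(); ++i)
        result.table[i] = lhs.table[i] || rhs.table[i];
    return result;
}

// Negation: flip every entry.
constexpr table_predicate operator!(table_predicate const & arg)
{
    table_predicate result{};
    for (std::size_t i = 0; i < result.table.size(); ++i)
        result.table[i] = !arg.table[i];
    return result;
}

// The merging happens during constant evaluation; at run time only the lookup remains.
constexpr table_predicate digit_or_percent = in_interval('0', '9') || in_interval('%', '%');
constexpr table_predicate not_digit = !in_interval('0', '9');

static_assert(digit_or_percent('1'));
static_assert(digit_or_percent('%'));
static_assert(!digit_or_percent('a'));
```

However many intervals are joined, a call such as digit_or_percent(chr) in this sketch costs a single table access.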

Custom predicates

Standard library predicates

SeqAn offers the 12 predicates exactly as defined in the standard library, except that an underscore is inserted into each name (e.g. is_alpha instead of isalpha) to be consistent with our other naming.

The following table lists the predefined character predicates and which constraints are associated with them.

ASCII values
decimal  hexadecimal  octal      characters                    is_cntrl  is_print  is_space  is_blank  is_graph  is_punct  is_alnum  is_alpha  is_upper  is_lower  is_digit  is_xdigit
0–8      \x0–\x8      \0–\10     control codes (NUL, etc.)     ≠0        0         0         0         0         0         0         0         0         0         0         0
9        \x9          \11        tab (\t)                      ≠0        0         ≠0        ≠0        0         0         0         0         0         0         0         0
10–13    \xA–\xD      \12–\15    whitespaces (\n, \v, \f, \r)  ≠0        0         ≠0        0         0         0         0         0         0         0         0         0
14–31    \xE–\x1F     \16–\37    control codes                 ≠0        0         0         0         0         0         0         0         0         0         0         0
32       \x20         \40        space                         0         ≠0        ≠0        ≠0        0         0         0         0         0         0         0         0
33–47    \x21–\x2F    \41–\57    !"#$%&'()*+,-./               0         ≠0        0         0         ≠0        ≠0        0         0         0         0         0         0
48–57    \x30–\x39    \60–\71    0123456789                    0         ≠0        0         0         ≠0        0         ≠0        0         0         0         ≠0        ≠0
58–64    \x3A–\x40    \72–\100   :;<=>?@                       0         ≠0        0         0         ≠0        ≠0        0         0         0         0         0         0
65–70    \x41–\x46    \101–\106  ABCDEF                        0         ≠0        0         0         ≠0        0         ≠0        ≠0        ≠0        0         0         ≠0
71–90    \x47–\x5A    \107–\132  GHIJKLMNOPQRSTUVWXYZ          0         ≠0        0         0         ≠0        0         ≠0        ≠0        ≠0        0         0         0
91–96    \x5B–\x60    \133–\140  [\]^_`                        0         ≠0        0         0         ≠0        ≠0        0         0         0         0         0         0
97–102   \x61–\x66    \141–\146  abcdef                        0         ≠0        0         0         ≠0        0         ≠0        ≠0        0         ≠0        0         ≠0
103–122  \x67–\x7A    \147–\172  ghijklmnopqrstuvwxyz          0         ≠0        0         0         ≠0        0         ≠0        ≠0        0         ≠0        0         0
123–126  \x7B–\x7E    \173–\176  {|}~                          0         ≠0        0         0         ≠0        ≠0        0         0         0         0         0         0
127      \x7F         \177       delete character (DEL)        ≠0        0         0         0         0         0         0         0         0         0         0         0
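Because the table mirrors the standard library classification, individual rows can be cross-checked against the <cctype> functions. The sketch below does that with a hypothetical helper, classify, which is not part of SeqAn; it assumes the default "C" locale and encodes each result as '1' (non-zero) or '0' in the column order is_cntrl to is_xdigit.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Returns the twelve <cctype> results for c as a string of '1' (non-zero) and '0',
// in the column order is_cntrl ... is_xdigit used by the table.
std::string classify(char const c)
{
    // <cctype> functions require a value representable as unsigned char (or EOF).
    int const u = static_cast<unsigned char>(c);
    int const results[] = {std::iscntrl(u), std::isprint(u), std::isspace(u), std::isblank(u),
                           std::isgraph(u), std::ispunct(u), std::isalnum(u), std::isalpha(u),
                           std::isupper(u), std::islower(u), std::isdigit(u), std::isxdigit(u)};
    std::string row;
    for (int const r : results)
        row += (r != 0) ? '1' : '0';
    return row;
}

// Spot checks against three table rows (default "C" locale assumed).
void check_table_rows()
{
    assert(classify('\t') == "101100000000"); // row 9: tab
    assert(classify('7') == "010010100011");  // row 48-57: digits
    assert(classify('A') == "010010111001");  // row 65-70: ABCDEF
}
```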


template<uint8_t interval_first, uint8_t interval_last>
constexpr auto seqan3::is_in_interval
 Checks whether a given letter is in the specified interval.
 
template<int char_v>
constexpr auto seqan3::is_char
 Checks whether a given letter is the same as the template non-type argument.
 
constexpr auto seqan3::is_cntrl = is_in_interval<'\0', static_cast<char>(31)> || is_char<static_cast<char>(127)>
 Checks whether c is a control character.
 
constexpr auto seqan3::is_print = is_in_interval<' ', '~'>
 Checks whether c is a printable character.
 
constexpr auto seqan3::is_space = is_in_interval<'\t', '\r'> || is_char<' '>
 Checks whether c is a space character.
 
constexpr auto seqan3::is_blank = is_char<'\t'> || is_char<' '>
 Checks whether c is a blank character.
 
constexpr auto seqan3::is_graph = is_in_interval<'!', '~'>
 Checks whether c is a graphic character.
 
constexpr auto seqan3::is_punct
 Checks whether c is a punctuation character.
 
constexpr auto seqan3::is_alnum = is_in_interval<'0', '9'> || is_in_interval<'A', 'Z'> || is_in_interval<'a', 'z'>
 Checks whether c is an alphanumeric character.
 
constexpr auto seqan3::is_alpha = is_in_interval<'A', 'Z'> || is_in_interval<'a', 'z'>
 Checks whether c is an alphabetical character.
 
constexpr auto seqan3::is_upper = is_in_interval<'A', 'Z'>
 Checks whether c is an upper case character.
 
constexpr auto seqan3::is_lower = is_in_interval<'a', 'z'>
 Checks whether c is a lower case character.
 
constexpr auto seqan3::is_digit = is_in_interval<'0', '9'>
 Checks whether c is a digit character.
 
constexpr auto seqan3::is_xdigit = is_in_interval<'0', '9'> || is_in_interval<'A', 'F'> || is_in_interval<'a', 'f'>
 Checks whether c is a hexadecimal character.
 
constexpr auto seqan3::is_eof = is_char<EOF>
 Checks whether a given letter is equal to the EOF constant defined in <cstdio>.
 

Detailed Description

Provides various operations on character types.

See also
Utility

Function Documentation

◆ make_printable()

std::string seqan3::detail::make_printable ( char const  c)
inline

Returns a printable value for the given character c.

Parameters
[in]  c  The character to be represented as a printable string.
Returns
A std::string containing a printable version of the given character c.

Some characters, e.g. control characters, cannot be printed. This function converts them to a std::string containing a visual representation of the character. For all control characters the value 'CTRL' is returned.

Exception

Strong exception guarantee is given.

Complexity

Constant.

Concurrency

Thread-safe.
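The behaviour described for make_printable can be illustrated with a small stand-alone function. This is a sketch that follows only the description above: printable_sketch is a hypothetical name, the single quotes around ordinary characters and the treatment of DEL as a control character are assumptions, and the library's actual output for individual characters may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical re-creation of the documented behaviour, not the library function.
std::string printable_sketch(char const c)
{
    unsigned char const u = static_cast<unsigned char>(c);

    // Assumption: codes 0-31 and 127 (DEL) count as control characters.
    if (u < 32 || u == 127)
        return "'CTRL'";

    // Assumption: every other character is shown wrapped in single quotes.
    return std::string{'\''} + c + '\'';
}

// Small usage check for the two branches.
void check_printable_sketch()
{
    assert(printable_sketch('\x01') == "'CTRL'");
    assert(printable_sketch('A') == "'A'");
}
```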

Variable Documentation

◆ condition_message_v

template<char op, typename condition_head_t , typename... condition_ts>
const std::string condition_message_v
related
Initial value:
{
    std::string{"("}
    + (condition_head_t::msg + ... + (std::string{" "} + std::string{{op, op}} + std::string{" "} + condition_ts::msg))
    + std::string{")"}}

Defines a compound std::string consisting of all given conditions separated by the operator-name op.

Template Parameters
op                Non-type template parameter specifying the separator character, e.g. '|'.
condition_head_t  The first condition type in the message. Ensures that there is at least one type.
condition_ts      Remaining list of conditions separated by op.
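The fold expression that builds the message may be easier to follow in a self-contained form. The sketch below is an illustration, not the library code: it uses a function template instead of a variable template, the condition types alpha_condition and digit_condition are hypothetical, their msg is a static function rather than a static data member, and std::string(2, op) is used to repeat the operator character.

```cpp
#include <cassert>
#include <string>

// Hypothetical condition types; each one exposes a human-readable name.
struct alpha_condition
{
    static std::string msg() { return "is_alpha"; }
};

struct digit_condition
{
    static std::string msg() { return "is_digit"; }
};

// Joins all condition names with the doubled operator character (e.g. "||")
// and wraps the result in parentheses.
template <char op, typename condition_head_t, typename... condition_ts>
std::string condition_message()
{
    // Binary left fold: starts with the head and appends " <op><op> <name>" per remaining type.
    return std::string{"("}
         + (condition_head_t::msg() + ... + (std::string{" "} + std::string(2, op) + std::string{" "} + condition_ts::msg()))
         + std::string{")"};
}

// Usage check: two conditions joined by '|', and the single-condition case.
void check_condition_message()
{
    assert((condition_message<'|', alpha_condition, digit_condition>() == "(is_alpha || is_digit)"));
    assert((condition_message<'|', alpha_condition>() == "(is_alpha)"));
}
```

With an empty parameter pack the fold collapses to the head alone, which is why at least one condition type is required.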

◆ is_alnum

constexpr auto seqan3::is_alnum = is_in_interval<'0', '9'> || is_in_interval<'A', 'Z'> || is_in_interval<'a', 'z'>
inline constexpr

Checks whether c is an alphanumeric character.

This function-like object can be used to check if a character c is an alphanumeric character. For the standard ASCII character set, the following characters are alphanumeric characters:

  • digits (0123456789)
  • uppercase letters (ABCDEFGHIJKLMNOPQRSTUVWXYZ)
  • lowercase letters (abcdefghijklmnopqrstuvwxyz)

Example

static_assert(seqan3::is_alnum('9'));

◆ is_alpha

constexpr auto seqan3::is_alpha = is_in_interval<'A', 'Z'> || is_in_interval<'a', 'z'>
inline constexpr

Checks whether c is an alphabetical character.

This function-like object can be used to check if a character c is an alphabetical character. For the standard ASCII character set, the following characters are alphabetical characters:

  • uppercase letters (ABCDEFGHIJKLMNOPQRSTUVWXYZ)
  • lowercase letters (abcdefghijklmnopqrstuvwxyz)

Example

static_assert(seqan3::is_alpha('z'));

◆ is_blank

constexpr auto seqan3::is_blank = is_char<'\t'> || is_char<' '>
inline constexpr

Checks whether c is a blank character.

This function-like object can be used to check if a character c is a blank character. For the standard ASCII character set, the following characters are blank characters:

  • horizontal tab ('\t')
  • space (' ')

Example

static_assert(seqan3::is_blank('\t'));

◆ is_char

template<int char_v>
constexpr auto seqan3::is_char
inline constexpr

Checks whether a given letter is the same as the template non-type argument.

Template Parameters
char_v  The letter to compare against.

This function-like object returns true if the argument is the same as the template argument, false otherwise.

Example

// SPDX-FileCopyrightText: 2006-2024 Knut Reinert & Freie Universität Berlin
// SPDX-FileCopyrightText: 2016-2024 Knut Reinert & MPI für molekulare Genetik
// SPDX-License-Identifier: CC0-1.0
#include <seqan3/utility/char_operations/predicate.hpp>

int main()
{
    seqan3::is_char<'C'>('C'); // returns true

    constexpr auto my_check = seqan3::is_char<'C'>;
    my_check('c'); // returns false, because case is different
}

◆ is_cntrl

constexpr auto seqan3::is_cntrl = is_in_interval<'\0', static_cast<char>(31)> || is_char<static_cast<char>(127)>
inline constexpr

Checks whether c is a control character.

This function-like object can be used to check if a character c is a control character. For the standard ASCII character set, control characters are those with ASCII codes from 0x00 (NUL) to 0x1F (US), plus 0x7F (DEL).

Example

static_assert(seqan3::is_cntrl('\0'));

◆ is_digit

constexpr auto seqan3::is_digit = is_in_interval<'0', '9'>
inline constexpr

Checks whether c is a digit character.

This function-like object can be used to check if a character c is a digit character. For the standard ASCII character set, the following characters are digit characters:

  • digits (0123456789)

Example

static_assert(seqan3::is_digit('1'));

◆ is_eof

constexpr auto seqan3::is_eof = is_char<EOF>
inline constexpr

Checks whether a given letter is equal to the EOF constant defined in <cstdio>.

This function-like object returns true if the argument is equal to EOF, false otherwise.

Example

static_assert(seqan3::is_eof(EOF));
static_assert(!seqan3::is_eof('C'));
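EOF is a negative int, so a check against it has to compare int values rather than char values. The following stand-alone sketch illustrates this; is_eof_sketch is a hypothetical name and not the library object.

```cpp
#include <cstdio>
#include <string>

// Hypothetical equivalent of an EOF check on an int-typed value.
constexpr bool is_eof_sketch(int const c)
{
    return c == EOF;
}

static_assert(is_eof_sketch(EOF));
static_assert(!is_eof_sketch('C'));

// A byte widened through unsigned char is never negative, so it cannot collide with EOF.
static_assert(!is_eof_sketch(static_cast<unsigned char>('\xFF')));

// The stream traits for char use the same constant to signal end of input.
static_assert(std::char_traits<char>::eof() == EOF);
```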

◆ is_graph

constexpr auto seqan3::is_graph = is_in_interval<'!', '~'>
inline constexpr

Checks whether c is a graphic character.

This function-like object can be used to check if a character c is a graphic character, i.e. one that has a graphical representation. For the standard ASCII character set, the following characters are graphic characters:

  • digits (0123456789)
  • uppercase letters (ABCDEFGHIJKLMNOPQRSTUVWXYZ)
  • lowercase letters (abcdefghijklmnopqrstuvwxyz)
  • punctuation characters (!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~)

Example

static_assert(seqan3::is_graph('%'));

◆ is_in_interval

template<uint8_t interval_first, uint8_t interval_last>
constexpr auto seqan3::is_in_interval
inline constexpr

Checks whether a given letter is in the specified interval.

Template Parameters
interval_first  The first character for which to return true.
interval_last   The last character (inclusive) for which to return true.

This function-like object returns true for all characters in the given range, false otherwise.

Example

// SPDX-FileCopyrightText: 2006-2024 Knut Reinert & Freie Universität Berlin
// SPDX-FileCopyrightText: 2016-2024 Knut Reinert & MPI für molekulare Genetik
// SPDX-License-Identifier: CC0-1.0
#include <seqan3/utility/char_operations/predicate.hpp>

int main()
{
    seqan3::is_in_interval<'A', 'G'>('C'); // returns true

    constexpr auto my_check = seqan3::is_in_interval<'A', 'G'>;
    my_check('H'); // returns false
}
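The call syntax is_in_interval<'A', 'G'>('C') works because the name denotes a variable template whose value is a function object. A minimal sketch of that pattern, assuming nothing about the library's internals, could look as follows; in_interval_fn and in_interval_sketch are hypothetical names, and the static_assert on the bounds is a choice made for this sketch.

```cpp
#include <cstdint>

// Hypothetical function object type: the interval bounds are part of the type.
template <std::uint8_t interval_first, std::uint8_t interval_last>
struct in_interval_fn
{
    // Sketch-only sanity check on the template arguments.
    static_assert(interval_first <= interval_last, "The interval must not be empty.");

    // True if c lies in the closed interval [interval_first, interval_last].
    constexpr bool operator()(char const c) const
    {
        auto const u = static_cast<std::uint8_t>(c);
        return interval_first <= u && u <= interval_last;
    }
};

// Variable template: one constexpr object per pair of bounds.
template <std::uint8_t interval_first, std::uint8_t interval_last>
inline constexpr in_interval_fn<interval_first, interval_last> in_interval_sketch{};

static_assert(in_interval_sketch<'A', 'G'>('C'));
static_assert(!in_interval_sketch<'A', 'G'>('H'));
```

Since the object carries no state, it can be copied into a local constexpr variable and called later, as in the my_check example.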

◆ is_lower

constexpr auto seqan3::is_lower = is_in_interval<'a', 'z'>
inline constexpr

Checks whether c is a lower case character.

This function-like object can be used to check if a character c is a lower case character. For the standard ASCII character set, the following characters are lower case characters:

  • lowercase letters (abcdefghijklmnopqrstuvwxyz)

Example

static_assert(seqan3::is_lower('a'));

◆ is_print

constexpr auto seqan3::is_print = is_in_interval<' ', '~'>
inline constexpr

Checks whether c is a printable character.

This function-like object can be used to check if a character c is a printable character. For the standard ASCII character set, printable characters are those with ASCII codes from 0x20 (space) to 0x7E (~), inclusive.

Example

static_assert(seqan3::is_print(' '));

◆ is_punct

constexpr auto seqan3::is_punct
inline constexpr
Initial value:
= is_in_interval<'!', '/'> || is_in_interval<':', '@'> || is_in_interval<'[', '`'> || is_in_interval<'{', '~'>

Checks whether c is a punctuation character.

This function-like object can be used to check if a character c is a punctuation character. For the standard ASCII character set, the following characters are punctuation characters:

  • punctuation characters (!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~)

Example

static_assert(seqan3::is_punct(':'));
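The four intervals in the initial value are exactly the graphic characters that are not alphanumeric. The sketch below verifies this at compile time for all 128 ASCII codes; it uses plain helper functions with hypothetical names (in_range, punct_sketch, alnum_sketch, graph_sketch) instead of the library predicates and assumes an ASCII execution character set.

```cpp
// True if c lies in the closed interval [first, last].
constexpr bool in_range(char const c, char const first, char const last)
{
    return first <= c && c <= last;
}

// The four punctuation intervals from the initial value above.
constexpr bool punct_sketch(char const c)
{
    return in_range(c, '!', '/') || in_range(c, ':', '@') || in_range(c, '[', '`') || in_range(c, '{', '~');
}

constexpr bool alnum_sketch(char const c)
{
    return in_range(c, '0', '9') || in_range(c, 'A', 'Z') || in_range(c, 'a', 'z');
}

constexpr bool graph_sketch(char const c)
{
    return in_range(c, '!', '~');
}

// Compares both definitions for every ASCII code from 0 to 127.
constexpr bool punct_is_graph_without_alnum()
{
    for (int i = 0; i < 128; ++i)
    {
        char const c = static_cast<char>(i);
        if (punct_sketch(c) != (graph_sketch(c) && !alnum_sketch(c)))
            return false;
    }
    return true;
}

static_assert(punct_is_graph_without_alnum());
```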

◆ is_space

constexpr auto seqan3::is_space = is_in_interval<'\t', '\r'> || is_char<' '>
inline constexpr

Checks whether c is a space character.

This function-like object can be used to check if a character c is a space character. For the standard ASCII character set, the following characters are space characters:

  • horizontal tab ('\t')
  • line feed ('\n')
  • vertical tab ('\v')
  • form feed ('\f')
  • carriage return ('\r')
  • space (' ')

Example

static_assert(seqan3::is_space('\n'));

◆ is_upper

constexpr auto seqan3::is_upper = is_in_interval<'A', 'Z'>
inline constexpr

Checks whether c is an upper case character.

This function-like object can be used to check if a character c is an upper case character. For the standard ASCII character set, the following characters are upper case characters:

  • uppercase letters (ABCDEFGHIJKLMNOPQRSTUVWXYZ)

Example

static_assert(seqan3::is_upper('K'));

◆ is_xdigit

constexpr auto seqan3::is_xdigit = is_in_interval<'0', '9'> || is_in_interval<'A', 'F'> || is_in_interval<'a', 'f'>
inline constexpr

Checks whether c is a hexadecimal character.

This function-like object can be used to check if a character c is a hexadecimal character. For the standard ASCII character set, the following characters are hexadecimal characters:

  • digits (0123456789)
  • uppercase letters (ABCDEF)
  • lowercase letters (abcdef)

Example

static_assert(seqan3::is_xdigit('e'));

◆ to_lower_table

template<typename char_type >
constexpr std::array<char_type, detail::size_in_values_v<char_type> > seqan3::detail::to_lower_table
inline constexpr
Initial value:
{
    []() constexpr
    {
        std::array<char_type, detail::size_in_values_v<char_type>> ret{};

        for (size_t i = 0; i < detail::size_in_values_v<char_type>; ++i)
            ret[i] = i;

        for (size_t i = char_type{'A'}; i <= char_type{'Z'}; ++i)
            ret[i] = ret[i] - char_type{'A'} + char_type{'a'};

        return ret;
    }()
}

Auxiliary table for seqan3::to_lower.

◆ to_upper_table

template<typename char_type >
constexpr std::array<char_type, detail::size_in_values_v<char_type> > seqan3::detail::to_upper_table
inline constexpr
Initial value:
{
    []() constexpr
    {
        std::array<char_type, detail::size_in_values_v<char_type>> ret{};

        for (size_t i = 0; i < detail::size_in_values_v<char_type>; ++i)
            ret[i] = i;

        for (size_t i = char_type{'a'}; i <= char_type{'z'}; ++i)
            ret[i] = ret[i] - char_type{'a'} + char_type{'A'};

        return ret;
    }()
}

Auxiliary table for seqan3::to_upper.
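The two auxiliary tables follow the same recipe: start from the identity mapping and then shift one letter range. The stand-alone sketch below reproduces that recipe for plain char. It is an illustration only: make_case_table, lower_table, upper_table, to_lower_sketch and to_upper_sketch are hypothetical names, and the table size of 256 is an assumption standing in for detail::size_in_values_v<char>.

```cpp
#include <array>
#include <cstddef>

// Assumption: one entry per unsigned char value.
constexpr std::size_t table_size = 256;

// Builds an identity table and then remaps [from_first, from_last] so that
// from_first maps to to_first, from_first + 1 to to_first + 1, and so on.
constexpr std::array<char, table_size> make_case_table(char const from_first, char const from_last, char const to_first)
{
    std::array<char, table_size> ret{};

    for (std::size_t i = 0; i < table_size; ++i)
        ret[i] = static_cast<char>(i);

    for (std::size_t i = static_cast<unsigned char>(from_first); i <= static_cast<unsigned char>(from_last); ++i)
        ret[i] = static_cast<char>(ret[i] - from_first + to_first);

    return ret;
}

constexpr auto lower_table = make_case_table('A', 'Z', 'a');
constexpr auto upper_table = make_case_table('a', 'z', 'A');

// Case conversion is a single table lookup.
constexpr char to_lower_sketch(char const c)
{
    return lower_table[static_cast<unsigned char>(c)];
}

constexpr char to_upper_sketch(char const c)
{
    return upper_table[static_cast<unsigned char>(c)];
}

static_assert(to_lower_sketch('G') == 'g');
static_assert(to_upper_sketch('g') == 'G');
static_assert(to_lower_sketch('7') == '7'); // non-letters map to themselves
```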
