regcomp() – Compiling Regexps
The regcomp() functions
#include <tre/tre.h>

int tre_regcomp(regex_t *preg, const char *regex, int cflags);
int tre_regncomp(regex_t *preg, const char *regex, size_t len, int cflags);
int tre_regwcomp(regex_t *preg, const wchar_t *regex, int cflags);
int tre_regwncomp(regex_t *preg, const wchar_t *regex, size_t len, int cflags);
void tre_regfree(regex_t *preg);
The regcomp() function compiles the regex string pointed to by regex to an internal representation and stores the result in the pattern buffer structure pointed to by preg. The regncomp() function is like regcomp(), but regex does not need to be terminated by a null byte. Instead, the len argument gives the length of the string, and the string may contain null bytes. The regwcomp() and regwncomp() functions work like regcomp() and regncomp(), respectively, but take a wide-character (wchar_t) string instead of a byte string. The regfree() function releases the memory allocated for preg by a successful call to any of these functions.
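A minimal sketch of the compile, report, free lifecycle follows. TRE's regcomp() family follows the POSIX calling convention, so to stay runnable without TRE installed the sketch uses the system <regex.h> functions regcomp(), regerror() and regfree(). With TRE, the synopsis above suggests tre_regcomp() and tre_regfree() take the same arguments; a TRE counterpart of regerror() is assumed but not covered in this section. The helper name check_pattern() is hypothetical.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: try to compile `pattern` with `cflags`.
 * On success, the compiled buffer is released right away with regfree()
 * and 0 is returned. On failure, regcomp()'s nonzero error code is
 * returned after regerror() has turned it into a readable message;
 * regfree() is skipped because nothing was successfully compiled. */
int check_pattern(const char *pattern, int cflags)
{
    regex_t re;
    int rc = regcomp(&re, pattern, cflags);
    if (rc != 0) {
        char msg[128];
        /* regerror() truncates the message if it does not fit in msg. */
        regerror(rc, &re, msg, sizeof msg);
        fprintf(stderr, "cannot compile \"%s\": %s\n", pattern, msg);
        return rc;
    }
    /* A real caller would pass &re to regexec() here before freeing it. */
    regfree(&re);
    return 0;
}
```

In a BRE, ( is an ordinary character, so check_pattern("(abc", 0) succeeds, while check_pattern("(abc", REG_EXTENDED) reports an unbalanced parenthesis.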
The cflags argument is the bitwise inclusive OR of zero or more of the following flags (defined in the header <tre/regex.h>):
- REG_EXTENDED
- Use POSIX Extended Regular Expression (ERE) compatible syntax when compiling regex. The default syntax is the POSIX Basic Regular Expression (BRE) syntax, which is considered obsolete.
- REG_ICASE
- Ignore case. Subsequent searches with the regexec family of functions using this pattern buffer will be case insensitive.
- REG_NOSUB
- Do not report submatches. Subsequent searches with the regexec family of functions will only report whether a match was found or not and will not fill the submatch array.
- REG_NEWLINE
- Normally the newline character is treated as an ordinary character. When this flag is used, the newline character ('\n', ASCII code 10) is treated specially as follows:
  - The match-any-character operator (dot "." outside a bracket expression) does not match a newline.
  - A non-matching list ([^...]) not containing a newline does not match a newline.
  - The match-beginning-of-line operator ^ matches the empty string immediately after a newline as well as the empty string at the beginning of the string (but see the REG_NOTBOL regexec() flag below).
  - The match-end-of-line operator $ matches the empty string immediately before a newline as well as the empty string at the end of the string (but see the REG_NOTEOL regexec() flag below).
- REG_LITERAL
- Interpret the entire regex argument as a literal string; that is, all characters are considered ordinary. This is a nonstandard extension, compatible with but not specified by POSIX.
- REG_NOSPEC
- Same as REG_LITERAL. This flag is provided for compatibility with BSD.
- REG_RIGHT_ASSOC
- By default, concatenation is left associative in TRE, as per the grammar given in the base specifications on regular expressions of IEEE Std 1003.1-2001 (POSIX). This flag makes concatenation right associative instead. Associativity can affect how a match is divided into submatches, but does not change what is matched by the entire regexp.
- REG_UNGREEDY
- By default, repetition operators are greedy in TRE as per Std 1003.1-2001 (POSIX) and can be forced to be non-greedy by appending a ? character. This flag reverses this behavior by making the operators non-greedy by default and greedy when a ? is specified.
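The effect of the four POSIX-standard flags above (REG_EXTENDED, REG_ICASE, REG_NOSUB, REG_NEWLINE) can be observed with two small helpers. This is a sketch against the system <regex.h> (for example glibc), not TRE, on the assumption that TRE's POSIX-compatible implementation behaves the same for these flags; the TRE-specific flags (REG_LITERAL, REG_NOSPEC, REG_RIGHT_ASSOC, REG_UNGREEDY) are not exercised because a standard <regex.h> need not provide them. The helper names flag_match() and match_start() are hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 if `text` matches `pattern` compiled with
 * `cflags`, -1 if regcomp() fails, 0 otherwise. */
int flag_match(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}

/* Hypothetical helper: start offset of the whole match, read from a
 * one-slot submatch array preset to the sentinel -2. Returns -1 if the
 * pattern fails to compile or nothing matches. Under REG_NOSUB, POSIX
 * has regexec() ignore nmatch and pmatch, so the sentinel -2 comes back
 * unchanged even though the text matched. */
long match_start(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    regmatch_t m[1];
    m[0].rm_so = -2;
    m[0].rm_eo = -2;
    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    int rc = regexec(&re, text, 1, m, 0);
    regfree(&re);
    return rc == 0 ? (long)m[0].rm_so : -1;
}
```

For example, flag_match("^(ab)$", "ab", REG_EXTENDED) returns 1, while the same pattern compiled as a BRE (cflags 0) only matches the literal text "(ab)". With REG_NEWLINE, match_start("^b", "a\nb", REG_EXTENDED | REG_NEWLINE) returns 2, the offset just past the newline; without the flag, ^ only matches at the start of the string and the call returns -1. Presetting the submatch array to a sentinel is only a way to make REG_NOSUB visible; real code would simply pass 0 and NULL.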