regaexec() – Approximate Matching

#include <tre/tre.h>
 
typedef struct {
  int cost_ins;
  int cost_del;
  int cost_subst;
  int max_cost;
 
  int max_ins;
  int max_del;
  int max_subst;
  int max_err;
} regaparams_t;
 
typedef struct {
  size_t nmatch;
  regmatch_t *pmatch;
  int cost;
  int num_ins;
  int num_del;
  int num_subst;
} regamatch_t;
 
int tre_regaexec(const regex_t *preg, const char *string,
                 regamatch_t *match, regaparams_t params, int eflags);
int tre_reganexec(const regex_t *preg, const char *string, size_t len,
                  regamatch_t *match, regaparams_t params, int eflags);
int tre_regawexec(const regex_t *preg, const wchar_t *string,
                  regamatch_t *match, regaparams_t params, int eflags);
int tre_regawnexec(const regex_t *preg, const wchar_t *string, size_t len,
                   regamatch_t *match, regaparams_t params, int eflags);

The tre_regaexec() function searches for the best match in string against the compiled regexp preg, initialized by a previous call to any one of the regcomp functions.

The tre_reganexec() function is like tre_regaexec(), but string is not terminated by a null byte. Instead, the len argument gives the length of the string, which may itself contain null bytes. The tre_regawexec() and tre_regawnexec() functions work like tre_regaexec() and tre_reganexec(), respectively, but take a wide character (wchar_t) string instead of a byte string.

The eflags argument has the same meaning as for the tre_regexec() functions.

The params struct controls the approximate matching parameters:

int cost_ins
The default cost of an inserted character, that is, an extra character in string.
int cost_del
The default cost of a deleted character, that is, a character missing from string.
int cost_subst
The default cost of a substituted character.
int max_cost
The maximum allowed cost of a match. If this is set to zero, only exact matches are searched for, and the results are equivalent to those returned by the regexec() functions.
int max_ins
Maximum allowed number of inserted characters.
int max_del
Maximum allowed number of deleted characters.
int max_subst
Maximum allowed number of substituted characters.
int max_err
Maximum allowed number of errors (inserts + deletes + substitutes).
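
As a rough illustration of how these fields interact, the sketch below models the acceptance test for one candidate match. It is not TRE's implementation: the types limits_t and edits_t and the functions edit_cost() and within_limits() are hypothetical names introduced here, the default costs are assumed to apply to every edit, and every limit is assumed to be inclusive.

```c
#include <stdbool.h>

/* Hypothetical mirror of the regaparams_t fields, for illustration only. */
typedef struct {
  int cost_ins, cost_del, cost_subst, max_cost;
  int max_ins, max_del, max_subst, max_err;
} limits_t;

/* Hypothetical edit counts for one candidate match. */
typedef struct {
  int num_ins, num_del, num_subst;
} edits_t;

/* Weighted cost, assuming the default costs apply to every edit. */
int edit_cost(const limits_t *p, const edits_t *e)
{
  return p->cost_ins * e->num_ins
       + p->cost_del * e->num_del
       + p->cost_subst * e->num_subst;
}

/* True if the candidate stays within every limit (assumed inclusive). */
bool within_limits(const limits_t *p, const edits_t *e)
{
  int errors = e->num_ins + e->num_del + e->num_subst;

  return edit_cost(p, e) <= p->max_cost
      && e->num_ins <= p->max_ins
      && e->num_del <= p->max_del
      && e->num_subst <= p->max_subst
      && errors <= p->max_err;
}
```

For example, with cost_ins = cost_del = 1, cost_subst = 2 and max_cost = 3, one insertion plus one substitution costs 3 and passes the cost check, while adding one deletion raises the cost to 4 and the candidate is rejected.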

The match argument points to a regamatch_t structure. The nmatch and pmatch fields must be filled in by the caller. If REG_NOSUB was used when compiling the regexp, or match->nmatch is zero, or match->pmatch is NULL, the match->pmatch field is ignored. Otherwise, the submatches corresponding to the parenthesized subexpressions are stored in the elements of match->pmatch, which must have room for at least match->nmatch elements. The match->cost field is set to the cost of the match found, and the match->num_ins, match->num_del, and match->num_subst fields are set to the number of inserts, deletes, and substitutes in the match, respectively.

The tre_regaexec() functions return zero if a match with a cost no greater than params.max_cost was found. Otherwise they return REG_NOMATCH to indicate no match, or REG_ESPACE to indicate that not enough temporary memory could be allocated to complete the matching operation.