[TRE-general] Feature-wish: named subexpressions and/or return "token-id" for top-level-union

Setzer, Sebastian (ext) sebastian.setzer.ext at siemens.com
Thu Apr 26 21:28:41 EEST 2007


Hi,
I'd like to use TRE in a way similar to lex (or flex): given a set of
regular expressions and a text, tell me which of the expressions matches
first.

I can do it like this:
- build one big expression out of the given set of expressions: put
parentheses around each of them and join them with "|".
- use regex.re_nsub to determine how many subexpressions each of them
contains, in order to get the index at which each one starts in the
big union expression.
- look at each of these indexes and test whether that subexpression
matched.
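The three steps above can be sketched roughly as follows. This is not the
attached t.c; it is a minimal illustration written against the POSIX
<regex.h> interface, which TRE also offers in a compatible form. The names
token_set, token_set_compile, token_set_match and token_set_free are
hypothetical helpers invented for this sketch, and it assumes the patterns
are POSIX extended expressions without back-references (those would be
renumbered inside the union).

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type: the compiled union plus, for every token
 * pattern, the group index of the parentheses wrapped around it. */
typedef struct {
    regex_t re;     /* compiled "(p0)|(p1)|..." */
    size_t  n;      /* number of token patterns */
    size_t *start;  /* start[i] = group index of the i-th wrapper */
} token_set;

/* Build and compile the union. Returns 0 on success, -1 on failure. */
int token_set_compile(token_set *ts, const char *const *pats, size_t n)
{
    assert(ts != NULL && pats != NULL && n > 0);

    size_t len = 1;                         /* terminating NUL */
    for (size_t i = 0; i < n; i++)
        len += strlen(pats[i]) + 3;         /* "|", "(" and ")" */

    char *big = malloc(len);
    ts->start = malloc(n * sizeof *ts->start);
    if (big == NULL || ts->start == NULL)
        goto fail;

    size_t next = 1;                        /* group 0 is the whole match */
    char *p = big;
    for (size_t i = 0; i < n; i++) {
        regex_t tmp;                        /* compiled only to read re_nsub */
        if (regcomp(&tmp, pats[i], REG_EXTENDED) != 0)
            goto fail;
        ts->start[i] = next;
        next += tmp.re_nsub + 1;            /* own groups plus the wrapper */
        regfree(&tmp);
        p += sprintf(p, "%s(%s)", i ? "|" : "", pats[i]);
    }

    if (regcomp(&ts->re, big, REG_EXTENDED) != 0)
        goto fail;
    free(big);
    ts->n = n;
    return 0;

fail:
    free(big);
    free(ts->start);
    return -1;
}

/* Returns the index of the pattern whose wrapper group took part in the
 * match (optionally storing its span), or -1 if nothing matched. */
int token_set_match(const token_set *ts, const char *text, regmatch_t *span)
{
    size_t ngroups = ts->re.re_nsub + 1;
    regmatch_t *m = malloc(ngroups * sizeof *m);
    int id = -1;

    if (m == NULL)
        return -1;
    if (regexec(&ts->re, text, ngroups, m, 0) == 0) {
        for (size_t i = 0; i < ts->n; i++) {
            if (m[ts->start[i]].rm_so != -1) {
                id = (int)i;
                if (span != NULL)
                    *span = m[ts->start[i]];
                break;
            }
        }
    }
    free(m);
    return id;
}

void token_set_free(token_set *ts)
{
    regfree(&ts->re);
    free(ts->start);
}
```

A caller would compile the set once with token_set_compile, call
token_set_match for each piece of text, and release everything with
token_set_free. Note the loop over ts->n in token_set_match: that is the
per-match iteration over the expression set.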

Test code which does this is attached (I hope the mailing list doesn't
filter it out).
Please note that it contains a memory leak and probably other bugs...

If the TRE API had a function which does this, you could probably do it
better, because
- you only need tags for the top-level subexpressions (the ones for which
I collected the indexes) if the user isn't interested in the
sub-subexpressions (as with REG_NOSUB)
- if you precompute a DFA, you can return the "token-id" in
O(text-length) without needing to iterate through the expression set
A related feature is named subexpressions (for example with Python's
syntax: "(?P<name>...)").
If you have them, you don't need to worry about subexpression indexes.
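Until something like that exists in the library, the caller can
approximate it with a side table from names to group indexes. The sketch
below assumes the POSIX <regex.h> types; group_name and group_by_name are
hypothetical names, and the indexes in the table still have to be worked
out by hand (or by the re_nsub bookkeeping described earlier), which is
exactly the chore that real named subexpressions would remove.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical side table: one entry per "named" group, filled in by
 * the caller, who must know the numeric group index. */
typedef struct {
    const char *name;
    size_t      index;  /* group index in the compiled expression */
} group_name;

/* Look up a group by name in the result array m (n_groups entries, as
 * filled in by regexec). Returns NULL if the name is unknown or the
 * group did not take part in the match. */
const regmatch_t *group_by_name(const group_name *names, size_t n_names,
                                const regmatch_t *m, size_t n_groups,
                                const char *name)
{
    assert(names != NULL && m != NULL && name != NULL);

    for (size_t i = 0; i < n_names; i++) {
        if (strcmp(names[i].name, name) == 0) {
            size_t g = names[i].index;
            if (g < n_groups && m[g].rm_so != -1)
                return &m[g];
            return NULL;
        }
    }
    return NULL;
}
```

With a table such as { {"number", 1}, {"word", 2} } for the expression
"([0-9]+)|([a-z]+)", the caller can ask for "word" after regexec instead of
remembering that it is group 2.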

Sebastian Setzer
-------------- next part --------------
A non-text attachment was scrubbed...
Name: t.c
Type: application/octet-stream
Size: 3881 bytes
Desc: t.c
Url : http://laurikari.net/pipermail/tre-general/attachments/20070426/d01b2d6e/attachment.obj 

