[TRE-general] libtre capture nonsense

Ville Laurikari ville at laurikari.net
Sun Feb 18 20:46:31 EET 2007


On Sun, Jan 28, 2007 at 11:59:06AM +0000, Chris Kuklewicz wrote:
> Hi Ville,
>
>   Thanks to your Master thesis I seem to have a tagged DFA that passes all the
> POSIX tests.  Concentrating on correctness I have not made it very efficient
> yet, so it is 4 times slower that TRE for the one speed benchmark I have.

I'm interested, how did you make it handle cases such as this
correctly:

regex: (a|ab|aba|baab)*

The string "abaab" can be matched in different ways:
  1.  match "aba", then "ab"
or
  2.  match "ab", then "a", then "ab"
or
  3.  match "a", then "baab"

Of these, the correct match according to POSIX is number 1.  This is
because repetitions should be treated so that each iteration matches
as many characters as possible and earlier repetitions take precedence
over later repetitions.

When your tagged DFA has consumed the last character, how does it
determine that the submatch to return is "ab", and not "baab"?

--
http://www.iki.fi/vl/


More information about the TRE-general mailing list