[TRE-general] libtre capture nonsense
Shmuel Zeigerman
shmuz at actcom.co.il
Tue Feb 20 20:28:53 EET 2007
Hello,
> regex: (a|ab|aba|baab)*
>
> The string "abaab" can be matched in different ways:
> 1. match "aba", then "ab"
> or
> 2. match "ab", then "a", then "ab"
> or
> 3. match "a", then "baab"
>
> Of these, the correct match according to POSIX is number 1. This is
> because repetitions should be treated so that each iteration matches
> as many characters as possible and earlier repetitions take precedence
> over later repetitions.
>
> When your tagged DFA has consumed the last character, how does it
> determine that the submatch to return is "ab", and not "baab"?
Perhaps I'm doing something wrong, but when I test this with TRE 0.7.5,
it returns the submatch "baab".
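For comparison, here is a brute-force sketch of the rule quoted above. It is not how TRE or any tagged DFA works; it simply enumerates every way to split the whole string into iterations of (a|ab|aba|baab) and picks the one whose iteration lengths are lexicographically largest, i.e. each iteration is as long as possible and earlier iterations take precedence. The helper names (all_parses, posix_parse, last_iteration_span) are hypothetical, chosen for illustration only.

```python
from typing import List, Tuple

# Alternatives of the group in the quoted regex (a|ab|aba|baab)*.
ALTERNATIVES = ["a", "ab", "aba", "baab"]


def all_parses(s: str, alts: List[str]) -> List[List[str]]:
    """Return every way to split s completely into non-empty alternatives."""
    if s == "":
        return [[]]  # zero further iterations
    parses = []
    for alt in alts:
        if alt and s.startswith(alt):
            for rest in all_parses(s[len(alt):], alts):
                parses.append([alt] + rest)
    return parses


def posix_parse(s: str, alts: List[str]) -> List[str]:
    """Pick the parse whose iteration lengths compare largest, first to last.

    All candidates cover the same string, so no length tuple can be a
    proper prefix of another and plain tuple comparison suffices.
    """
    candidates = all_parses(s, alts)
    if not candidates:
        raise ValueError("no full match")
    return max(candidates, key=lambda p: tuple(len(x) for x in p))


def last_iteration_span(parse: List[str]) -> Tuple[int, int]:
    """(start, end) offsets of the last iteration, which the group reports."""
    end = sum(len(x) for x in parse)
    return end - len(parse[-1]), end


if __name__ == "__main__":
    s = "abaab"
    for p in all_parses(s, ALTERNATIVES):
        print(p)  # the three parses listed in the quoted message
    best = posix_parse(s, ALTERNATIVES)
    print(best, last_iteration_span(best))  # ['aba', 'ab'] (3, 5)
```

Under this reading, the expected submatch is "ab" at offsets 3..5, not "baab".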
--
Shmuel