[TRE-general] libtre capture nonsense

Shmuel Zeigerman shmuz at actcom.co.il
Tue Feb 20 20:28:53 EET 2007


Hello,

> regex: (a|ab|aba|baab)*
> 
> The string "abaab" can be matched in different ways:
>   1.  match "aba", then "ab"
> or
>   2.  match "ab", then "a", then "ab"
> or
>   3.  match "a", then "baab"
> 
> Of these, the correct match according to POSIX is number 1.  This is
> because repetitions should be treated so that each iteration matches
> as many characters as possible and earlier repetitions take precedence
> over later repetitions.
> 
> When your tagged DFA has consumed the last character, how does it
> determine that the submatch to return is "ab", and not "baab"?

I may be doing something wrong, but in my test with TRE 0.7.5 the
submatch returned for this regex and string is "baab", not "ab".
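
A minimal sketch of how such a check could be written, assuming the
standard POSIX <regex.h> interface (regcomp/regexec/regfree) rather than
any TRE-specific call. The helper name capture_group and the fixed limit
of 10 submatches are made up for illustration, and this is not the
original test program. Which submatch comes back depends on the regex
library the program is linked against.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: upper bound on the submatches we ask regexec for. */
#define MAX_GROUPS 10

/*
 * Compile `pattern` as a POSIX extended regex, match it against `subject`,
 * and copy the text of submatch number `group` into `out` (0 = whole match).
 * Returns 0 on success, -1 if the pattern does not compile, the subject does
 * not match, the group did not participate, or `out` is too small.
 */
int capture_group(const char *pattern, const char *subject,
                  size_t group, char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[MAX_GROUPS];
    int rc = -1;

    if (group >= MAX_GROUPS || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* rm_so is -1 for a group that took no part in the match. */
    if (regexec(&re, subject, MAX_GROUPS, m, 0) == 0 && m[group].rm_so != -1) {
        size_t len = (size_t)(m[group].rm_eo - m[group].rm_so);
        if (len < outsz) {
            memcpy(out, subject + m[group].rm_so, len);
            out[len] = '\0';
            rc = 0;
        }
    }

    regfree(&re);
    return rc;
}
```

Calling capture_group("(a|ab|aba|baab)*", "abaab", 1, buf, sizeof buf)
and printing buf shows which alternative the library reports for the
last iteration of the group. Under the rule quoted above, that would be
"ab".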

-- 
Shmuel



