[TRE-general] libtre capture nonsense
Chris Kuklewicz
tre-general at list.mightyreason.com
Mon Jan 15 22:45:33 EET 2007
Hello again,
I am still redesigning my Haskell implementation of a TNFA/TDFA and comparing
to libtre-0.7.4.
I have found what looks like a bug, possibly distinct from the previous bug
involving (^|()) captures.
Specifically, this behavior from libtre seems like nonsense:
> let r = makeRegex "(b*|c(c*))*" :: Text.Regex.TRE.Regex
in match r "cbb" :: MatchArray
array (0,2) [(0,(0,3)),(1,(1,2)),(2,(1,2))]
The above is a list of (capture index,(match offset,match length)) tuples.
Thus the above means that \0 captured "cbb", \1 captured "bb", and \2
captured "bb".
But \2 is (c*), which can only match 'c' characters, so this is nonsense. Not
that my implementation works either.
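To double-check that reading of the tuples, here is a small sketch that turns
such an array back into the captured substrings. It assumes only the shape of
regex-base's MatchArray (an Array Int of (offset, length) pairs, offset -1 for
"no capture"). The local MatchArray synonym and the names captures,
libtreResult and expectedResult are hypothetical, so it runs without regex-tre:

```haskell
import Data.Array (Array, array, assocs)

-- Local stand-in for regex-base's MatchArray (assumed shape):
-- capture index -> (match offset, match length); offset -1 means "no capture".
type MatchArray = Array Int (Int, Int)

-- Map each capture index to the substring it covers,
-- or Nothing if the group did not participate in the match.
captures :: String -> MatchArray -> [(Int, Maybe String)]
captures subject arr = [ (i, slice off len) | (i, (off, len)) <- assocs arr ]
  where
    slice off len
      | off < 0   = Nothing
      | otherwise = Just (take len (drop off subject))

-- What libtre-0.7.4 returned for "(b*|c(c*))*" on "cbb".
libtreResult :: MatchArray
libtreResult = array (0,2) [(0,(0,3)),(1,(1,2)),(2,(1,2))]

-- The answer argued for here: \2 does not participate.
expectedResult :: MatchArray
expectedResult = array (0,2) [(0,(0,3)),(1,(1,2)),(2,(-1,0))]
```

In GHCi, captures "cbb" libtreResult gives \2 = Just "bb", which (c*) cannot
produce, while captures "cbb" expectedResult gives \2 = Nothing.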
I think the correct answer is
array (0,2) [(0,(0,3)),(1,(1,2)),(2,(-1,0))]
where there is no capture \2 for (c*).
Does anyone here think the above is incorrect?
Currently my flawed code returns
array (0,2) [(0,(0,3)),(1,(1,2)),(2,(1,0))]
where there is an empty capture for (c*). I need to change my design to keep
track of and utilize a bit more information to recognize which tags to reset.
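A toy model makes the two candidate rules concrete. It does not model tags or
a TDFA at all; it only compares the final values two rules would report for
\1 and \2. It assumes "cbb" splits into two iterations of the outer (...)*:
c(c*) on "c" (leaving \2 empty at offset 1), then b* on "bb", which never
enters (c*). All names (Capture, Iteration, finalWithReset, finalSticky,
render, cbbIterations) are hypothetical:

```haskell
import Data.Maybe (fromMaybe)

-- (offset, length) of one capture; Nothing means "did not participate".
type Capture = Maybe (Int, Int)

-- Values a single iteration of the outer (...)* assigns to \1 and \2.
data Iteration = Iteration { group1 :: Capture, group2 :: Capture }

-- Reset rule: inner groups are cleared at the start of every iteration,
-- so the reported values come from the last iteration alone.
finalWithReset :: [Iteration] -> (Capture, Capture)
finalWithReset [] = (Nothing, Nothing)
finalWithReset its = (group1 lastIt, group2 lastIt)
  where lastIt = last its

-- Sticky rule: a group keeps its most recent value, even when a later
-- iteration took an alternative that never touched it.
finalSticky :: [Iteration] -> (Capture, Capture)
finalSticky = foldl step (Nothing, Nothing)
  where
    step (old1, old2) it = (newer (group1 it) old1, newer (group2 it) old2)
    newer (Just v) _  = Just v
    newer Nothing old = old

-- Render like the MatchArray output: (-1,0) for a missing capture.
render :: Capture -> (Int, Int)
render = fromMaybe (-1, 0)

-- Assumed iteration split of "cbb" for "(b*|c(c*))*".
cbbIterations :: [Iteration]
cbbIterations =
  [ Iteration (Just (0, 1)) (Just (1, 0))  -- c(c*) matches "c"
  , Iteration (Just (1, 2)) Nothing        -- b* matches "bb"
  ]
```

Under these assumptions the reset rule yields \1 = (1,2), \2 = (-1,0), the
answer argued for above, while the sticky rule yields \2 = (1,0), the same
stale value as the flawed output, which is consistent with a missing reset of
the (c*) tags.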
--
Chris