[TRE-general] libtre capture nonsense

Chris Kuklewicz tre-general at list.mightyreason.com
Mon Jan 15 22:45:33 EET 2007


Hello again,

I am still redesigning my Haskell implementation of a TNFA/TDFA (tagged NFA/DFA)
and comparing it against libtre-0.7.4.

I have found what looks like a bug, possibly distinct from the earlier one
involving (^|()) captures.

Specifically, this behavior from libtre seems like nonsense:

> let r = makeRegex "(b*|c(c*))*" :: Text.Regex.TRE.Regex
  in match r "cbb" :: MatchArray

array (0,2) [(0,(0,3)),(1,(1,2)),(2,(1,2))]

The above is an array of (capture index,(match offset,match length)) pairs.

Thus \0 captured "cbb", \1 captured "bb", and \2 captured "bb".

But \2 is (c*), which can only ever match 'c' characters, so this is nonsense.
Not that my implementation gets it right either.
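To make the objection concrete, here is a small sketch that decodes the (offset,length) pairs against the input and checks that whatever group 2 captured consists only of 'c'. It uses a plain Data.Array `Array Int (Int, Int)` standing in for the MatchArray printed above; the helper names `capturedText` and `group2Plausible` are hypothetical, and an offset of -1 is taken to mean "no capture", as in the expected answer below.

```haskell
import Data.Array (Array, listArray, elems, (!))

-- One (offset, length) pair per group; offset -1 means the group is unset.
type Span = (Int, Int)

-- Slice the captured text out of the input, or Nothing for an unset group.
capturedText :: String -> Span -> Maybe String
capturedText input (off, len)
  | off < 0   = Nothing
  | otherwise = Just (take len (drop off input))

-- Group 2 is (c*): if it captured anything, that text may only contain 'c'.
group2Plausible :: String -> Array Int Span -> Bool
group2Plausible input arr =
  case capturedText input (arr ! 2) of
    Nothing  -> True
    Just txt -> all (== 'c') txt

main :: IO ()
main = do
  let input  = "cbb"
      libtre = listArray (0, 2) [(0, 3), (1, 2), (1, 2)] :: Array Int Span
  print (map (capturedText input) (elems libtre))  -- [Just "cbb",Just "bb",Just "bb"]
  print (group2Plausible input libtre)             -- False
```

libtre's answer fails the check because "bb" is not a run of 'c'.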

I think the correct answer is
array (0,2) [(0,(0,3)),(1,(1,2)),(2,(-1,0))]
where there is no capture \2 for (c*).

Does anyone here think the above is incorrect?

Currently my flawed code returns
array (0,2) [(0,(0,3)),(1,(1,2)),(2,(1,0))]
where there is an empty capture for (c*).  I need to change my design to keep
track of and utilize a bit more information to recognize which tags to reset.
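The difference between the two answers can be modelled by whether the tags of groups nested inside the outer star are cleared at the start of each iteration. The sketch below is a hypothetical model for this one pattern only, not TRE's tag machinery or the design of my code: it splits the input into star iterations (assuming the input contains only 'b' and 'c', and ignoring empty iterations), then folds over them with and without resetting \2. The names `Iter`, `iterations`, `withReset` and `withoutReset` are made up for illustration.

```haskell
-- (offset, length) pairs; (-1, 0) marks an unset group.
type Span = (Int, Int)

unset :: Span
unset = (-1, 0)

-- One iteration of the outer star in (b*|c(c*))*: either the b* branch,
-- or the c(c*) branch together with the span of the inner (c*).
data Iter = BRun Span | CRun Span Span
  deriving Show

-- Split the input into star iterations, taking the longest branch each time.
-- Assumption: input holds only 'b' and 'c'; empty iterations are not modelled.
iterations :: String -> [Iter]
iterations = go 0
  where
    go i ('c' : rest) =
      let n = length (takeWhile (== 'c') rest)
      in CRun (i, 1 + n) (i + 1, n) : go (i + 1 + n) (drop n rest)
    go i s@('b' : _) =
      let n = length (takeWhile (== 'b') s)
      in BRun (i, n) : go (i + n) (drop n s)
    go _ _ = []

-- (group 1, group 2) when every iteration first clears the nested group 2.
withReset :: [Iter] -> (Span, Span)
withReset = foldl step (unset, unset)
  where
    step _ (BRun sp)       = (sp, unset)
    step _ (CRun sp inner) = (sp, inner)

-- (group 1, group 2) when group 2 keeps a stale value from an older iteration.
withoutReset :: [Iter] -> (Span, Span)
withoutReset = foldl step (unset, unset)
  where
    step (_, g2) (BRun sp) = (sp, g2)
    step _ (CRun sp inner) = (sp, inner)

main :: IO ()
main = do
  let its = iterations "cbb"
  print its                 -- [CRun (0,1) (1,0),BRun (1,2)]
  print (withReset its)     -- ((1,2),(-1,0))
  print (withoutReset its)  -- ((1,2),(1,0))
```

With the reset, \2 comes out unset as in the answer I think is correct; without it, the empty (c*) capture at offset 1 from the first iteration survives, which is what my current code returns.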

-- 
Chris

