[TRE-general] Matching (^) question

Chris Kuklewicz tre-general at list.mightyreason.com
Sat Jan 6 14:17:33 EET 2007


Ville Laurikari wrote:
> On Fri, Jan 05, 2007 at 09:56:39PM +0000, Chris Kuklewicz wrote:
>> Hello,
>>
>> I am tinkering more with the haskell regex-* packages.  In particular I have
>> been considering implementing a tagged dfa in pure Haskell.
> 
>>>> "searchme" =~ "s(()|^)e" :: Array Int (MatchOffset,MatchLength)
>>> array (0,2) [(0,(0,2)),(1,(1,0)),(2,(1,0))]
>> The above looks sane but re-ordering the alternative causes the match to fail:
>>
>>>> "searchme" =~ "s(^|())e" :: Array Int (MatchOffset,MatchLength)
>>> array (1,0) []
> 
> Looks like you've found a bug in TRE.  Alternatives where both sides
> match an empty string and have different assertions (^ or $) don't
> work correctly.

That was the most obviously bug-like of the examples I put in my message. It is
a somewhat strange regular expressions to want to match.

> I went through some of the TRE code and found the bug in the regexp
> compiler. 

That was fast.

> I know why it doesn't work, but it seems that the fix is
> not trivial. 

We can always hope you will have a sudden insight the reveals a more trivial fix.

> Lately I've had limited time (or have chosen to limit my
> time) to work on TRE, so I don't know when I'll be able to develop a
> fix.  It would probably involve rewriting parts of the regexp
> compiler...

I understand that kind of non-trivial bug, partly from slowly building a Haskell
regex compiler that will someday use tagged transitions.

I am very curious: Has anyone else written a tagged NFA/DFA regex engine?

My progress:

(Since I am doing this as a hobby, I have not looked at any of the libtre source
code, in order to get all the fun and pain)

I have written an NFA/DFA without tags or ^$ anchors, but then adding ^$ anchors
was a serious rewrite and expansion.  Now that this is working I am trying to
add tags to the program.  I have managed to generate and add tags to a
transformed regex parse tree.  And I have probably managed to add tags to the
design of the NFA data structure.  Converting my parse tree into the NFA with
all the tags correct has not been finished.  Then I will have to add tags to the
DFA and update the NFA to DFA conversion routine.  But all that still won't give
me subexpression capture since I still have to write code to turn the tag data
to compute the final captured groups.

Currently, the NFAs I generate have no empty transitions but my algortithm
creates different NFAs than the comparable algorithm you used in your thesis.
Which has meant that my first design for the tagged transitions is more
complicated: it has the possibility of tags in the transition both "before" and
"after" consuming the character.  I have a small hope that after I get it
working I will see how to simplify the design to just before or after tags.

Cheers,
  Chris


More information about the TRE-general mailing list