[TRE-general] Repeat histories

Chris Kuklewicz tre-general at list.mightyreason.com
Thu Feb 22 01:16:05 EET 2007


I almost, but not quite, understand the last two messages...

> Specifically, I decompose (R)+ into R'* (R) where R' is R with all captures
> eliminated. (R)* is decomposed into (?:R'* (R))?  where the outer parentheses
> are not captures. If R is right-matchable, then this decomposition generates
> the correct capture with standard Posix concatenation semantics;

I disagree

> that capture is the one for which R'* is the longest, and consequently (R) is
>  the shortest.

Applying my understanding of "standard Posix concatenation":

(?:R'*(R))? is not Posix equivalent to (R)*, as posix will maximize the whole
match and (for non-empty) then R'*(R) then R'* then (R).

The longest match of R'* in R*'(R) against "ab" is still "a", and (R) is "b".

The Posix match of R* is one iteration and is "(ab)".

So I assume you are doing something tricky with the iteration count...

> So in the automaton for (a|b|ab)*, the transitions are (using ^ as the
> position marker because centre-dot is too hard to type):
> 
> State     input     Goto(s)
> ^a|b|ab     a       ^a|b|ab  a|^b|ab  a|b|^ab  +Follow set
> a|^b|ab     b       ^a|b|ab  a|^b|ab  a|b|^ab  +Follow set
> a|b|a^b     b       ^a|b|ab  a|^b|ab  a|b|^ab  +Follow set
> a|b|^ab     a       a|b|a^b 

I think I understand the notation.  The ^ indicate "about to accept the
character to the right of the ^".  I do no understand "+Follow set"...it seems
to mean that the * has iterated.  Or it could mean that having accepted that
input leads to a winning/final state.

> Consequently, the parse of "ab" as (a)(b) is not possible.

Unfortunately, I do not understand how to draw that conclusion from your
transition table.  But if you are minimizing the iteration count then this may
be correct.

Rici Lake wrote:
> 
> On 21-Feb-07, at 4:16 PM, Chris Kuklewicz wrote:
> 
>> I am unlikely to believe that: (right-matchable property) implies (the 
>> shortest final iteration implies Posix semantics).
> 
> Arghh, I got that completely wrong. I should have checked my code before I
> wrote all that.

I found writing down O(orbits) required looking at my code.

> 
> The transformation described is, in fact, the one I use; however, I then do
> first last match, not shortest last match.

"First last match" sounded like a contradiction until I decided that "first"
might mean the first iteration that leads to the whole match.  In that case

R'*(R) where 1 match is sufficient sets R'* to 0 iterations.  And (R)=>"ab"
which is correct.

In other cases...I am at a temporary loss.  I am not sure how "each match is, in
 turn, as greedy as possible" combines with "minimize the number of iterations
to get first last match".  Especially since Posix does not optimize on the
number of iterations.  So perhaps I have failed to guess the definition of
"first last match"

> 
> Blah. Well, it's always good to make an idiot of yourself in public once in a
> while.
> 

You have code.  The code does something correctly.  I am curious to understand
what it does and why it works.

Cheers,
  Chris


More information about the TRE-general mailing list