TRE 0.8.0 Released

by Ville Laurikari on September 20, 2009

Download here. New in this release:

  • Added tre_ prefix to all functions exported from libtre. This changes the binary interface (ABI). The old source interface (API) is still available in <tre/regex.h>. New code should use <tre/tre.h>, which exports the prefixed functions.
  • Visual C++ 6 project files replaced with Visual Studio 2008 files.
  • Bug fixes.

{ 13 comments }

1 Bang Jun-young October 2, 2009 at 14:03

Thanks for the work, but the tarballs actually still contain the Visual C++ 6 files.

2 Ville Laurikari October 2, 2009 at 14:10

Oh, drats. I forgot to update something and the old files still got into the packages. Thanks for letting me know!

I’ve now fixed this for the next release. Until then, you can download the project (.vcproj) and solution (.sln) files for Visual Studio 2008 from the darcs repository here.

3 GregK October 5, 2009 at 19:18

TRE is mentioned on Wikipedia, but the reference to LGPL is outdated now.
http://en.wikipedia.org/wiki/Agrep

4 Ville Laurikari October 7, 2009 at 21:19

Indeed. I went and updated the agrep page regarding TRE license.

5 Bang Jun-young November 10, 2009 at 03:34

Checked out the sources with the command ‘darcs get --set-scripts-executable http://laurikari.net/tre/darcs/stable/’ as described on the download page, but I still can’t find the Visual Studio 2008 files there.

Downloading the files from the darcs web interface by clicking on the ‘plain’ link doesn’t work either. What I actually got was HTML-escaped text (< replaced with &lt;) rather than plain text.

Downloading from the browser's view-source page doesn't work either. This time the server refuses to send data to the browser. :-(

6 Ville Laurikari December 2, 2009 at 12:45

Sorry about that, and sorry for the delay (your comment got caught in the spam filter). Now the darcs repo is properly updated.

I have no problems downloading files from the web interface.

7 Steve Teale January 29, 2010 at 12:57

I am trying to translate the non-fuzzy part of TRE into D. I noticed this:

tre_ctype_t tre_ctype(const char *name)
{
  int i;
  for (i = 0; tre_ctype_map[i].name != NULL; i++)
    {
      if (strcmp(name, tre_ctype_map[i].name) == 0)
        return tre_ctype_map[i].func;
    }
  return (tre_ctype_t)0;
}

It is prototyped as returning a character type, but if name is found, it returns a pointer to a function. The parsing code behaves as if it returned a character (I think). Could you possibly explain?

Thanks Steve

8 Ville Laurikari March 12, 2010 at 14:46

Steve, you’ve run into a confusing hack of mine. Please accept my apologies.

In the normal case, tre_ctype() is just the same as the wctype() function from the C library. It returns a “character class” object. The counterpart of tre_ctype() is tre_isctype(), which takes a character and a character class object, and returns non-zero if the character is part of the character class. Normally tre_isctype() is the same as iswctype().

If the system does not have wctype() and iswctype(), TRE uses its own implementation. In this case, the character class object returned by tre_ctype() is actually a function which gets called by tre_isctype().

You can find the macros that control this in tre-internal.h. Search for SYSTEM_WCTYPE.
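
The fallback described above can be illustrated with a small, self-contained sketch. It is not TRE's actual code: every name here (my_ctype_t, my_ctype, my_isctype, my_ctype_map) is hypothetical, and it uses the narrow-character <ctype.h> classifiers where TRE works with wide characters. The point is only that the "character class object" can be a function pointer, which the is-member test then calls.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout: this mirrors the idea, not TRE's source. */

/* A "character class" handle is simply a pointer to a classifier function. */
typedef int (*my_ctype_t)(int);

/* Wrappers, because the <ctype.h> classifiers may also be macros. */
static int my_isalpha(int c) { return isalpha(c); }
static int my_isdigit(int c) { return isdigit(c); }
static int my_isspace(int c) { return isspace(c); }

static const struct {
    const char *name;
    my_ctype_t func;
} my_ctype_map[] = {
    { "alpha", my_isalpha },
    { "digit", my_isdigit },
    { "space", my_isspace },
    { NULL, NULL }
};

/* Counterpart of wctype(): look up a class by name.
   A null handle means the name is unknown. */
my_ctype_t my_ctype(const char *name)
{
    for (size_t i = 0; my_ctype_map[i].name != NULL; i++) {
        if (strcmp(name, my_ctype_map[i].name) == 0)
            return my_ctype_map[i].func;
    }
    return (my_ctype_t)0;
}

/* Counterpart of iswctype(): the membership test just calls the handle. */
int my_isctype(int c, my_ctype_t type)
{
    return type != NULL && type(c) != 0;
}
```

With this sketch, my_isctype('7', my_ctype("digit")) is non-zero, while an unknown class name yields a null handle that matches nothing. This is why the function looks as if it returns "a character type" yet hands back a function.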

9 Enno April 12, 2010 at 11:33

Hi Ville,
I tried to run the retest application in MS Visual Studio in debug mode … but it would not run because there was an error in the tre.vcproj configuration file that I loaded from darcs:
The additional link library (for project “tre”: properties/configuration properties/Linker/Input/Additional Dependencies) for the debug configuration should be “msvcprtd.lib”.
After I added the “d” character to the library name, everything was OK!
Regards, Enno.

10 Blaisorblade May 1, 2010 at 20:43

Hi, I’ve found a reference to your library here:
http://patrakov.blogspot.com/2009/09/matching-multiple-strings.html
In that article, it is compared (for a test case of “(word1|word2|….|wordN)”) to a trie-based library and the GNU glibc implementation.

Whereas here I see a very different result:
http://hackerboss.com/is-your-regex-matcher-up-to-snuff/

Now, is this slowdown fixable in your library, at least for non-approximate matching? Do you think there’s a special case for such patterns (using a tree for them) or is the slowdown just a product of using NFAs versus recursive backtracking*?

*I just read the article you quote on your blog:
Regular Expression Matching Can Be Simple And Fast by Russ Cox.

11 Julien July 7, 2010 at 20:09

First, thank you for this library, it is really useful.
I could not reach the mailing list through your website; it seems the link is dead (http://laurikari.net/mailman/listinfo/tre-general).

I have stumbled upon an unexpected behavior:

pattern: ‘1111$’
str: ‘1111’ cost 0
str: ‘1111 ’ cost 1 (as expected)

pattern: ‘2004$’
str: ‘2004’ cost 0
str: ‘2004 ’ cost 2 (strange?)

I used a fresh download and default normal compilation
http://laurikari.net/tre/tre-0.8.0.tar.bz2
./configure
and then python setup.py install
My platform is Ubuntu 10.04

Here is the python code:

import tre
fz = tre.Fuzzyness(maxerr=3)

for pattern in ['1111$', '2004$']:
    pt = tre.compile(pattern, tre.EXTENDED)
    print "pattern:", repr(pattern)

    for test_str in ['1111', '1111 ', '2004', '2004 ']:
        m = pt.search(test_str, fz)
        if m:
            print "str:", repr(test_str), "cost", m.cost

Is this a known issue and something that would be fixed?
Thanks again.
Julien
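
For reference, the cost one would expect in the report above can be checked with a plain dynamic-programming sketch. It assumes unit costs for insertions, deletions and substitutions, and treats the pattern as a literal string that must match up to the end of the text (the role of the trailing $). The function end_anchored_cost is a hypothetical helper, not part of TRE, and it does not reproduce how TRE's matcher selects a match.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of TRE: unit-cost edit distance between
   `pat` and the cheapest substring of `text` that ends at the end of `text`.
   Returns -1 if `pat` is too long for the fixed-size buffers. */
int end_anchored_cost(const char *pat, const char *text)
{
    enum { MAXLEN = 63 };
    size_t m = strlen(pat), n = strlen(text);
    int prev[MAXLEN + 1], cur[MAXLEN + 1];

    if (m > MAXLEN)
        return -1;

    /* Against an empty text prefix, i pattern characters cost i edits. */
    for (size_t i = 0; i <= m; i++)
        prev[i] = (int)i;

    for (size_t j = 1; j <= n; j++) {
        cur[0] = 0;  /* a match may start anywhere in the text for free */
        for (size_t i = 1; i <= m; i++) {
            int sub = prev[i - 1] + (pat[i - 1] != text[j - 1]);
            int ins = prev[i] + 1;     /* extra character in the text */
            int del = cur[i - 1] + 1;  /* pattern character missing from the text */
            int best = sub < ins ? sub : ins;
            cur[i] = best < del ? best : del;
        }
        memcpy(prev, cur, (m + 1) * sizeof prev[0]);
    }
    return prev[m];  /* cost of the best match ending at the end of the text */
}
```

Under these assumptions, end_anchored_cost("1111", "1111 ") and end_anchored_cost("2004", "2004 ") both come out as 1, since the only edit needed is the extra trailing space. That is consistent with the expectation that the second pattern should also report cost 1.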

12 JM July 21, 2010 at 21:20

Looks like agrep currently fails on non-latin encodings. If the input file has a non-latin character then agrep just stops processing with no errors.

13 sang-suan gam May 3, 2011 at 10:30

Hi,

just downloaded the library for use on the command line
(windows).

the logic queries (AND, OR) are not working ?

# agrep 'FATAL' report.20110408.txt
776 FATAL ERROR IN TWO-TASK SERVER: error = 12571
20 FATAL ERROR IN TWO-TASK SERVER: error = 12571
20 FATAL ERROR IN TWO-TASK SERVER: error = 12571
20 FATAL ERROR IN TWO-TASK SERVER: error = 12571
20 FATAL ERROR IN TWO-TASK SERVER: error = 12571
20 FATAL ERROR IN TWO-TASK SERVER: error = 12571
20 FATAL ERROR IN TWO-TASK SERVER: error = 12571
20 FATAL ERROR IN TWO-TASK SERVER: error = 12571
20 FATAL ERROR IN TWO-TASK SERVER: error = 12571
#
# agrep 'FATAL;ERROR' report.20110408.txt
# agrep 'FATAL,ERROR' report.20110408.txt

also, when i ran the command in a cygwin terminal,
there is a complaint that the record delimiter must not
match an empty string ?

# agrep -d '$$' 'FATAL,ERROR' report.20110408.txt
C:\GnuWin32\bin\agrep.exe: Record delimiter pattern must not match an empty string
#

are these features no longer supported ?

Thanks,
sam
