Copyright Dr John Maddock 1998-2002 all rights reserved.
DFA: deterministic finite state automaton.
NFA: non-deterministic finite state automaton.
The best known DFA regular expression library is by Henry Spencer, at
ftp.zoo.toronto.edu/pub/
Matching back references are almost
certainly NP-complete, and therefore almost certainly require exponential time
to match in the worst case.
More STL references and the STL port can be found
here.
|
|
Regex++
A portable template regular expression library.
Boost-version-1.30.0
Regex++ provides a set of portable template classes and functions for compiling,
matching and searching for regular expressions.
-
Supports the full regular expression syntax, including back references,
sub-expression matching, character classes, repeat expressions and word/line
boundaries. This version supports the full POSIX syntax including
multi-character collating elements and equivalence classes, as well as some of
Perl's features like non-greedy repeats, forward lookahead asserts, and
non-marking parenthesis.
-
Provides full Unicode support if your run-time library or operating system
supports it. Under Win32, uses native Windows NT Unicode support, and degrades
gracefully under Win9x to provide more limited wide character support.
-
Full run-time localisation support, either via your run time library (and
setlocale) or natively under Win32.
-
Highly customisable via a traits class, define your own traits class to change
almost any behaviour of the underlying regular expression syntax. For example
custom traits classes can define their own character classes, or handle case
conversion and character comparisons according to their own rules.
-
Provides a POSIX compatible C API for those who prefer to use it, in both
narrow and wide character versions.
-
Provides a high level class for simplified usage, for those who don't need the
power of the lower level template classes.
-
Provides a full API reference, and regular expression syntax reference.
-
Template based match/search algorithms can search any bi-directional iterator
type, allowing seamless searching of non-contiguous data, as well as more
traditional searches of const char* or const wchar_t* types.
-
Supports search and replace operations via a format string that transforms the
results of a match into a new string. This can be used to transform whole
strings in one go in the manner of Perl.
-
Supports perl-style splitting of matches.
-
Integrates with your existing STL implementation: The library will
automatically recognise the
Hewlett-Packard, Silicon
Graphics, Rogue Wave, and
Microsoft
STL versions, or you can use it independently of any STL if you prefer (this
applies to the 2.x version only - the 3.x version requires a reasonably
compliant C++ standard library).
-
Speed, this library uses branch prediction techniques to speed up algorithm
performance, for example the time takes to match the expression
"^([0-9]+)(\-| |$)(.*)$" against the string "100-
this is a line of ftp response which contains a message string"
are: BSD implementation 450 micro seconds, GNU implementation 271 micro
seconds, regex++ 127 micro seconds (Pentium P90, Win32 console app under MS
Windows 95).
-
Demo apps: The library comes with three demo applications, a version of grep, a
timer program to time the efficiency of any regular expressions you write, and
a regression test program to verify the correctness of the underlying
algorithms. The regression test program is by far most important of these; if
by far the least interesting.
-
Directly supported compilers: Borland C++ Builder 4 5 and 6, gcc 2.95 (Cygwin,
Linux and BSD), and Microsoft Visual C++ 6 and 7. It is also reported to build
with Sun Workshop 6.1, Kai C++, SGI Irix C++, Compaq true64, and HP aCC. If you
have an older or less capable compiler for example Visual C++ 5 (service pack
3), Sunpro 5 or HP aCC then you may want to try version 2.25 of the library
which requires fewer advanced compiler features.
You can download the latest version (based on boost-1.30.0) from here, and also
view the documentation online.
Important note for upgraders:
this version of the library is part of the peer-reviewed
boost library, as a result the library interface has been somewhat
redesigned and modernised. Check out the
notes for upgraders for compatibility with version 2.x.
regexpp3.zip(1249K)
regexpp3.tar.gz(687K)
Note that the boost-dependencies have grown sufficiently complicated that these
files have grown in size quite dramatically, I'll try and do better in future
releases.
There is also a C++ Builder wrapper component that is based on this library -
from Tropic
Software East.
Version 2.25 of the library is still available for those that have less capable
compilers:
regexpp2.zip(241)
regexpp2.tar.gz(189)
Regular expression libraries use a variety of differing algorithms all of which
have their pro's and con's, which can make it hard to choose the best
implementation for your purpose. This library uses an NFA algorithm which is
dedicated to determining fast and accurate sub-expression matches, as well as
providing support for features like back-references, which are hard to support
using DFA algorithms.
People who should use this library:
-
Anyone who needs to use wide character strings.
-
Anyone who needs to search non-contiguous data.
-
Anyone who wants fast sub-expression matching.
-
Anyone who wants to customise the regular expression behaviour, or localise the
library to a non-English locale.
People who should look to an alternative DFA based library:
-
Anyone who doesn't care about sub-expression matching, and
-
Wants to search only ANSI-C strings.
Back to home page....
Copyright Dr J. Maddock.
Last Updated: 17th May 2002.
|