Regular expressions can become quite complex. The lack of white space makes them difficult to read. I can't step though a regular expression with a debugger. So how do experts debug complex regular expressions?
Have a look at the (non-free) tools on regular-expressions.info. RegexBuddy in particular. Here is Jeff Atwood's post on the subject.
You buy RegexBuddy and use its built in debug feature. If you work with regexes more than twice a year, you will make this money back in time saved in no time. RegexBuddy will also help you to create simple and complex regular expressions, and even generate the code for you in a variety of languages.
Also, according to the developer, this tool runs nearly flawlessly on Linux when used with WINE.
I think they don't. If your regexp is too complicated, and problematic to the point you need a debugger, you should create a specific parser, or use another method.It will be much more readable and maintenable.
With Perl 5.10, use re 'debug';
. (Or debugcolor
, but I can't format the output properly on Stack Overflow.)
$ perl -Mre=debug -e'"foobar"=~/(.)\1/' Compiling REx "(.)\1" Final program: 1: OPEN1 (3) 3: REG_ANY (4) 4: CLOSE1 (6) 6: REF1 (8) 8: END (0) minlen 1 Matching REx "(.)\1" against "foobar" 0 <> <foobar> | 1:OPEN1(3) 0 <> <foobar> | 3:REG_ANY(4) 1 <f> <oobar> | 4:CLOSE1(6) 1 <f> <oobar> | 6:REF1(8) failed... 1 <f> <oobar> | 1:OPEN1(3) 1 <f> <oobar> | 3:REG_ANY(4) 2 <fo> <obar> | 4:CLOSE1(6) 2 <fo> <obar> | 6:REF1(8) 3 <foo> <bar> | 8:END(0) Match successful! Freeing REx: "(.)\1"
Also, you can add whitespace and comments to regexes to make them more readable. In Perl, this is done with the /x
modifier. With pcre
, there is the PCRE_EXTENDED
flag.
"foobar" =~ /
(.) # any character, followed by a
\1 # repeat of previously matched character
/x;
pcre *pat = pcre_compile("(.) # any character, followed by a\n"
"\\1 # repeat of previously matched character\n",
PCRE_EXTENDED,
...);
pcre_exec(pat, NULL, "foobar", ...);
I use these online tools to debug my regex:
http://www.solmetra.com/scripts/regex/
But yeah, none of those can beat RegexBuddy.
I debug my regexes with my own eyes. That's why I use /x
modifier, write comments for them and split them in parts. Read Jeffrey Friedl's Mastering Regular Expressions to learn how to develop fast and readable regular expressions. Various regex debugging tools just provoke voodoo programming.
There is an excellent free tool, the Regex Coach. The latest version is only available for Windows; its author Dr. Edmund Weitz stopped maintaining the Linux version because too few people downloaded it, but there is an older version for Linux on the download page.
Writing reg exes using a notation like PCREs is like writing assembler: it's fine if you can just see the corresponding finite state automata in your head, but it can get difficult to maintain very quickly.
The reasons for not using a debugger are much the same as for not using a debugger with a programming language: you can fix local mistakes, but they won't help you solve the design problems that led you to make the local mistakes in the first place.
The more reflective way is to use data representations to generate regexps in your programming language, and have appropriate abstractions to build them. Olin Shiver's introduction to his scheme regexp notation gives an excellent overview of the issues faced in designing these data representations.
I use Kodos - The Python Regular Expression Debugger:
Kodos is a Python GUI utility for creating, testing and debugging regular expressions for the Python programming language. Kodos should aid any developer to efficiently and effortlessly develop regular expressions in Python. Since Python's implementation of regular expressions is based on the PCRE standard, Kodos should benefit developers in other programming languages that also adhere to the PCRE standard (Perl, PHP, etc...).
(...)
Runs on Linux, Unix, Windows, Mac.
When I get stuck on a regex I usually turn to this: http://gskinner.com/RegExr/
Its perfect for quickly testing where something is going wrong.
I often use pcretest - hardly a "debugger" but it works over a text-only SSH connection and parses exactly the regex dialect I need: my (C++) code links to libpcre, so there's no difficulty with subtle differences in what's magic and what isn't, etc.
In general I agree with the guy above to whom needing a regex debugger is a code smell. For me the hardest about using regexes is usually not the regex itself, but the multiple layers of quoting needed to make them work.
I often use Ruby based regexp tester Rubular
and also in Emacs use M-x re-builder
Firefox also has a useful extension