Regular expressions can become quite complex. The lack of white space makes them difficult to read. I can't step through a regular expression with a debugger. So how do experts debug complex regular expressions?

+1  A: 

Have a look at the (non-free) tools on regular-expressions.info. RegexBuddy in particular. Here is Jeff Atwood's post on the subject.

Skilldrick
+28  A: 

You buy RegexBuddy and use its built-in debug feature. If you work with regexes more than twice a year, the time saved will pay for it in no time. RegexBuddy will also help you to create simple and complex regular expressions, and even generate the code for you in a variety of languages.

Also, according to the developer, this tool runs nearly flawlessly on Linux when used with WINE.

Mick
That is very cool.
Rook
Too bad it's Windows-only and costs US$40.
KennyTM
It runs on Linux via WINE, according to the developer: http://www.regexbuddy.com/wine.html. And about the $40 cost...how much is your time worth?
Mick
@KennyTM It works under WINE, and I get paid to write code, so this saves money.
Rook
This is not free software.
codeholic
Who said it was or asked for it?
Tim Pietzcker
@Tim: I don't know about others, but it definitely prevents *me* from using it. One of my primary development platforms is a PowerPC Linux machine.
ephemient
Well, like Mick said, how much is your time worth? "The best tools money can buy" don't always have to cost money, but sometimes they do. Plus, JGSoft consistently develops *great* quality products with exceptional user service. I have even bought software from them I don't really need (like RegexMagic) because I'd like to support them and keep them in business. You don't know what you're missing. Seriously.
Tim Pietzcker
+6  A: 

I think they don't. If your regexp is so complicated and problematic that you need a debugger, you should write a dedicated parser or use another method. It will be much more readable and maintainable.

Valentin Rocher
Dude, you posted this after looking at the regexbuddy screen shot?
Rook
Everyone will disagree with this, but it's not a bad idea. Everyone assumes that the regex engine is most efficient with enormous regexes. This is not necessarily true, and they're definitely not easy to read. Break your regexes up.
Yar
@Michael Brooks: No, before, actually. Having seen the screenshot, I accept that you CAN debug a regexp. But I stand by my idea: when a regexp becomes too complicated, it's time to switch to another approach.
Valentin Rocher
+15  A: 

With Perl 5.10, use re 'debug';. (Or debugcolor, but I can't format the output properly on Stack Overflow.)

$ perl -Mre=debug -e'"foobar"=~/(.)\1/'
Compiling REx "(.)\1"
Final program:
   1: OPEN1 (3)
   3:   REG_ANY (4)
   4: CLOSE1 (6)
   6: REF1 (8)
   8: END (0)
minlen 1
Matching REx "(.)\1" against "foobar"
   0 <> <foobar>             |  1:OPEN1(3)
   0 <> <foobar>             |  3:REG_ANY(4)
   1 <f> <oobar>             |  4:CLOSE1(6)
   1 <f> <oobar>             |  6:REF1(8)
                                  failed...
   1 <f> <oobar>             |  1:OPEN1(3)
   1 <f> <oobar>             |  3:REG_ANY(4)
   2 <fo> <obar>             |  4:CLOSE1(6)
   2 <fo> <obar>             |  6:REF1(8)
   3 <foo> <bar>             |  8:END(0)
Match successful!
Freeing REx: "(.)\1"

Also, you can add whitespace and comments to regexes to make them more readable. In Perl, this is done with the /x modifier. With PCRE, there is the PCRE_EXTENDED flag.

"foobar" =~ /
    (.)  # any character, followed by a
    \1   # repeat of previously matched character
/x;

pcre *pat = pcre_compile("(.)  # any character, followed by a\n"
                         "\\1  # repeat of previously matched character\n",
                         PCRE_EXTENDED,
                         ...);
pcre_exec(pat, NULL, "foobar", ...);
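
The same idea carries over to other engines. As a minimal sketch (Python's re module is used here purely for illustration; it is not part of the answer above), the re.VERBOSE flag plays the role of /x. The name DOUBLED_CHAR is made up for this example.

```python
import re

# re.VERBOSE ignores unescaped whitespace in the pattern and allows
# "#" comments, much like Perl's /x modifier.
DOUBLED_CHAR = re.compile(
    r"""
    (.)   # any character, followed by a
    \1    # repeat of the previously matched character
    """,
    re.VERBOSE,
)

match = DOUBLED_CHAR.search("foobar")
print(match.group(0))  # prints "oo"
```

The commented pattern behaves exactly like the compact "(.)\1"; only the layout changes.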
ephemient
+1: Why would anyone prefer RegexBuddy to this?
Charles Stewart
@Charles, because it's nice, and cute, and you can use mouse, and in colors, and bells and whistles. And you have to pay for it, which makes you think you're using Serious Software!
Pavel Shved
I like this method, even though I think RegexBuddy is better.
Rook
+5  A: 

I use these online tools to debug my regex:

http://www.regextester.com/

http://www.solmetra.com/scripts/regex/

But yeah, none of those can beat RegexBuddy.

Wilhelm
A: 

You could try this one http://www.pagecolumn.com/tool/regtest.htm

Jenea
+3  A: 

I debug my regexes with my own eyes. That's why I use the /x modifier, write comments for them and split them into parts. Read Jeffrey Friedl's Mastering Regular Expressions to learn how to develop fast and readable regular expressions. Various regex debugging tools just provoke voodoo programming.
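
A rough sketch of what "split them into parts" can look like, written in Python rather than Perl so it can be run without extra modules. The "key = value" format and the names KEY, VALUE, SETTING and parse_setting are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical example: match "key = value" settings lines.
# Each piece is small enough to read and test on its own.
KEY = r"[A-Za-z_][A-Za-z0-9_]*"   # identifier-like name
VALUE = r"\S+"                    # a run of non-whitespace characters

SETTING = re.compile(
    rf"""
    ^\s*
    (?P<key>{KEY})       # the setting name
    \s*=\s*
    (?P<value>{VALUE})   # the setting value
    \s*$
    """,
    re.VERBOSE,
)

def parse_setting(line):
    """Return (key, value) for a settings line, or None if it does not match."""
    m = SETTING.match(line)
    return (m.group("key"), m.group("value")) if m else None

print(parse_setting("  max_size = 42 "))
```

When the whole pattern misbehaves, each part can be checked in isolation, for example with re.fullmatch(KEY, "max_size"), which narrows down where the mistake is without any special tool.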

codeholic
+3  A: 

There is an excellent free tool, the Regex Coach. The latest version is only available for Windows; its author Dr. Edmund Weitz stopped maintaining the Linux version because too few people downloaded it, but there is an older version for Linux on the download page.

APC
A: 

Writing regexes using a notation like PCREs is like writing assembler: it's fine if you can just see the corresponding finite-state automaton in your head, but it can get difficult to maintain very quickly.

The reasons for not using a debugger are much the same as for not using a debugger with a programming language: it can help you fix local mistakes, but it won't help you solve the design problems that led you to make the local mistakes in the first place.

The more reflective way is to use data representations to generate regexps in your programming language, and have appropriate abstractions to build them. Olin Shivers's introduction to his Scheme regexp notation gives an excellent overview of the issues faced in designing these data representations.
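
To make that concrete, here is a minimal sketch in Python of building a regexp from small composable functions. It is not Shivers's notation; the helpers lit, seq, alt, many1 and group are hypothetical names invented for this example, and each one simply returns a pattern fragment as a string.

```python
import re

# Hypothetical helpers: each returns a regex fragment as a string.
def lit(text):
    """Match text literally, escaping any metacharacters."""
    return re.escape(text)

def seq(*parts):
    """Match the parts one after another."""
    return "".join(parts)

def alt(*parts):
    """Match any one of the parts."""
    return "(?:" + "|".join(parts) + ")"

def many1(part):
    """Match one or more repetitions of part."""
    return "(?:" + part + ")+"

def group(name, part):
    """Capture part under the given group name."""
    return "(?P<" + name + ">" + part + ")"

DIGIT = "[0-9]"

# A version tag such as "v3.12" or "V3.12".
VERSION = re.compile(seq(
    alt(lit("v"), lit("V")),
    group("major", many1(DIGIT)),
    lit("."),
    group("minor", many1(DIGIT)),
))

m = VERSION.fullmatch("v3.12")
print(m.group("major"), m.group("minor"))  # prints "3 12"
```

Because escaping and grouping live inside the helpers, a forgotten backslash before the dot or a missing pair of parentheses cannot creep in at the call site, which is the kind of local mistake a debugger would otherwise be used to find.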

Charles Stewart
Parser combinators are indeed an awesome way to go: Parsec and PArrows in Haskell, rsec in Ruby, Boost Spirit in C++, PyParsing in Python, Perl6::Rules in Perl, etc.
ephemient
+6  A: 

http://www.regexpal.com

I use it all the time. It even has a nice reference.

hal10001
+1 cool webapp.
Rook
+1  A: 

I use strfriend.

Ken
+2  A: 

I use:

http://regexlib.com/RETester.aspx

You can also try Regex Hero (uses Silverlight):

http://regexhero.net/tester/

Leniel Macaferi
+7  A: 

I use Kodos - The Python Regular Expression Debugger:

Kodos is a Python GUI utility for creating, testing and debugging regular expressions for the Python programming language. Kodos should aid any developer to efficiently and effortlessly develop regular expressions in Python. Since Python's implementation of regular expressions is based on the PCRE standard, Kodos should benefit developers in other programming languages that also adhere to the PCRE standard (Perl, PHP, etc...).

(...)

Runs on Linux, Unix, Windows, Mac.

Pascal Thivent
+4  A: 

When I get stuck on a regex I usually turn to this: http://gskinner.com/RegExr/

It's perfect for quickly testing where something is going wrong.

thetaiko
+2  A: 

If I'm feeling stuck, I like to go backward and generate the regex directly from a sample text using txt2re (although I usually end up tweaking the resulting regex by hand).

eggsyntax
A: 

I often use pcretest - hardly a "debugger" but it works over a text-only SSH connection and parses exactly the regex dialect I need: my (C++) code links to libpcre, so there's no difficulty with subtle differences in what's magic and what isn't, etc.

In general I agree with the guy above who considers needing a regex debugger a code smell. For me the hardest part of using regexes is usually not the regex itself, but the multiple layers of quoting needed to make them work.
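
The quoting problem is easy to show in a few lines. The sketch below uses Python instead of C++ with libpcre, purely as an assumption for illustration; the same two layers (string-literal escaping, then regex escaping) exist in C and C++ string literals.

```python
import re

# Goal: match one literal backslash followed by the letter n.
# The regex engine must receive the pattern  \\n  (an escaped backslash, then n).

plain = "\\\\n"   # normal string literal: every \\ collapses to a single \
raw = r"\\n"      # raw string literal: the engine sees exactly what is typed

assert plain == raw  # both spell the same three-character pattern

text = "a\\nb"    # four characters: a, backslash, n, b
print(re.search(raw, text).group(0))  # prints a backslash followed by n

# re.escape adds the regex-level quoting for arbitrary literal text.
literal = "1+1=2?"
print(re.fullmatch(re.escape(literal), literal) is not None)  # prints True
```

Raw strings remove one layer and re.escape automates the other, which leaves only the regex logic itself to get wrong.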

Bernd Jendrissek
+1  A: 

I often use Rubular, a Ruby-based regexp tester,

and in Emacs I use M-x re-builder.

Firefox also has a useful extension

slomojo