I'm new to regular expressions. I've been able to write a few through trial and error, so I tried a few programs to help me write them, but the programs were harder to understand than the regular expressions themselves. Any recommended programs? I do most of my programming under Linux.

+2  A: 

A great program for helping you write regular expressions would be Perl; you can try out a regex to see if it matches very easily:

perl -e 'print "yes!\n" if "string" =~ /regex to test/'

See this SO question on unit testing regexes for more information on testing regular expressions in general.

Robert P
Perl's `print()` won't append a newline - you should change your shell double-quotes to single-quotes so that you can double-quote your Perl strings, and say `print "Yes!\n"`
Chris Lutz
Hmm...worked for me; swapped.
Robert P
Random stab here at why: If you're using ActivePerl at the command prompt on Windows, I believe the command prompt will add a newline to the end. This is a good idea, but doesn't happen on *nix unfortunately.
Chris Lutz
Strawberry Perl, but yeah.
Robert P
+2  A: 

Unfortunately, if you're running Linux, you won't have access to one of the best ones out there: RegexBuddy.

RegexBuddy is your perfect companion for working with regular expressions. Easily create regular expressions that match exactly what you want. Clearly understand complex regexes written by others. Quickly test any regex on sample strings and files, preventing mistakes on actual data. Debug without guesswork by stepping through the actual matching process. Use the regex with source code snippets automatically adjusted to the particulars of your programming language. Collect and document libraries of regular expressions for future reuse. GREP (search-and-replace) through files and folders. Integrate RegexBuddy with your favorite searching and editing tools for instant access. (from their website)

Robert P
RegexBuddy is, hands-down, the best regex debugger I've ever used. I can't even estimate how much time it's saved me over the years.
Ben Blank
I hate the regular expressions that RegexBuddy creates with a burning passion, but I have no experience with its debugger. Any regex that needs a debugger needs to be rewritten to be either shorter or more than one regex.
Chris Lutz
Wow, RegexBuddy looks awesome, and they provide some tips on getting it running under Wine :)
Roberto Rosario
+1  A: 

You could try using websites that give you hints and instant gratification like this one. Putting together a simple Perl script that you can easily modify is also a great testing ground. Something like the following:

#!/usr/bin/perl
use strict;
use warnings;

my $mystring = "My cat likes to eat tomatoes.";
$mystring =~ s/cat/dog/g;    # replace every "cat" with "dog"
print "$mystring\n";
akf
A: 

If you're up for buying a tool, Komodo by ActiveState is a great editor for scripting languages and comes with a mighty fine regex helper. It's cross-platform, but not free. It has helped me out of a few tight situations when I didn't quite understand why things weren't parsing, and it supports several regex flavors.

Robert P
+4  A: 

RegexPal is a great, free JavaScript regex tester. Because it uses the JavaScript regex engine, it doesn't have some of the more advanced regex features, but it works pretty well for a lot of regular expressions. The feature I miss most is lookbehind assertions.

Shawn
Seconded. While RegexBuddy is probably the best Windows regex debugger, RegexPal works on whatever platform you're working on, it's convenient, and it's free!
Bill Casarin
A: 

Also check out the re pragma, which will show how regexes are compiled as well as how they execute:

$ perl -Mre=debugcolor -e '"huzza" =~ /^(hu)?z{1,2}za$/'

Output is:

    Compiling REx "^(hu)?z{1,2}za$"
    Final program:
       1: BOL (2)
       2: CURLYM[1] {0,1} (12)
       6:   EXACT <hu> (10)
      10:   SUCCEED (0)
      11: NOTHING (12)
      12: CURLY {1,2} (16)
      14:   EXACT <z> (0)
      16: EXACT <za> (18)
      18: EOL (19)
      19: END (0)
    floating "zza"$ at 0..3 (checking floating) anchored(BOL) minlen 3 
    Guessing start of match in sv for REx "^(hu)?z{1,2}za$" against "huzza"
    Found floating substr "zza"$ at offset 2...
    Guessed: match at offset 0
    Matching REx "^(hu)?z{1,2}za$" against "huzza"
       0           |  1:BOL(2)
       0           |  2:CURLYM[1] {0,1}(12)
       0           |  6:  EXACT <hu>(10)
       2           | 10:  SUCCEED(0)
                                        subpattern success...
                                      CURLYM now matched 1 times, len=2...
                                      CURLYM trying tail with matches=1...
       2           | 12:  CURLY {1,2}(16)
                                        EXACT <z> can match 2 times out of 2...
       3           | 16:    EXACT <za>(18)
       5           | 18:    EOL(19)
       5           | 19:    END(0)
    Match successful!
    Freeing REx: "^(hu)?z{1,2}za$"
Inshallah
+6  A: 

Try YAPE::Regex::Explain for Perl:

#!/usr/bin/perl

use strict;
use warnings;

use YAPE::Regex::Explain;

print YAPE::Regex::Explain->new(
    qr/^\A\w{2,5}0{2}\S \n?\z/i
)->explain;

Output:

The regular expression:

(?i-msx:^\A\w{2,5}0{2}\S \n?\z)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?i-msx:                 group, but do not capture (case-insensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  \A                       the beginning of the string
----------------------------------------------------------------------
  \w{2,5}                  word characters (a-z, A-Z, 0-9, _)
                           (between 2 and 5 times (matching the most
                           amount possible))
----------------------------------------------------------------------
  0{2}                     '0' (2 times)
----------------------------------------------------------------------
  \S                       non-whitespace (all but \n, \r, \t, \f,
                           and " ")
----------------------------------------------------------------------
                           ' '
----------------------------------------------------------------------
  \n?                      '\n' (newline) (optional (matching the
                           most amount possible))
----------------------------------------------------------------------
  \z                       the end of the string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
Sinan Ünür
That, sir, is amazing.
Chris Lutz
+2  A: 

Most regex bugs fall into three categories:

  • Subtle Omissions - leaving out '^' at the start or '$' at the end, using '*' where you should have used '+' - these are just beginner mistakes, but it's common for the buggy regex to still pass all of the automated tests.

  • Accidental success - where part of the regex is just completely wrong and is destined to fail in 99% of real world use, but by sheer dumb luck it manages to pass the half-dozen automated tests you wrote.

  • Too much success - where one part of the regex matches a whole lot more than you thought. For example, the token [^., ]* will also match \r and \n, meaning that your regex can now match multiple lines of text even though you wrapped it in ^ and $.
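A minimal sketch of the first and third categories, written in Python purely for illustration. The patterns and sample strings here are made up for the demonstration and are not taken from any real code base:

```python
import re

# Subtle omission: without anchors, search() finds the digits anywhere.
unanchored = re.compile(r"\d+")
anchored = re.compile(r"^\d+$")

print(bool(unanchored.search("abc123def")))  # True
print(bool(anchored.search("abc123def")))    # False

# '*' versus '+': '*' also accepts the empty string.
star = re.compile(r"\d*")
plus = re.compile(r"\d+")

print(bool(star.fullmatch("")))  # True
print(bool(plus.fullmatch("")))  # False

# Too much success: [^., ]* excludes only '.', ',' and ' ', so it also
# consumes "\n" and the "anchored" pattern spans two lines.
too_wide = re.compile(r"^[^., ]*$")
print(bool(too_wide.search("line1\nline2")))  # True

# Excluding line breaks explicitly restores the single-line intent.
single_line = re.compile(r"^[^., \r\n]*$")
print(bool(single_line.search("line1\nline2")))  # False
```

Other engines differ in detail, but a negated character class matching line breaks is common behaviour, so it is worth checking in whichever engine you use.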

There really is no substitute for properly learning regex. Read the reference manual for your regex engine, and use a tool like RegexBuddy to experiment and familiarize yourself with all of the features, taking special note of any unusual behaviours they can exhibit. If you learn regex properly, you will avoid most of the bugs mentioned above, and you will know how to write just a small number of automated tests that cover all of the edge cases without over-testing obvious things (does [A-Z] really match every letter between A and Z? I'd better write 26 variations of the unit test to make sure!).

If you don't learn regex completely, you will need to write a ridiculous amount of automated tests to prove that your magical regex is correct.
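As a rough illustration of what a small, boundary-focused test set might look like (written in Python; the pattern, the cases, and the `check` helper are all hypothetical), a handful of inputs chosen at the edges tells you more than 26 near-identical ones:

```python
import re

# Hypothetical pattern under test: exactly one uppercase ASCII letter.
UPPER = re.compile(r"[A-Z]")

# Each case probes a boundary rather than repeating the obvious middle.
CASES = [
    ("A", True),     # lower bound of the range
    ("Z", True),     # upper bound of the range
    ("@", False),    # character just before 'A' in ASCII
    ("[", False),    # character just after 'Z' in ASCII
    ("a", False),    # wrong case
    ("", False),     # empty input
    ("AB", False),   # too long
    ("A\n", False),  # trailing newline must not sneak through
]

def check(pattern, cases):
    """Return the (text, expected) pairs that the pattern gets wrong."""
    return [(t, e) for t, e in cases if bool(pattern.fullmatch(t)) != e]

print(check(UPPER, CASES))  # an empty list means every case passed
```

Using `fullmatch` instead of wrapping the pattern in `^` and `$` is a choice made for this sketch: in Python, `$` also matches just before a trailing newline, which is exactly the kind of engine-specific behaviour the reference manual will tell you about.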

too much php
A: 

Kudos is a great free cross-platform regular expression debugger.

Wogan
A: 

http://regex-test.com is a really good/professional website which allows you to test many different types of regular expressions.

MrThys