I'm new to regular expressions. I've been able to write a few through trial and error, so I tried a few programs to help me write them, but the programs were harder to understand than the regular expressions themselves. Any recommended programs? I do most of my programming under Linux.
A great program for helping you write regular expressions would be Perl; you can try out a regex to see if it matches very easily:
perl -e 'print "yes!\n" if "string" =~ /regex to test/'
See this SO question on unit testing regexes for more information on testing regular expressions in general.
Unfortunately, if you're running Linux, you won't have access to one of the best ones out there: RegexBuddy.
RegexBuddy is your perfect companion for working with regular expressions. Easily create regular expressions that match exactly what you want. Clearly understand complex regexes written by others. Quickly test any regex on sample strings and files, preventing mistakes on actual data. Debug without guesswork by stepping through the actual matching process. Use the regex with source code snippets automatically adjusted to the particulars of your programming language. Collect and document libraries of regular expressions for future reuse. GREP (search-and-replace) through files and folders. Integrate RegexBuddy with your favorite searching and editing tools for instant access. (from their website)
You could try using websites that give you hints and instant gratification like this one. Putting together a simple Perl script that you can easily modify is also a great testing ground. Something like the following:
#!/usr/bin/perl
use strict;
use warnings;
my $mystring = "My cat likes to eat tomatoes.";
$mystring =~ s/cat/dog/g;    # replace every "cat" with "dog"
print "$mystring\n";
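The same idea extends to trying one pattern against several sample strings at once. This is only a sketch: the `check` helper, the phone-number-like pattern, and the sample strings are made up for illustration, so swap in your own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return 'yes!' or 'no' for one pattern/string pair.
sub check {
    my ($regex, $string) = @_;
    return $string =~ $regex ? 'yes!' : 'no';
}

# Illustrative pattern and samples; edit these and rerun.
my $regex   = qr/^\d{3}-\d{4}$/;
my @samples = ('555-1234', '5555-123', 'call 555-1234');

# Print each sample next to the verdict for it.
printf "%-15s %s\n", $_, check($regex, $_) for @samples;
```

Only the first sample should print `yes!`; the other two fail because of the anchors.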
If you're up for buying a tool, Komodo, by ActiveState, is a great editor for scripting languages and comes with a mighty fine regex helper. It's cross-platform, but not free. It's helped me out of a few tight situations when I didn't quite understand why things weren't parsing, and it supports several regex flavors.
RegexPal is a great, free JavaScript regex tester. Because it uses the JavaScript regex engine, it doesn't have some of the more advanced regex features, but it works pretty well for a lot of regular expressions. The feature I miss most is lookbehind assertions, which older JavaScript engines do not support.
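For anyone who hasn't met lookbehind yet, here is a rough Perl illustration of what it does. The helper name `amount_after_dollar` and the sample text are assumptions made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the digits that directly follow a dollar sign,
# or undef if there are none.
sub amount_after_dollar {
    my ($text) = @_;
    # (?<=\$) is a lookbehind: it requires a literal "$" immediately before
    # the digits, but the "$" itself is not part of what gets matched.
    return $text =~ /(?<=\$)(\d+)/ ? $1 : undef;
}

print amount_after_dollar('total: $42'), "\n";   # prints 42
```

Without the lookbehind you would have to match the dollar sign as well and then strip it off afterwards.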
Also check out the re pragma, which will show how regexes are compiled as well as how they execute:
$ perl -Mre=debugcolor -e '"huzza" =~ /^(hu)?z{1,2}za$/'
Output is:
Compiling REx "^(hu)?z{1,2}za$"
Final program:
   1: BOL (2)
   2: CURLYM[1] {0,1} (12)
   6:   EXACT (10)
  10:   SUCCEED (0)
  11:   NOTHING (12)
  12: CURLY {1,2} (16)
  14:   EXACT (0)
  16: EXACT (18)
  18: EOL (19)
  19: END (0)
floating "zza"$ at 0..3 (checking floating) anchored(BOL) minlen 3
Guessing start of match in sv for REx "^(hu)?z{1,2}za$" against "huzza"
Found floating substr "zza"$ at offset 2...
Guessed: match at offset 0
Matching REx "^(hu)?z{1,2}za$" against "huzza"
   0 |  1:BOL(2)
   0 |  2:CURLYM[1] {0,1}(12)
   0 |  6:  EXACT (10)
   2 | 10:  SUCCEED(0)
          subpattern success...
          CURLYM now matched 1 times, len=2...
          CURLYM trying tail with matches=1...
   2 | 12:  CURLY {1,2}(16)
          EXACT can match 2 times out of 2...
   3 | 16:  EXACT (18)
   5 | 18:  EOL(19)
   5 | 19:  END(0)
Match successful!
Freeing REx: "^(hu)?z{1,2}za$"
Try YAPE::Regex::Explain for Perl:
#!/usr/bin/perl
use strict;
use warnings;
use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new(
qr/^\A\w{2,5}0{2}\S \n?\z/i
)->explain;
Output:
The regular expression:

(?i-msx:^\A\w{2,5}0{2}\S \n?\z)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?i-msx:                 group, but do not capture (case-insensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  \A                       the beginning of the string
----------------------------------------------------------------------
  \w{2,5}                  word characters (a-z, A-Z, 0-9, _)
                           (between 2 and 5 times (matching the most
                           amount possible))
----------------------------------------------------------------------
  0{2}                     '0' (2 times)
----------------------------------------------------------------------
  \S                       non-whitespace (all but \n, \r, \t, \f,
                           and " ")
----------------------------------------------------------------------
                           ' '
----------------------------------------------------------------------
  \n?                      '\n' (newline) (optional (matching the
                           most amount possible))
----------------------------------------------------------------------
  \z                       the end of the string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
Most regex bugs fall into three categories:
Subtle omissions - leaving out '^' at the start or '$' at the end, or using '*' where you should have used '+'. These are just beginner mistakes, but it's common for the buggy regex to still pass all of the automated tests.
Accidental success - where part of the regex is just completely wrong and is destined to fail in 99% of real-world use, but by sheer dumb luck it manages to pass the half-dozen automated tests you wrote.
Too much success - where one part of the regex matches a whole lot more than you thought. For example, the token [^., ]* will also match \r and \n, meaning that your regex can now match multiple lines of text even though you wrapped it in ^ and $.
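A few of these pitfalls are easy to reproduce in Perl. The snippet below is only a sketch: the helper names and sample strings are invented for illustration, and the "strict" variant is just one possible fix (excluding \r and \n explicitly).

```perl
use strict;
use warnings;

# Subtle omission: without ^ and $ the digits may appear anywhere.
sub has_digits    { return $_[0] =~ /\d+/   ? 1 : 0 }
sub is_all_digits { return $_[0] =~ /^\d+$/ ? 1 : 0 }

# Subtle omission: * also accepts zero digits, so the empty string passes.
sub is_digits_star { return $_[0] =~ /^\d*$/ ? 1 : 0 }

# Too much success: the negated class also matches \r and \n.
sub is_token_loose  { return $_[0] =~ /^[^., ]*$/     ? 1 : 0 }
sub is_token_strict { return $_[0] =~ /^[^., \r\n]*$/ ? 1 : 0 }

my $two_lines = "first\nsecond";

print "has_digits('abc123def')     = ", has_digits('abc123def'),      "\n"; # 1
print "is_all_digits('abc123def')  = ", is_all_digits('abc123def'),   "\n"; # 0
print "is_digits_star('')          = ", is_digits_star(''),           "\n"; # 1
print "is_token_loose(two lines)   = ", is_token_loose($two_lines),   "\n"; # 1
print "is_token_strict(two lines)  = ", is_token_strict($two_lines),  "\n"; # 0
```

The loose token matches across the line break even though it is wrapped in ^ and $, because nothing in the character class rules out the newline.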
There really is no substitute for properly learning regex. Read the reference manual for your regex engine, and use a tool like RegexBuddy to experiment and familiarize yourself with all of the features, taking special note of any unusual behaviours they can exhibit. If you learn regex properly, you will avoid most of the bugs mentioned above, and you will know how to write just a small number of automated tests that cover all of the edge cases without over-testing obvious things (does [A-Z] really match every letter between A and Z? I'd better write 26 variations of the unit test to make sure!).
If you don't learn regex completely, you will need to write a ridiculous number of automated tests to prove that your magical regex is correct.
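For what it's worth, a short table of edge cases can look something like this with the core Test::More module. The pattern and the cases are assumptions chosen for illustration, not a complete test plan.

```perl
use strict;
use warnings;
use Test::More;

# Illustrative pattern under test: one or more digits and nothing else.
# \z is used instead of $ so that a trailing newline is rejected.
my $regex = qr/^\d+\z/;

# Each case: input string, whether it should match, and a test name.
my @cases = (
    [ '42',   1, 'plain digits' ],
    [ '0007', 1, 'leading zeros' ],
    [ '',     0, 'empty string' ],
    [ '4 2',  0, 'embedded space' ],
    [ 'a42',  0, 'leading letter' ],
    [ "42\n", 0, 'trailing newline' ],
);

for my $case (@cases) {
    my ($input, $should_match, $name) = @$case;
    if ($should_match) {
        like($input, $regex, $name);
    }
    else {
        unlike($input, $regex, $name);
    }
}

done_testing();
```

Each row targets one boundary (empty input, anchors, trailing newline) rather than enumerating every digit.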
http://regex-test.com is a really good/professional website which allows you to test many different types of regular expressions.