I'm constantly amazed by the power of the regex. What I'm looking for here is:

  • Regexes that are more cleverly badass than ridiculously badass
  • Regex replacements are acceptable as well if you've had some cool usage of them
  • Code refactored to use a regex, making it more efficient
  • A large regex refactored into a smaller one
  • Humorous regexes, especially if they have been used in production

I think that the most badass regex that I've ever used was that absoludicrous RFC822 email validation regex that I converted to C# and compiled for some form validation (it worked beautifully). It was an example of ridiculousness more than cleverness though.

(since this question is very subjective, after a week, I'll mark the highest rated answer as accepted, is that fair?)

+3  A: 

No such thing. "Brilliant Regex" is an oxymoron

Edit: I'm 94.3% joking on that :)

I have an app I'm working on that translates from C# to D. It consists of several dozen RegExp rules. It's darn near impossible to figure out what's going on without good tools:

\s*(((abstract)|(static))\s+)?(((public)|(private))\s+)?((class)|(interface)) ([a-zA-Z_][a-zA-Z0-9_]*)[^{]*({.*})?.*
^\s*((public)|(private))?(\s+((static)|(virtual)))?\s+([a-zA-Z_][a-zA-Z0-9_]*(!\(.*\))?)\s+([a-zA-Z_][a-zA-Z0-9_]*)\s*(~|{)

but it would be even worse without RegEx

BCS
You *ARE* just using regexes for lexing and not parsing, right?...RIGHT?
Dylan Lacey
+3  A: 

@BCS

While a regex can be hard to understand at first glance, it's often better than the alternative of 20 substring/indexof function calls. When somebody tries to do something without a regex, the result is often quite rigid and misses a lot of edge cases, because handling them would have made the code too difficult to write. Fixing a bug, or adding an extra case that also matches the pattern, can be very difficult when a regex is not being used.

Kibbee
-1 Not helpful, doesn't answer the question.
Adriano Varoli Piazza
+1  A: 

I've used the grouping feature of regex to map out "columns" in a log file that doesn't have a space- or comma-delimited format. I then use the grouping results to suck the data into a DataTable that is bound to a grid for log file viewing/searching/sorting.

Does that count?

Dillie-O
A: 

@Dillie-O It'll count when you paste in the regex you used. :-)

travis
+61  A: 

This is not something useful, but it is surely the most interesting regexp I have come across: a regexp that tests for prime numbers.

/^1?$|^(11+?)\1+$/

You can test it in Perl by doing

 $ perl -e 'print "Prime\n" if (1 x shift) !~ /^1?$|^(11+?)\1+$/' 19
 Prime
 $ perl -e 'print "Prime\n" if (1 x shift) !~ /^1?$|^(11+?)\1+$/' 20
 $

I think it is by Abigail from the Perl community.

Edit: For the people having hard time believing it works, here is an explanation:

In the test example, (1 x shift) creates a string of ones whose length is the number we are testing for primeness

The regexp actually finds if the number has any divisors other than one or itself => not prime.

^1?$ matches the empty string and a single 1, i.e. the numbers 0 and 1, neither of which is prime.

^(11+?)\1+$ which is more complicated does the following.

^(11+?) starts by trying to match 11 at the beginning of the string. \1+ means that the captured group is repeated at least once more, and the ^ and $ anchors force the match to cover the string from start to end. So the first attempt checks whether the length is divisible by two.

ex: 8 (11111111) would match 11 repeated 4 times

9 (111111111) would yield 11 repeated 4 times but with a final 1 left at the end, so the string is not matched from beginning to end.

The neat part is that when it cannot match, it backtracks and tries to match 111 repeated in the same way => check if divisible by three ...

9 (111111111) will be matched with 111 repeated 3 times exactly from start to end.

And so forth. If any match is found, the number has a divisor and is not prime; otherwise the number has no divisors and is prime.

A number like 7 would never match. 11,111,1111,11111,111111 leave some unmatched ones at the end and 1111111 is not "repeated at least once", so the regexp never matches -> no divisors -> number is prime
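For anyone who would rather try this outside Perl, here is a minimal sketch in Python. The pattern is the one from the answer; the function name is_prime and the list primes_below_30 are just names chosen for this illustration. Like the Perl one-liner, it builds a string of n ones and asks whether the "not prime" pattern fails to match.

```python
import re

# Same pattern as in the answer; Python's re module supports the \1 backreference.
NOT_PRIME = re.compile(r"^1?$|^(11+?)\1+$")

def is_prime(n: int) -> bool:
    """Return True if n is prime, using the unary-string regex trick."""
    unary = "1" * n  # a string of n ones, like Perl's (1 x shift)
    return NOT_PRIME.match(unary) is None

primes_below_30 = [n for n in range(30) if is_prime(n)]
print(primes_below_30)  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The backtracking makes this far slower than ordinary trial division, so it is a curiosity rather than something to use on large numbers.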

Pat
Ummm, I'm with OysterD. Congratulations on a successful hoax.
harpo
If you don't believe it works, why not just try it?
Pat
I stand corrected. I see the *length* of the string is tested for primeness. for (var n = 0; n < 50; n++){ var s = ""; for (var i = 0; i < n; i++) s+="1"; console.log( n + ": " + ( /^1?$|^(11+?)\1+$/.test(s) ? "composite" : "
harpo
That's right, the length of the string is the number being tested for primeness by the regexp. I will edit the post to reflect that.
Pat
Wow, that is great. Thanks for posting this :)
Kiv
+5  A: 

/(bb|[^b]{2})/

Edit: It's Shakespeare ;p

Akira
I think our friend Akira here stole this idea from thinkgeek.com -- search for their regex Shakespeare T-shirt.
Justin Standard
A: 

@Akira isn't that the same as /(..)/ ? How was it used?

Edit: LOL, that's f'n great!

travis
+1  A: 

Well, for an old WebCT (now Blackboard) log file, I used:

^\[(.*)\]\t\[(.*)\]\t\[(.*)\]\t\[(.*)\]\t(.*)$

nothing too fancy, since they used the [] on occasion
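To show how the groups map onto "columns", here is a rough sketch in Python using the WebCT pattern above. The column names and the sample line are invented for illustration (I'm assuming four bracketed fields followed by free text, which is all the pattern itself tells us), and parse_line is a hypothetical helper.

```python
import re

# The WebCT pattern from the answer: four bracketed, tab-separated fields, then free text.
LOG_LINE = re.compile(r"^\[(.*)\]\t\[(.*)\]\t\[(.*)\]\t\[(.*)\]\t(.*)$")

# Hypothetical column names and sample line, invented for illustration only.
COLUMNS = ("timestamp", "level", "user", "module", "message")
sample = "[2008-08-20 10:15:00]\t[INFO]\t[jdoe]\t[quiz]\tSubmitted attempt [2]"

def parse_line(line: str):
    """Return a dict mapping column name to captured group, or None if no match."""
    m = LOG_LINE.match(line)
    return dict(zip(COLUMNS, m.groups())) if m else None

row = parse_line(sample)
print(row["user"], "|", row["message"])  # jdoe | Submitted attempt [2]
```

Each dict can then be fed to whatever table or grid structure you use for viewing and sorting.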

I still need to refine it a smidge more, but this one is allowing me to suck data out of my IIS log files:

(\S+)\:(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})\s(\S+)\s(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s(GET|POST)\s(.*)--\s(\d+)\s-\s(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s(\S+)\s(\d+)\s(\d+)\s(\d+)

If you want to mess with it yourself, check out a little library I created called fotelo.

Dillie-O
A: 

@Pat:

There's no such thing as a regexp testing for primality. Note that it has (quite) recently been proven that PRIMES is in the P class (thanks to the AKS algorithm). It is not, however, known to be regular (yet!).

Was it a joke?

OysterD
Edited the post to explain the regexp for the unbelievers :-)
Pat
Regular expressions as in Perl are more powerful than regular expressions as in CS theory, due to backreferences in Perl. Also, note that PRIMES is not even context-free, let alone regular (even in a unary representation as in @Pat's post).
A. Rex
CS also supports backreferences.
Sylverdrag
+2  A: 

Years ago I wrote a mailing address parser using regular expressions. This was for passing data from a web front end that had the typical "address line 1" and "address line 2" form fields and it had to be converted into "street" "number" "po box" ... fields.

It had a list of possible expressions and for each one what the groups were associated with. It went through the list looking for one that matched without missing any of the address.

It was strangely fun coming up with all the options and test cases to try to break it. In the end I learned way more about addresses than I ever wanted to know. The USPS has a good site about address formats. Before this I had never even heard of a "star route" address.

John Meagher
+2  A: 

Why are we bragging about doing things with regular expressions that are difficult to understand? Sometimes regular expressions are the right tool, but many of these posts are using them as a clever novelty instead of the best and most obvious way to solve a problem. This website is supposed to encourage good, readable coding practices.

Christian Oudard
+3  A: 

I once copied the BNF syntax off an RFC and made a set of regexes that "matched" the protocol. (?:Expression) for a non-capturing group did wonders. The regex did all the parsing for me.

The exact code is on another machine, but it was something like:

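$DIGIT = "[0-9]";

## each piece is defined before the expression that interpolates it
$httpversion = "HTTP/$DIGIT\\.$DIGIT";   ## \\. so the regex sees a literal dot
$operand = "(GET|POST)";
$URI = "($protocol://$optionalcreds$hostnameip$optionalport(/$absolutepath))";
## etc
$request = "(?:$operand $URI $httpversion $requestheader)";   ## (?:...) groups without capturing
$HTTP = "$request";

$packet = read(SOCK);
$packet =~ /$HTTP/;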

That example is for HTTP, RFC 2616, using the BNF from Syntax and Protocol sections of the document. Some BNFs are written so that you can just cut and paste them into a regex.
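As a rough illustration of the same build-it-from-fragments idea, here is a sketch in Python. The fragments are hypothetical and heavily simplified (only the request line, only GET/POST, and a made-up ABS_PATH rule); they are not the RFC 2616 grammar and not the original code.

```python
import re

# Hypothetical, heavily simplified fragments; the real RFC 2616 grammar is much larger.
DIGIT = r"[0-9]"
HTTP_VERSION = rf"HTTP/{DIGIT}\.{DIGIT}"
METHOD = r"(?:GET|POST)"          # (?:...) groups without capturing
ABS_PATH = r"/[^\s?#]*"           # assumption: path only, no query string

# Named groups capture the parts of interest from the composed expression.
REQUEST_LINE = re.compile(
    rf"^(?P<method>{METHOD}) (?P<path>{ABS_PATH}) (?P<version>{HTTP_VERSION})$"
)

m = REQUEST_LINE.match("GET /index.html HTTP/1.1")
print(m.group("method"), m.group("path"), m.group("version"))  # GET /index.html HTTP/1.1
```

Keeping each grammar rule in its own named fragment is what makes the final expression readable, even though the compiled pattern is long.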

davenpcj
+5  A: 

ObQuote:

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. — Jamie Zawinski

+1  A: 

A great post about useful regular expressions: "8 Regular Expressions You Should Know"

http://net.tutsplus.com/tutorials/other/8-regular-expressions-you-should-know/

(includes matching e-mail, url, ip address, html tag, etc)

JuanZe
A: 

A regex to check for strong passwords:

This one will validate a password with a length of 5 to 10 alphanumerical characters, with at least one upper case, one lower case and one digit:

^[a-zA-Z0-9]{5,10}(?<=[A-Z].*)(?<=[a-z].*)(?<=[0-9].*)$

It doesn't work in all regex implementations because it relies on variable-length look-behinds.

As MizardX points out, using look-aheads instead works in most implementations:

^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])[a-zA-Z0-9]{5,10}$
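A quick way to sanity-check the look-ahead version is a small Python sketch like the one below. The helper name is_strong and the sample passwords are made up for illustration; fullmatch is used so that a trailing newline cannot sneak past the $ anchor.

```python
import re

# The look-ahead version from the answer; each (?=...) asserts one requirement.
STRONG = re.compile(r"^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])[a-zA-Z0-9]{5,10}$")

def is_strong(password: str) -> bool:
    """True if the password is 5-10 alphanumerics with an upper, a lower and a digit."""
    return STRONG.fullmatch(password) is not None

for candidate in ("Passw0rd", "password", "Ab1", "Abcdefghij1"):
    print(candidate, is_strong(candidate))
# Passw0rd True
# password False     (no upper case, no digit)
# Ab1 False          (too short)
# Abcdefghij1 False  (11 characters, too long)
```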
Philippe Leybaert
Just convert it to look-aheads instead. ^(?=...)(?=...)[...]{5,10}$
MizardX
A: 

I've seen a regular expression matching multiples of 7, but I can't find the link. Overkill? Yes. In fact, you can match multiples of any number.

If you try to find out whether or not 2048582930401385720939528 is a multiple of 7 with pencil and paper, you're being a finite automaton. It is "finite" because the amount of information you must keep track of at any moment during the calculation doesn't grow, which is why you can check whether the number 6783...84 (a 100-meter-long number written on a road) is divisible by 7 just by walking along the road and calculating as you go. And there is a theorem saying that anything recognizable by a finite automaton is also recognizable by a regular expression. A proof is in the book "Theory of Codes", available online.
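The "walking along the road" procedure can be sketched as a seven-state automaton in Python. This is only the automaton, not the equivalent regular expression (which exists by the theorem but is very long); the function name is_multiple_of_7 is chosen for this illustration.

```python
def is_multiple_of_7(digits: str) -> bool:
    """Scan a decimal string left to right, keeping only the remainder mod 7."""
    state = 0  # one of seven states: the remainder of the prefix read so far
    for ch in digits:
        # Appending a digit d turns the prefix value v into 10*v + d.
        state = (state * 10 + int(ch)) % 7
    return state == 0

n = "2048582930401385720939528"
# The automaton's verdict agrees with ordinary big-integer arithmetic.
print(is_multiple_of_7(n) == (int(n) % 7 == 0))  # True
```

The memory used is a single value between 0 and 6 no matter how long the number is, which is exactly the "finite" part.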

RamyenHead