Under what situations are regular expressions really the best way to solve the problem?

tags:

regex

views:

200

answers:

+3 Q:


I'm not sure whether Jeff coined it, but there's the joke that people who say "Oh, I know, I'll use regular expressions!" now have two problems. I've always taken this to mean that people use regular expressions in very inappropriate contexts.

However, under what circumstances are regular expressions really the best answer? For which problems are they the best, or perhaps the only, way to find a solution?

Regular expressions are a great way to parse text that doesn't already have a parser (unlike, say, XML). For example, I have used them to build a parser for the mod_rewrite syntax of .htaccess files in my URL Rewriter project, http://www.codeplex.com/urlrewriter

Nick Berardi 2008-10-23 15:59:22
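As a rough illustration of the mod_rewrite idea in the comment above, here is a minimal Python sketch that splits a single RewriteRule line into its pattern, substitution, and flags. It assumes a simplified grammar (whitespace-separated, unquoted arguments); real .htaccess files also have quoting, escapes, and RewriteCond lines, and the name parse_rewrite_rule is hypothetical.

```python
import re

# Simplified, assumed grammar: RewriteRule <pattern> <substitution> [optional,flags]
# Apache directive names are case-insensitive, hence re.IGNORECASE.
REWRITE_RULE = re.compile(
    r"^\s*RewriteRule\s+(?P<pattern>\S+)\s+(?P<substitution>\S+)"
    r"(?:\s+\[(?P<flags>[^\]]*)\])?\s*$",
    re.IGNORECASE,
)

def parse_rewrite_rule(line):
    """Return (pattern, substitution, flags), or None if the line isn't a RewriteRule."""
    m = REWRITE_RULE.match(line)
    if m is None:
        return None
    raw_flags = m.group("flags")
    # Split "[L,QSA]" into ["L", "QSA"]; no flags block means an empty list.
    flags = [f.strip() for f in raw_flags.split(",")] if raw_flags else []
    return m.group("pattern"), m.group("substitution"), flags

print(parse_rewrite_rule("RewriteRule ^blog/([0-9]+)$ /post.php?id=$1 [L,QSA]"))
print(parse_rewrite_rule("RewriteEngine On"))  # not a RewriteRule, so None
```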

They are really good when you want to be more specific than "*" or "?", for example "3 letters, then 2 digits, then a $ sign, then a period".

The quote is from an anti-Perl rant by Jamie Zawinski. I think Perl used to do regexes really badly, but now its engine seems to be the standard one in a lot of programs.

But the same sentiment still applies: if you don't know how to use regexes, you'd better not try anything really fancy, otherwise you'll end up with one of these tags too (see the bronze list) ;o)

http://stackoverflow.com/users/730/keng

Keng 2008-10-23 16:00:15
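A minimal sketch of that exact pattern using Python's standard re module. Character classes and counted repetition do the "3 letters, 2 digits" part, and the $ and . have to be escaped because they are regex metacharacters:

```python
import re

# Three letters, two digits, a literal "$", then a literal "."
PATTERN = re.compile(r"[A-Za-z]{3}[0-9]{2}\$\.")

for candidate in ["abc12$.", "ab123$.", "abc12$x"]:
    # fullmatch requires the whole string to fit, not just a prefix
    print(candidate, bool(PATTERN.fullmatch(candidate)))
```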

They are good for matching or finding text that follows a very specific and simple format. By "simple" I mean not nested and, for example, smaller than the entire HTML spec.

Joel Coehoorn 2008-10-23 16:04:01
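A small Python sketch of why nesting is the dividing line: a non-greedy pattern for <b>...</b> works on flat text, but once elements nest it pairs the wrong opening and closing tags. The sample strings are made up for illustration.

```python
import re

# Non-greedy match for the contents of a <b>...</b> element.
BOLD = re.compile(r"<b>(.*?)</b>")

flat = "<b>one</b> and <b>two</b>"
nested = "<b>outer <b>inner</b> tail</b>"

print(BOLD.findall(flat))    # ['one', 'two']
print(BOLD.findall(nested))  # ['outer <b>inner'], the outer element is cut in half
```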

+4 A:

Regexes are good for:

Text format validation (email, URL, numbers)
Text search/substitution
Mappings (e.g. URL pattern to function call)
Filtering text (related to substitution)
Lexical analysis during parsing
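For the lexical-analysis item, one common approach (similar to the tokenizer example in Python's re documentation) combines one named group per token type into a single alternation and reads lastgroup on each match. The token set and the tokenize function below are hypothetical, chosen for a tiny arithmetic language:

```python
import re

# Hypothetical token types; order matters, earlier alternatives win.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),  # any other single character
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(text):
    """Yield (kind, value) pairs; raise ValueError on an unexpected character."""
    for m in MASTER.finditer(text):
        kind = m.lastgroup  # name of the group that matched
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise ValueError(f"unexpected character {m.group()!r} at {m.start()}")
        yield kind, m.group()

print(list(tokenize("x = 3.5 * (y + 2)")))
```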

Null303 2008-10-23 16:06:25

+4 A:

They can be used to validate anything that follows a pattern, such as:

Social Security Number
Telephone Number (555-555-5555)
Email Address
IP Address (though making sure it's actually valid is more complex)

All of these follow patterns and are easy to verify with a regex.
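A Python sketch of pattern-based validation: the phone regex only checks the 555-555-5555 shape, while the IP address check uses the regex for the dotted-quad shape and plain code for the 0-255 range, which is the "more complex" part. The function names are hypothetical, and this version handles IPv4 only and accepts leading zeros.

```python
import re

PHONE = re.compile(r"[0-9]{3}-[0-9]{3}-[0-9]{4}")  # 555-555-5555 shape only
IPV4_SHAPE = re.compile(r"([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})")

def is_phone(s):
    # fullmatch: the entire string must fit the pattern
    return PHONE.fullmatch(s) is not None

def is_ipv4(s):
    m = IPV4_SHAPE.fullmatch(s)
    # The regex checks the shape; the 0-255 range is easier to check in code.
    return m is not None and all(int(octet) <= 255 for octet in m.groups())

print(is_phone("555-555-5555"), is_phone("5555-555-555"))  # True False
print(is_ipv4("192.168.0.1"), is_ipv4("999.1.1.1"))        # True False
```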

They are harder to use for input governed by logic rather than a pattern, such as a credit card number (its check digit has to be computed), but they can still handle part of the client-side validation.
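To illustrate the pattern-versus-logic split, a minimal Python sketch: the regex checks only the shape (13 to 19 digits, optionally separated by single spaces or hyphens), while the Luhn checksum, which would be impractical to express as a pattern, is computed in code. The shape rule and function names are assumptions for illustration, not any card network's specification.

```python
import re

# Shape only: 13 to 19 digits, optional single space or hyphen between digits.
CARD_SHAPE = re.compile(r"[0-9](?:[ -]?[0-9]){12,18}")

def luhn_ok(digits):
    """Luhn checksum: double every second digit, counting from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def looks_like_card(s):
    if CARD_SHAPE.fullmatch(s) is None:  # the pattern part
        return False
    digits = re.sub(r"[ -]", "", s)
    return luhn_ok(digits)  # the logic part

print(looks_like_card("4111 1111 1111 1111"))  # True
print(looks_like_card("4111 1111 1111 1112"))  # False: right shape, bad checksum
```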

So where are they best used?

To sanitize data entry on the client side, before sanitizing it again on the server
To search and replace strings that contain a pattern
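A search-and-replace sketch in Python using capture groups and backreferences in re.sub: it rewrites YYYY-MM-DD dates as DD/MM/YYYY. The date format and sample text are just an example.

```python
import re

# Capture year, month and day, then reorder them in the replacement.
ISO_DATE = re.compile(r"\b([0-9]{4})-([0-9]{2})-([0-9]{2})\b")

text = "Asked 2008-10-23, answered 2008-10-24."
print(ISO_DATE.sub(r"\3/\2/\1", text))  # Asked 23/10/2008, answered 24/10/2008.
```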

I'm sure I am missing a lot of other cases.

Maxim 2008-10-23 16:06:29

I voted this up despite the inclusion of email addresses as an example... Email addresses are VERY complex to validate, and a regex that does it is going to be nearly impossible to decipher.

Brian Knoblauch 2008-10-23 16:39:08

They are primarily valuable for parsing highly structured text. If you use named groups (an option in most mature regex systems), you have a phenomenally powerful and crisp way to handle the strings.

Here's an example. Consider netstat: its output can differ between Linux distributions and between versions of netstat. Sometimes there is an extra column; sometimes the date/time format shifts. Regexes give you a powerful way to handle all of that with a single expression. Couple that with named groups, and you can retrieve the data without hacks like:
1) Split on spaces.
2) OK, the netstat version is X, so I need to add 1 to all array references past column 5.
3) OK, the netstat version is Y, so I need to make sure I use multiple array references for the date info.

YUCK. Simple to fix with a regex :-)
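A minimal Python sketch of the named-group approach. The sample lines are hypothetical and simplified, in the style of Linux netstat -tn output with and without an extra PID/program column (as -p adds); real output varies by version, and this pattern does not cover the date/time variations mentioned above. One optional group absorbs the extra column, and callers read fields by name instead of by index. The name parse_netstat_line is hypothetical.

```python
import re

# One expression covers both layouts; the trailing PID/program column is optional.
NETSTAT_LINE = re.compile(
    r"^(?P<proto>tcp6?)\s+"
    r"(?P<recv_q>[0-9]+)\s+(?P<send_q>[0-9]+)\s+"
    r"(?P<local>\S+)\s+(?P<remote>\S+)\s+"
    r"(?P<state>[A-Z_]+)"
    r"(?:\s+(?P<pid>[0-9]+)/(?P<program>\S+))?\s*$"
)

def parse_netstat_line(line):
    """Return a dict of named fields, or None if the line doesn't match."""
    m = NETSTAT_LINE.match(line)
    return m.groupdict() if m else None

samples = [
    "tcp        0      0 192.168.1.5:22     10.0.0.7:51234     ESTABLISHED",
    "tcp        0      0 192.168.1.5:22     10.0.0.7:51234     ESTABLISHED 1234/sshd",
]
for line in samples:
    fields = parse_netstat_line(line)
    # A missing optional column comes back as None instead of shifting indexes.
    print(fields["local"], fields["state"], fields["program"])
```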

torial 2008-10-23 16:06:31
