I don’t understand or see the need for regular expressions.

Can someone explain them in simple terms and provide some basic examples where they could be useful, or even critical?

+23  A: 

What regular expressions are used for:

Regular expressions are a language in themselves that lets you perform complex validation of string inputs, i.e. you pass in a string and get back true or false depending on whether it matches.

How regular expressions are used:

  • Form validation, determine if what the user entered is of the format you want
  • Finding the position of a certain pattern in a block of text
  • Search and replace where the search term is a regex and what to replace is a normal string.
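
The three uses above might look like this with Python's standard re module. This is only a sketch: the function names and the digit patterns are made up for illustration and are not from the answer.

```python
import re

def is_valid_year(text):
    """Validation: True only if the whole string is exactly four digits."""
    return re.fullmatch(r"[0-9]{4}", text) is not None

def find_first_number(text):
    """Position: index where the first run of digits starts, or -1."""
    match = re.search(r"[0-9]+", text)
    return match.start() if match else -1

def mask_numbers(text):
    """Search and replace: every run of digits becomes a plain '#'."""
    return re.sub(r"[0-9]+", "#", text)
```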

Some regular expression language features:

  • Alternation: lets you select one thing or another. Example: match only yes or no.

    yes|no

  • Grouping: You can define scope and have precedence using parentheses. For example match 3 color shades.

    gr(a|e)y|black|white

  • Quantification: You can quantify how much of something you want. ? means 0 or 1, * means 0 or more, + means at least one. Example: accept a binary string that is not empty:

    (0|1)+
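
To try the three patterns above, something like the following Python should do. The constant names and the matches helper are just illustrative; fullmatch is used so the whole string has to fit the pattern.

```python
import re

# The three patterns from the feature list, compiled once.
YES_NO = re.compile(r"yes|no")               # alternation
SHADE = re.compile(r"gr(a|e)y|black|white")  # grouping
BINARY = re.compile(r"(0|1)+")               # quantification

def matches(pattern, text):
    """True if the entire text matches the compiled pattern."""
    return pattern.fullmatch(text) is not None
```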

Why regular expressions?

Regular expressions make it easy to match strings; one small regular expression can often replace several dozen lines of source code.

Not for all types of matching:

To understand how something is useful, you should also understand where it is not useful. Regular expressions are bad for certain tasks, for example when you need to guarantee that a string has an equal number of opening and closing parentheses.
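
For the parentheses case a plain counter is usually simpler than any regex. A rough Python sketch (the function name is hypothetical; it also rejects a closing parenthesis that comes before its opener):

```python
def parens_balanced(text):
    """True if every '(' has a matching ')' and none closes too early."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared with nothing open
                return False
    return depth == 0
```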

Available in just about all languages:

Regular expressions are available in just about any programming language.

Formal language:

Any regular expression in the formal sense (without extensions such as backreferences) can be converted to a deterministic finite state machine. In the same way, you can work out how to write source code that matches strings against your regular expression.

Example:

[hc]+at

matches "hat", "cat", "hhat", "chat", "hcat", "ccchat", and so on, but not "at"
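
To make the state machine idea concrete, here is one possible hand-written machine for [hc]+at in Python, next to the regex itself. The state numbering and the names are my own assumptions; a real regex engine does not necessarily work this way internally.

```python
import re

# States: 0 = start, 1 = read one or more of h/c, 2 = read the "a",
# 3 = read the "t" (the only accepting state).
TRANSITIONS = {
    (0, "h"): 1, (0, "c"): 1,
    (1, "h"): 1, (1, "c"): 1, (1, "a"): 2,
    (2, "t"): 3,
}

def dfa_matches(text):
    """Run the state machine over the whole string."""
    state = 0
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition defined, so reject
            return False
    return state == 3

def regex_matches(text):
    """The same check done with the regular expression."""
    return re.fullmatch(r"[hc]+at", text) is not None
```

Both functions should agree on every input, which is the point of the equivalence.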

Source, further reading

Brian R. Bondy
+28  A: 

Use them where you need to use/manipulate patterns. For instance, suppose you need to recognise the following pattern:

  • Any letter, A-Z, either upper or lower case, 5 or 6 times
  • 3 digits
  • a single letter a-z (definitely lower case)

(Things like this crop up for zip code, credit card, social security number validation etc.)

That's not really hard to write in code - but it becomes harder as the pattern becomes more complicated. With a regular expression, you describe the pattern (rather than the code to validate it) and let the regex engine do the work for you.

The pattern here would be something like

[A-Za-z]{5,6}[0-9]{3}[a-z]

(There are other ways of expressing it too.) Grouping constructs make it easy to match a whole pattern and grab (or replace) different bits of it, too.
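
As a rough illustration in Python, with parentheses added around each part so the pieces can be pulled out separately (the groups and the split_code name are additions for this sketch, not part of the pattern above):

```python
import re

# Same pattern, with one capturing group per part.
CODE = re.compile(r"([A-Za-z]{5,6})([0-9]{3})([a-z])")

def split_code(text):
    """Return (letters, digits, suffix) if text fits the pattern, else None."""
    match = CODE.fullmatch(text)
    return match.groups() if match else None
```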

A few downsides though:

  • Regexes can become complicated and hard to read quite quickly. Document thoroughly!
  • There are variations in behaviour between different regex engines
  • The complexity can be hard to judge if you're not an expert (which I'm certainly not!); there are "gotchas" which can make the patterns really slow against particular input, and these gotchas aren't obvious at all
  • Some people overuse regular expressions massively (and some underuse them, of course). The worst example I've seen was where someone asked (on a C# group) how to check whether a string was length 3 - this is clearly a job for using String.Length, but someone seriously suggested matching a regex. Madness. (They also got the regex wrong, which kinda proves the point.)
  • Regexes use backslashes to escape various things (e.g. use \. to mean "a dot" rather than just "any character"). In many languages the backslash itself needs escaping.
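
The backslash point can be seen in a few lines of Python, where a raw string (the r prefix) avoids the double escaping. The constant names and the 3.14 sample are only for illustration.

```python
import re

# "\." matches a literal dot; a bare "." matches any character.
LITERAL_DOT = re.compile(r"3\.14")
ANY_CHAR = re.compile(r"3.14")

# Without a raw string, the backslash itself has to be doubled.
SAME_AS_LITERAL_DOT = re.compile("3\\.14")
```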
Jon Skeet
another use is file searching which I use regularly
CodeMonkey
+2  A: 

They are a bit tricky, but extremely powerful and worth learning. The web is full of tutorials and examples; start, for example, from here and look at the examples here.

Joonas Pulakka
A: 

To give you some examples:

  • Email address

  • Password that requires at least 1 letter and 1 digit

How can you achieve these requirements? The best way is to use a regular expression.
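
For the password rule, one possible approach in Python uses lookaheads, which most modern regex engines support. The function name is hypothetical, and "letter" is assumed here to mean A-Z or a-z.

```python
import re

# (?=.*[A-Za-z]) requires a letter somewhere, (?=.*[0-9]) requires a digit;
# the final .+ then consumes the (non-empty) password itself.
PASSWORD = re.compile(r"(?=.*[A-Za-z])(?=.*[0-9]).+")

def is_acceptable_password(text):
    """True if text contains at least one letter and at least one digit."""
    return PASSWORD.fullmatch(text) is not None
```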

Read the following links to learn more:

How To: Use Regular Expressions to Constrain Input in ASP.NET http://msdn.microsoft.com/en-us/library/ms998267.aspx

Billy
A: 

Whenever you've got some pattern to find in a lot of textual data or if you want to check that a string is in a certain format.

For example an email address...

The code for checking for an at symbol and the presence of a valid domain would be quite long, whereas you could just use a regular expression and have an answer in 2 lines of code.

// Needs: using System.Text.RegularExpressions;
Regex r = new Regex("<An Email Address Regex>");
bool isValidEmail = r.IsMatch(MyInput);

Other examples would be for checking numbers are in the correct format before parsing them into integers etc.
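
A small Python sketch of the number check (the helper name and the choice to allow an optional sign are assumptions for this example):

```python
import re

# Optional sign followed by one or more ASCII digits.
INTEGER = re.compile(r"[+-]?[0-9]+")

def parse_int_or_none(text):
    """Parse text as an integer only if it has the expected format."""
    if INTEGER.fullmatch(text) is None:
        return None
    return int(text)
```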

Rob Stevenson-Leggett
Not the best example, given that the strictly correct regexp to validate an email address is absolutely gigantic: http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html
Michael Borgwardt
That is one insane regex. I'd rather let people use invalid email addresses than use that one.
Mussnoon
That's a stupid argument though. The regex will work, and it doesn't really matter that it's an unmaintainable mess because you can unit test the hell out of it. It's a solved problem. Your roll-your-own code will almost certainly miss an edge case.
Rob Stevenson-Leggett
+1  A: 

Regular expressions are a very concise way to specify most pattern-matching and -replacement problems, and regexp engines can be very highly optimized.

If you wanted to do the same job as even a relatively simple regexp, you'd have to write a lot of code, which probably would contain a number of bugs, be hard to understand and perform badly.

Whereas doing the same with a regexp is much shorter, almost certainly performs as well as is technically possible, and is easier to understand for anyone familiar with regexps (though it should be commented in either case).

Michael Borgwardt
+1  A: 

The email example is actually a bad example for regular expressions. Regexes can be used, but the resulting expression (for example this one which doesn't handle "John Doe [email protected]" style addresses) is hugely complicated - take a look at the email address specification and you'll see why...

However, regexes are very useful in a host of other situations: extracting IP addresses from text, tags from HTML, etc. Finding all versioned files would be another example. Something along the lines of:

my_versioned_file_(\d{4}-\d{2}-\d{2})\.txt

will match any filenames of the format my_versioned_file_2009-02-26.txt and pull out the date as a captured group (the part wrapped in "()") for you to further analyse.
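
In Python, pulling the date out could look roughly like this. The version_date name is made up, and this sketch escapes the dot before txt so that it only matches a literal dot.

```python
import re

VERSIONED = re.compile(r"my_versioned_file_(\d{4}-\d{2}-\d{2})\.txt")

def version_date(filename):
    """Return the captured date string, or None if the name does not fit."""
    match = VERSIONED.fullmatch(filename)
    return match.group(1) if match else None
```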

No, regexes are not necessary, but they can save a world of time compared with writing a hand-rolled parser for something a regex can easily achieve.

Ant
+3  A: 

If I could direct the OP to some of the answers/comments on one of my own questions: http://stackoverflow.com/questions/519929/how-important-is-knowing-regexs

annakata
A: 

Jon and Sqook gave a fine explanation and definition of Regular Expressions, and for simple problems it is pretty understandable, but if you use it for complex problems regular expressions can be a &$@( (at least for me ;-))

I use Expresso a lot to help me build complex regular expression code.

http://www.ultrapico.com/Expresso.htm

It has a built-in library of expressions you can use, a design mode where you can build your code and a test mode where you can test and validate the code. It helped me build and understand complex expressions better!

Good luck!

Erik404
A: 

Some practical real world usages:

Finding abstract classes that extend JUnit's TestCase:

abstract\s+class\s+\w+\s+extends\s+TestCase

This is useful for finding test cases that cannot be instantiated and will need excluding from an Ant build script that runs test cases. You cannot search for plain text because you don't know the class names in advance, hence the \w+ (at least one word character).
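
If you wanted the class names themselves rather than just the matching lines, a sketch in Python might be the following. The capturing group around \w+ and the function name are additions for this example.

```python
import re

# Same pattern, with the class name captured.
ABSTRACT_TEST = re.compile(r"abstract\s+class\s+(\w+)\s+extends\s+TestCase")

def abstract_test_classes(source):
    """Return the names of abstract classes that extend TestCase."""
    return ABSTRACT_TEST.findall(source)
```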

Finding running bash or bourne shell scripts:

 ps -e | grep -E " sh| bash"

This is useful if you want to kill them all or something. If you searched for just " sh" you would not get the bash ones and would have to run the command again for the bash scripts.

It's not perfect, but most regexes you write on the fly won't be, or they'll take so long to write that they're not worth it. The ones you perfect are the ones you commit as part of some sort of validation or built application.

Trampas Kirk
A: 

Example of critical use is JavaScript:
If you need to search a string or replace by pattern, a regular expression is the matching tool you get. It's built into the JavaScript API on those string methods (search, match, replace)...

Personally, I mostly use regular expressions only when I need some advanced matching in an automated find/replace in a text editor (TextPad or Visual Studio). The most powerful feature, in my view, is the ability to capture part of the matched pattern and insert it into the replacement.
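
The same capture-and-reinsert idea, sketched in Python rather than an editor (the function and the "Last, First" sample are invented for illustration; editors use their own syntax for group references):

```python
import re

def swap_name_order(text):
    """Turn "Last, First" into "First Last" using two captured groups."""
    # \1 and \2 in the replacement refer to the first and second group.
    return re.sub(r"(\w+), (\w+)", r"\2 \1", text)
```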

awe