views:

788

answers:

22

I have heard of regular expressions but have only seen use cases for a few things, so I don't think of using them very often. In the past I have done a couple of tasks by hand that took me hours. Later I talk to someone and they say "here is how to do it using a regular expression".

So what are things for which you have used Regular Expressions? If I get more examples then maybe I can begin to know when to look for and use them.

+5  A: 

The most common use case is finding strings that match a pattern. Typically, searching is combined with replacing the text that matches the pattern with another string.

For example, the following expression will match whitespace (just spaces and tabs in this particular case) at the beginning of a line.

^[ \t]+

This might be useful if you wanted to trim that whitespace off.
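As a minimal sketch of that trim in Python (the function name and sample text are my own, not from the answer), the same expression can be applied to every line by compiling it with `re.MULTILINE`:

```python
import re

# Leading spaces and tabs; re.MULTILINE makes ^ match at the start of each line.
LEADING_WS = re.compile(r"^[ \t]+", re.MULTILINE)

def trim_leading_whitespace(text: str) -> str:
    """Remove spaces and tabs from the beginning of every line."""
    return LEADING_WS.sub("", text)

sample = "  first\n\t\tsecond\nthird"
print(trim_leading_whitespace(sample))
```

Whitespace in the middle of a line is left alone, because the pattern is anchored to the line start.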

Bill the Lizard
curious, considering regex are a "pattern matching language" ;)
hop
"find strings" are the operative words, not "match a pattern".
Bill the Lizard
+1  A: 

Stack Overflow is in fact a good place to find use cases:

http://stackoverflow.com/questions/tagged/regex

Think of regex as glob (you know, * ? {a,b,c} [abc]) on steroids.
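To make the glob comparison concrete, here is a small Python sketch with made-up file names. The glob can only say "anything between `report_` and `.txt`", while the regex can insist on exactly four digits:

```python
import fnmatch
import re

# Hypothetical file names, for illustration only.
names = ["report_2008.txt", "report_final.txt", "notes.md"]

# Glob: * matches any run of characters.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "report_*.txt")]

# Regex: \d{4} demands exactly four digits, which a glob cannot express.
regex_hits = [n for n in names if re.fullmatch(r"report_\d{4}\.txt", n)]

print(glob_hits)
print(regex_hits)
```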

Zach Scrivena
A: 

Validating an email address is something I always use regex for. I don't even want to think about trying to do something like that another way.

mrinject
+11  A: 

Many things. Including:

  • Examining command lines
  • Parsing user input
  • Parsing various text files
  • Examining web server logs
  • Examining test results
  • Finding text in emails
  • Reading configuration files

When learning regular expressions, it may be helpful to also learn restraint. You might be tempted, like me, to see regular expressions as a solution to far too many problems.

Paul Beckingham
Obligatory "...and then you have two problems."
Bill the Lizard
http://regex.info/blog/2006-09-15/247
hop
parsing command lines is a _bad_ example
hop
@hop: Agreed. REs are appropriate for only the simplest of parsing problems.
dmckee
+2  A: 

Well, any time you need to match something when just matching a verbatim word won't work. Renaming files, search-and-replace in code, you name it.

A traditional example might be when you want to find all occurrences of, say, phone numbers in a file. Searching for individual numbers is obviously not going to work, and just searching for dashes is going to probably be ambiguous. Much better to say "find all occurrences of 3 digits followed by a dash followed by 4 digits" (kept basic for example purposes; in reality you may want to match area code, different delimiters, etc.)

Another neat thing about regexps is they let you use part/all of what you searched for in your replacement. Thus if you wanted to replace all of area code 555 with something else, you can, while maintaining the rest of the phone number intact.
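A rough Python sketch of both ideas, using the basic "3 digits, dash, 4 digits" pattern from above. The sample text and the replacement prefix 777 are made up for illustration:

```python
import re

text = "Call 555-1234 or 555-9876; the office is 867-5309."

# Three digits, a dash, four digits; \b avoids matching inside longer digit runs.
PHONE = re.compile(r"\b(\d{3})-(\d{4})\b")

# findall returns one (first three, last four) tuple per match.
found = PHONE.findall(text)

# Replace only the 555 part, reusing the captured last four digits via \1.
updated = re.sub(r"\b555-(\d{4})\b", r"777-\1", text)

print(found)
print(updated)
```

The `\1` in the replacement is what keeps the rest of each number intact.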

J Cooper
A: 

In ASP.NET, if you use user controls or master pages, the framework mangles your control names even if you name them uniquely. I wrote a little wrapper around the Prototype $ function that lets me get at the mangled controls in JavaScript. It uses a regular expression to search the DOM for controls whose names end with the appropriate name.

I also use it heavily in client/server-side validation of inputs that need to match specific input patterns.

tvanfosson
+1  A: 

Pulling out pieces of string input. For example,

  • "Screen scraping" web pages either by matching directly on the HTML or (more useful) ASCII output from a tool like w3m, e.g., for finding out when a football game is over so my computer can stop recording it

  • Splitting an email header into tag and value, for example, to help identify spam

  • Pulling first name, last name, and email address out of student records so that in grade books I can identify a student by last name if it is unique and by "last, first" if disambiguation is required
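As one illustration, splitting a header line into tag and value might look like this in Python. This is only a sketch: the function name is mine, and it assumes one header per line, so folded (continued) headers are not handled:

```python
import re

# Header name, a colon, optional spaces or tabs, then the value.
HEADER = re.compile(r"^([A-Za-z0-9-]+):[ \t]*(.*)$")

def split_header(line: str):
    """Return (tag, value) for a header line, or None if it does not look like one."""
    m = HEADER.match(line)
    return (m.group(1), m.group(2)) if m else None

print(split_header("Subject: Cheap watches!!!"))
```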

Regular expressions were the first widely deployed string-processing tools, but these days I often prefer something based on Parsing Expression Grammars, like the LPEG pattern matcher.

Norman Ramsey
parsing html is the canonical bad example(tm)
hop
Bad, but since HTML never parses, also useful :-)
Norman Ramsey
+3  A: 

The crux of what I would use regex for is:

  1. Validation of input
  2. Cleaning of input
  3. Restructuring of input
  4. Looking for substrings within input
OJ
+3  A: 

Input validation routines serve as a first line of defense for a Web application. Regular Expressions are a great and robust way to validate input.

If you make unfounded assumptions about the type, length, format, or range of input, your application is unlikely to be robust. Input validation can become a security issue if an attacker discovers that you have made unfounded assumptions. The attacker can then supply carefully crafted input that compromises your application by attempting SQL injection, cross-site scripting, and other injection attacks. To avoid such vulnerability, you should validate text fields (such as names, addresses, tax identification numbers, and so on) and use regular expressions.

For example, instead of just making the last name input a required field, you can use the following expression to allow only uppercase and lowercase letters and a few special characters that are common in some names.

^[a-zA-Z''-'\s]{1,40}$
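A hedged Python sketch of this check follows. As I read the expression above, the `''-'` part is a range from apostrophe to apostrophe, so a hyphen is not actually allowed. The variant below assumes hyphens were intended and escapes one explicitly; the function name is my own:

```python
import re

# Letters, apostrophe, hyphen, and whitespace; 1 to 40 characters in total.
# Assumption: the hyphen is meant to be allowed (e.g. "Smith-Jones").
LAST_NAME = re.compile(r"[a-zA-Z'\-\s]{1,40}")

def is_valid_last_name(value: str) -> bool:
    """True if the whole value consists only of the allowed characters."""
    return LAST_NAME.fullmatch(value) is not None

print(is_valid_last_name("O'Brien"))
print(is_valid_last_name("Robert'); DROP TABLE students;"))
```

Note that this rejects names with non-ASCII letters, so it is a starting point rather than a complete rule.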

Craig McKeachie
+1  A: 

I use them for basically anything where I need something more than exact string matching, but I don't care enough about performance or maintainability to do anything that would require more than a few lines of code.

dsimcha
+2  A: 

For the self-learners, everything you could want to know about Regular Expressions and more: http://www.regular-expressions.info/

It might help to understand why they are called "Regular Expressions". The 'regular' part of it means that there is some expectation of a pattern. The 'expressions' part of it implies that they are, in a sense, a mathematical representation of the text, and this in turn allows you to extract information from the text.

For example, I wrote a module that used regular expressions to split phone numbers into their constituent pieces, e.g. the country, the area code, the exchange, and the station. This sounds easy for a human, but for a computer it's not so easy if you consider that there are so many ways to write phone numbers. You can do +1(407)555-1234 or 407-555-1234 or 555-1234 (7-digit dialing) or 1.407.555.1234 or 4075551234. Using regular expressions helps to abstract the processing of the text when there are certain things that you are trying to extract from it.
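This is not the module described above, just a minimal Python sketch of the idea. It assumes a one-digit country code and only the five formats listed; the group names and `split_phone` are my own:

```python
import re

PHONE = re.compile(r"""
    ^
    (?:\+?(?P<country>\d)[\s.-]?)?      # optional one-digit country code
    (?:\(?(?P<area>\d{3})\)?[\s.-]?)?   # optional area code, parentheses allowed
    (?P<exchange>\d{3})[\s.-]?          # exchange
    (?P<station>\d{4})                  # station
    $
""", re.VERBOSE)

def split_phone(number: str):
    """Return a dict of the pieces, or None if the text is not a phone number."""
    m = PHONE.match(number)
    return m.groupdict() if m else None

for n in ["+1(407)555-1234", "407-555-1234", "555-1234", "1.407.555.1234", "4075551234"]:
    print(n, split_phone(n))
```

Pieces that are absent from the input, such as the area code in 7-digit dialing, come back as `None`.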

Michael Bray
+2  A: 

Years ago before the iPhone, web browsers on Windows Mobile and Palm PDAs were really very limited. Even CSS was off limits unless you had the very latest version of Windows Mobile. Because I had a geeky PDA with a fancy wireless add-on card, I wanted to surf the web from it instead of buying a laptop, and so I made a portal website. One of the things I made was a page that could do transformations, removals and replacements on certain parts of the HTML, either generic operations like removing all the images, or site specific stuff. This was pretty much all done with regex.

Marc Charbonneau
+9  A: 
Mark Robinson
Cool. Hotlinking, even without an attribution.
gnud
For those wondering, this comic is from http://www.xkcd.org
gnud
A: 

Parsing dice notation ("2d6", "3d4+10", etc.) to create a Dice object in Ruby. (Not sure if this code is the "perfect" way to do it, as I am still learning Ruby.)

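class Dice
    # Parses dice notation such as "2d6" or "3d4+10".
    def self.parse(str)
        # \A and \z anchor the whole string; ^ and $ would only anchor a line.
        match = /\A(\d+)d(\d+)([+-]\d+)?\z/.match(str)
        raise ArgumentError, "invalid dice notation: #{str}" if match.nil?
        # A missing modifier is captured as nil, and nil.to_i is 0.
        amt, sides, mod = match.captures.map {|c| c.to_i }
        Dice.new(amt, sides, mod)
    end
end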

Very nice and easy.

Mark A. Nicolosi
+1  A: 

In my web forms, I often use regex to validate what the user typed into a text field or similar. An email address needs to be some non-whitespace characters followed by the "@" character, followed by some more non-whitespace characters, followed by a period character, etc. Dates need to meet one of a few allowable formats (1/23/2008, 1/23/08) in order for my code to figure out exactly what date was entered. Etc.
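A loose Python sketch of those two checks, assuming only the shapes described above (the function names are mine). Neither pattern proves the address or the date really exists; they only check the format:

```python
import re

# Deliberately loose: something@something.something, with no whitespace anywhere.
EMAIL = re.compile(r"\S+@\S+\.\S+")

# M/D/YYYY or M/D/YY; checks the shape only, not whether the date is real.
DATE = re.compile(r"(\d{1,2})/(\d{1,2})/(\d{4}|\d{2})")

def looks_like_email(value: str) -> bool:
    return EMAIL.fullmatch(value) is not None

def parse_date(value: str):
    """Return (month, day, year) as integers, or None if the format is wrong."""
    m = DATE.fullmatch(value)
    return tuple(int(g) for g in m.groups()) if m else None

print(looks_like_email("user@example.com"))
print(parse_date("1/23/2008"), parse_date("1/23/08"))
```

A two-digit year comes back as a small number (8 for "08"), so the calling code still has to decide which century it means.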

Kurt W. Leucht
+2  A: 

I use them quite frequently, probably because I'm mostly in a Linux environment and have easy access to them.

  • Searching for things in an editor, especially when I know two parts on a line but not what is in between (please excuse the extraneous whitespace)
    • Where is the reval function that takes a widget? "reval.*\<widget\>"
    • Where is my_obj assigned to? "\<my_obj\>.*="
  • To search and replace in order to produce a modification of a data file: i.e., set all the delivery volumes to one "#<volume>[-0-9.]+</volume>#<volume>1.0</volume>#g"
  • To munge output to fit on the screen (removing whitespace or uninteresting fields).
  • To munge data files into another format, such as taking log files and producing a file for gnuplot which graphs performance data.
  • For programmatic uses, such as pattern matching a data value's name in order to handle it differently if it matches certain criteria most easily expressed with a regular expression.
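The same searches and substitutions carry over to code. Here is a small Python sketch with made-up data, where `\b` plays the role of the editor's `\<` and `\>` word boundaries:

```python
import re

data = "<item><volume>12.5</volume></item>\n<item><volume>-3</volume></item>"

# Any numeric volume becomes 1.0.
normalized = re.sub(r"<volume>[-0-9.]+</volume>", "<volume>1.0</volume>", data)

source = "my_obj = make()\nother_my_obj = 1\nprint(my_obj)"

# my_obj as a whole word, followed by an = somewhere later on the line.
assignments = [line for line in source.splitlines()
               if re.search(r"\bmy_obj\b.*=", line)]

print(normalized)
print(assignments)
```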

After using regexes I hate the windows "Find" box because it is so limited.

As another user answered, regular expressions are essentially more powerful globbing, but they go way beyond that. You don't need to read "Mastering Regular Expressions" to use them, but I do recommend the book. I'm sure there are plenty of resources on the internet, such as here, although I can't vouch for any of them.

Another advantage to using regular expressions (whether in code or on the command line) is that they have been heavily optimized. Grep and DFA parsers in particular are almost certainly faster than what you would write on your own... and more likely to be correct the first time. Don't reinvent the wheel when you have such a nice one handy.

Mark Santesson
+1: Exactly how I use them most frequently, and if they don't work as you expect, no harm done. I can rip thru a data file making changes for testing in vi in no time.
DCookie
A: 

Simply put, regular expressions are useful any time you need to understand or manipulate strings. It is particularly easy to reach for regular expressions when you are about to write a multi-line text processing code block and you realize that regex could do it in one line.

John Fisher
A: 

Regular expressions are great for small text searches, pattern matching, and substitutions in small and medium-sized texts. For instance, one of the places where I have used REs is form field validation.

If you don't mind about performance, you can write really quick and dirty scripts for doing anything with text.

systemsfault
A: 
I've regularly used it to take text output from programs that produce a lot of information and extract some of that information to be saved in databases, spreadsheets or even a new text file.
Also, I've used it in a program to read text files that initialize the program's variables.
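
For the second use, a minimal Python sketch could look like the following. The `name = value` file format and the `read_settings` name are assumptions for illustration, not the format the answer refers to:

```python
import re

# name = value, ignoring surrounding whitespace (hypothetical file format).
SETTING = re.compile(r"^\s*([A-Za-z_]\w*)\s*=\s*(.*?)\s*$")

def read_settings(text: str) -> dict:
    """Collect name/value pairs, skipping comments and lines that do not match."""
    settings = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        m = SETTING.match(line)
        if m:
            settings[m.group(1)] = m.group(2)
    return settings

print(read_settings("# comment\nwidth = 80\nname=  demo run \n\nbad line"))
```
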
A: 

I've also used them for building random data that conforms to existing validation rules.

Goran
A: 

For the fun of watching a co-worker try to decipher it.

musicfreak
A: 
  1. trolling log files for exceptions or validation lines (e.g. "Subsystem A started..."), etc.
  2. replacing text (e.g. in source files, to quickly turn lines into Sysout statements)
  3. explaining to co-workers how powerful regex is.
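
A small Python sketch of the first item. The log lines and their timestamp layout are invented for illustration; only the "Subsystem A started..." text comes from the list above:

```python
import re

# Hypothetical log lines.
log_lines = [
    "2009-01-05 10:00:01 INFO Subsystem A started...",
    "2009-01-05 10:00:02 ERROR java.lang.NullPointerException: foo was null",
    "2009-01-05 10:00:03 INFO Subsystem B started...",
]

STARTED = re.compile(r"Subsystem (\w+) started")
EXCEPTION = re.compile(r"([\w.]+Exception)\b")

started = []
exceptions = []
for line in log_lines:
    m = STARTED.search(line)
    if m:
        started.append(m.group(1))    # which subsystem came up
    m = EXCEPTION.search(line)
    if m:
        exceptions.append(m.group(1)) # fully qualified exception name

print(started)
print(exceptions)
```
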
akf