I'm thinking of presenting questions in the form of "here is your input: [foo], here are the expected capture groups/results: [bar]" (and maybe writing a small script to test their answers against my results).

What are some good regex questions to ask? I need everything from beginner questions like "validate a 4-digit number" to "extract postal codes from addresses".
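
A small checking script of the kind described above could look roughly like this. It is only a sketch: the `EXERCISES` table, its layout, and the `check_answer` helper are hypothetical names made up for illustration, and `None` is used here to mean "this input must not match".

```python
import re

# Hypothetical exercise table: each case pairs an input with the capture
# groups a correct answer should produce (None means "must not match").
EXERCISES = {
    "four_digit_number": [
        ("1234", ("1234",)),
        ("12345", None),
        ("12a4", None),
    ],
}


def check_answer(pattern, cases):
    """Return a list of (input, expected, actual) for every failing case."""
    failures = []
    for text, expected in cases:
        match = re.fullmatch(pattern, text)
        actual = match.groups() if match else None
        if actual != expected:
            failures.append((text, expected, actual))
    return failures


if __name__ == "__main__":
    # A student's answer is just a pattern string.
    print(check_answer(r"(\d{4})", EXERCISES["four_digit_number"]))
```

An empty list means every case passed; otherwise each tuple shows which input went wrong and what the student's pattern actually captured.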

+3  A: 

There's a bunch of examples of various regular expression techniques over at www.regular-expressions.info: everything from simple literal matching to backreferences and lookahead.

James Kolpack
A: 

How about extracting the first name, middle name, last name, and personal suffix (Jr., III, etc.) from a format like:

Smith III, John Paul

How about a regex to remove line breaks and tabs from the input?
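
One possible reference solution for both exercises, as a sketch. It assumes the "Last [Suffix], First [Middle]" layout shown above and a short, hypothetical list of suffixes. The names `NAME_RE` and `remove_breaks` are made up for this example.

```python
import re

# Assumed layout: "Last [Suffix], First [Middle]".
# The suffix list is an illustrative assumption, not an exhaustive one.
NAME_RE = re.compile(
    r"""
    (?P<last>[A-Za-z'-]+)                      # last name
    (?:\s+(?P<suffix>Jr\.|Sr\.|III|II|IV))?    # optional personal suffix
    ,\s*
    (?P<first>[A-Za-z'-]+)                     # first name
    (?:\s+(?P<middle>[A-Za-z'-]+))?            # optional middle name
    """,
    re.VERBOSE,
)


def parse_name(text):
    """Return a dict of name parts, or None if the layout does not match."""
    match = NAME_RE.fullmatch(text)
    return match.groupdict() if match else None


def remove_breaks(text):
    """Delete carriage returns, newlines, and tabs from the input."""
    return re.sub(r"[\r\n\t]", "", text)


if __name__ == "__main__":
    print(parse_name("Smith III, John Paul"))
    print(remove_breaks("one\ttwo\r\nthree"))
```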

HLGEM
+1  A: 
  • Validate phone numbers (extract the area code and the rest of the number with grouping). This assumes US phone numbers; otherwise generalize for your style.
  • Play around with validating email addresses (you probably want to tell the students that the complete regular expression is hugely complicated, but for simple addresses it is pretty straightforward)
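
A sketch of what reference answers might look like, assuming 10-digit US numbers with optional parentheses and separators, and a deliberately loose email check that only looks for "something@something.something". Both patterns and the helper names are illustrative assumptions, not complete validators.

```python
import re

# Group 1: area code. Group 2: the rest of the number.
US_PHONE_RE = re.compile(r"\(?([0-9]{3})\)?[-.\s]?([0-9]{3}[-.\s]?[0-9]{4})")

# Intentionally simple; it does not attempt to follow RFC 822.
SIMPLE_EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def split_phone(text):
    """Return (area_code, rest) or None when the text is not a US number."""
    match = US_PHONE_RE.fullmatch(text)
    return match.groups() if match else None


def looks_like_email(text):
    """Return True when the text has the rough shape of an email address."""
    return SIMPLE_EMAIL_RE.fullmatch(text) is not None


if __name__ == "__main__":
    print(split_phone("(555) 123-4567"))
    print(looks_like_email("user@example.com"))
```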
Jesse
A: 

I would start with the common ones:

  • validate email
  • validate phone number
  • separate the parts of a URL
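
For the URL exercise, one reasonable target is the component-splitting expression given in RFC 3986, Appendix B. The sketch below rewrites it with named groups. The names `URL_RE` and `split_url` are made up for this example, and the pattern only separates the parts without validating them.

```python
import re

# Based on the regular expression in RFC 3986, Appendix B,
# rewritten with named groups for readability.
URL_RE = re.compile(
    r"^(?:(?P<scheme>[^:/?#]+):)?"
    r"(?://(?P<authority>[^/?#]*))?"
    r"(?P<path>[^?#]*)"
    r"(?:\?(?P<query>[^#]*))?"
    r"(?:#(?P<fragment>.*))?"
)


def split_url(url):
    """Return a dict with scheme, authority, path, query, and fragment."""
    return URL_RE.match(url).groupdict()


if __name__ == "__main__":
    print(split_url("http://www.example.com/path/page.html?x=1#top"))
```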
Gabriel McAdams
ouch! validate email would be tough
dotjoe
So you want beginning regex students to come up with something like this: http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html ?
Peter Ajtai
A: 

Be cruel. Tell them to parse HTML.

http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

The Matt
Ouch! That hurts.
JCasso
I think discussing why this is a bad idea would be good for students.
Chase Seibert
Note that parts of HTML are regular and thus can be parsed with regular expressions.
Gumbo
@Gumbo Out of interest, which parts?
alex
@alex: One example is the tags. The start tags and end tags that are used to mark up the elements are regular.
Gumbo
@Gumbo So stripping HTML tags from a string is a valid use of a regex?
alex
@alex: Yes, tags can be removed, but elements (a start tag paired with its end tag) cannot.
Gumbo
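
To illustrate the distinction drawn in this exchange, here is a naive tag stripper. It is a sketch only: `strip_tags` is a hypothetical helper, and the pattern assumes no `>` characters inside attribute values, comments, or scripts, which real HTML does not guarantee.

```python
import re

# Matches a single start or end tag such as <b>, </b>, or <a href="x">.
# Assumption: no ">" appears inside attribute values.
TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")


def strip_tags(html):
    """Remove individual tags; the text between them is left untouched."""
    return TAG_RE.sub("", html)


if __name__ == "__main__":
    print(strip_tags("<p>Hello <b>world</b></p>"))
```

Removing each tag works because a single tag has no nesting. Removing a whole element would require pairing a start tag with its matching end tag across arbitrary nesting, which this kind of pattern cannot do.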
+4  A: 

A few that I can think of off the top of my head:

  1. Phone numbers in any format e.g. 555-5555, 555 55 55 55, (555) 555-555 etc.
  2. Remove all html tags from text.
  3. Match social security number (Finnish one is easy;)
  4. All IP addresses
  5. IP addresses with shorthand netmask (xx.xx.xx.xx/yy)
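
As a possible reference answer for the last two items, here is a sketch for dotted-quad IPv4 addresses with an optional /yy prefix length. The names `OCTET`, `IPV4_CIDR_RE`, and `parse_ip` are made up for illustration, and IPv6 is not covered.

```python
import re

# One octet, 0 to 255, without leading zeros.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"

# Group 1: the address. Group 2: optional prefix length, 0 to 32.
IPV4_CIDR_RE = re.compile(
    rf"((?:{OCTET}\.){{3}}{OCTET})(?:/(3[0-2]|[12]?[0-9]))?"
)


def parse_ip(text):
    """Return (address, prefix_or_None), or None for invalid input."""
    match = IPV4_CIDR_RE.fullmatch(text)
    return match.groups() if match else None


if __name__ == "__main__":
    print(parse_ip("192.168.1.10/24"))
    print(parse_ip("256.1.1.1"))
```

Range-checking each octet inside the pattern is a good illustration of how awkward numeric constraints are in a regex; a simpler exercise is to match `[0-9]{1,3}` four times and check the ranges in code.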
Kimvais
A: 

Are you teaching them theory of finite automata as well?

Here is a good one: parse the addresses of churches correctly from this badly structured format (copy and paste it as text first) http://www.churchangel.com/WEBNY/newhart.htm

Hamish Grubijan
A: 

I'm a fan of parsing date strings. Define a few common date formats, as well as time and date-time formats. These are often good exercises because some dates are simple mixes of digits and punctuation. There's a limited degree of freedom in parsing dates.
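
A sketch of such an exercise, assuming just two formats (ISO `YYYY-MM-DD` and US-style `M/D/YYYY`). The regexes only check the shape; the standard `datetime.date` constructor is used to reject impossible dates such as February 30. The names here are illustrative.

```python
import re
from datetime import date

ISO_DATE_RE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")
US_DATE_RE = re.compile(r"([0-9]{1,2})/([0-9]{1,2})/([0-9]{4})")


def parse_date(text):
    """Return a datetime.date for a recognized format, else None."""
    match = ISO_DATE_RE.fullmatch(text)
    if match:
        year, month, day = map(int, match.groups())
    else:
        match = US_DATE_RE.fullmatch(text)
        if not match:
            return None
        month, day, year = map(int, match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Right shape, but not a real calendar date.
        return None


if __name__ == "__main__":
    print(parse_date("2010-02-28"))
    print(parse_date("2/30/2010"))
```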

S.Lott
A: 

Try to think of exercises whose answers can't simply be found with Google.

An email validator, for example, would be no trouble to find.

Try something like a 5-proof test.

Input: 5 digits. The sum of the digits must be divisible by five: 12345 → 1+2+3+4+5 = 15, and 15 / 5 = 3(.0)
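
One way to check this in practice is to let the regex validate only the shape (exactly five digits) and do the arithmetic in ordinary code. This is a sketch of that split, not a pure-regex solution, and `passes_five_proof` is a made-up name.

```python
import re


def passes_five_proof(text):
    """True when text is exactly five digits whose sum is divisible by 5."""
    if not re.fullmatch(r"[0-9]{5}", text):
        return False
    return sum(int(ch) for ch in text) % 5 == 0


if __name__ == "__main__":
    print(passes_five_proof("12345"))  # digits sum to 15
    print(passes_five_proof("12346"))  # digits sum to 16
```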

rdkleine
I'd like to see that regex, if you don't mind. How do you check if the *sum* is divisible by 5?
Kobi
Regex does not count. Numerical values have no meaning in regex: they're just strings, just like 'a', 'b' and 'c'.
Bart Kiers
A: 

Just to throw them for a loop, why not reword a question or two to suggest that they write a regular expression to generate data fitting a specific pattern, like email addresses, phone numbers, etc.? It's the same thing as validating, but it can help them get out of the mindset that regex is just for validation (the data generation tool in Visual Studio, for example, uses regex to randomly generate data).

Mayo
+1  A: 

To keep things a bit more interesting than the usual email/phone/url stuff, try looking for more original exercises. Avoid boredom.

For example, have a look at Forsyth-Edwards Notation, which is used to describe a particular board position in a chess game.

Have your students validate and extract all the bits of information from a string like this:

rnbqkbnr/pp1ppppp/8/2p5/4P3/5N2/PPPP1PPP/RNBQKB1R b KQkq - 1 2
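
A sketch of one possible answer, assuming the six space-separated FEN fields: board, side to move, castling rights, en passant square, halfmove clock, and fullmove number. The regex checks the shape, and a small helper checks that each rank adds up to 8 squares, which is awkward to express in the pattern itself. The names are illustrative.

```python
import re

FEN_RE = re.compile(
    r"(?P<board>(?:[pnbrqkPNBRQK1-8]{1,8}/){7}[pnbrqkPNBRQK1-8]{1,8})"
    r" (?P<side>[wb])"
    r" (?P<castling>-|(?=[KQkq])K?Q?k?q?)"
    r" (?P<en_passant>-|[a-h][36])"
    r" (?P<halfmove>[0-9]+)"
    r" (?P<fullmove>[0-9]+)"
)


def rank_width(rank):
    """Count squares in one rank: digits are runs of empty squares."""
    return sum(int(ch) if ch.isdigit() else 1 for ch in rank)


def parse_fen(fen):
    """Return a dict of the six FEN fields, or None if the string is invalid."""
    match = FEN_RE.fullmatch(fen)
    if not match:
        return None
    fields = match.groupdict()
    if any(rank_width(rank) != 8 for rank in fields["board"].split("/")):
        return None
    return fields


if __name__ == "__main__":
    print(parse_fen("rnbqkbnr/pp1ppppp/8/2p5/4P3/5N2/PPPP1PPP/RNBQKB1R b KQkq - 1 2"))
```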

Additionally, have a look at algebraic chess notation, used to describe moves. Extract chess moves out of a piece of text (and make them bold).

1. e4 e5 2. Nf3 Black now defends his pawn 2...Nc6 3. Bb5 Black threatens c4

Geert
+1 for awesomeness :)
Kimvais
+1  A: 

regexplib.com has a good library you can search through for examples.

eidylon
A: 

Rather than teaching examples based on the data set, I would do examples from the perspective of the rule set to get the basics across. Give them simple examples to solve that lead them to use ONE of several basic constructs in each solution. Then have a couple of "compound" regexes at the end.

Simple: s/abc/def/

Quantifiers and special characters: s/a\s*b/abc/

Character classes: s/[abc]/def/

Backreference: s/ab(c)/def$1/

Anchors: s/^fred/wilma/ s/rubble$/and betty/

Modifiers: s/Abcd/def/gi
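
For students working outside Perl, the same progression can be tried with Python's `re.sub`. The sketch below uses a hypothetical `substitute` helper that mimics `s/pattern/replacement/` with the `g` and `i` modifiers. Note that Python writes the backreference as `\1` rather than `$1`, and that the end-of-line anchor goes after the word (`rubble$`).

```python
import re


def substitute(pattern, replacement, text, modifiers=""):
    """Mimic Perl's s/pattern/replacement/modifiers for 'g' and 'i' only."""
    flags = re.IGNORECASE if "i" in modifiers else 0
    # Perl replaces only the first match unless /g is given;
    # in re.sub, count=0 means "replace every match".
    count = 0 if "g" in modifiers else 1
    return re.sub(pattern, replacement, text, count=count, flags=flags)


if __name__ == "__main__":
    print(substitute(r"abc", "def", "abcabc"))                  # simple literal
    print(substitute(r"a\s*b", "abc", "a   b"))                 # quantifier
    print(substitute(r"[abc]", "def", "cat"))                   # character class
    print(substitute(r"ab(c)", r"def\1", "abc"))                # backreference
    print(substitute(r"^fred", "wilma", "fred fred"))           # start anchor
    print(substitute(r"rubble$", "and betty", "barney rubble"))  # end anchor
    print(substitute(r"Abcd", "def", "abcd ABCD", "gi"))        # modifiers
```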

After this, I would give a few examples illustrating the pitfalls of trying to match HTML tags or other strings that shouldn't be handled with regexes, to show the limitations.

SDGator