I'm new to regular expressions and working hard at them, but this problem has gone beyond what I'd call simple. I understand how to create the Regex object in .NET, but I'm not sure how to use it for my specific purpose once I have a pattern.

Regex regex = new Regex("(at ){0,1}[0-9]{1,2}(:[0-9]{2}){0,1}(?:[ap]m?){0,1}");

I need to be able to take a sentence like "Dinner will be at 9pm at your favorite restaurant" and get the values { "Dinner will be at your favorite restaurant", "9pm" } (dropping the "at " in front of the time if it is present).

Complete(?) test cases:

"Dinner at 9pm"            { "Dinner", "9pm" }
"Dinner at9pm"             { "Dinner", "9pm" }
"Dinner 9pm"               { "Dinner", "9pm" }
"Dinner 9p"                { "Dinner", "9pm" }
"Dinner 9a"                { "Dinner", "9am" }
"Dinner 9pZ"               { "Dinner 9pZ", "" }
"Dinner 9aZ"               { "Dinner 9aZ", "" }
"Dinner at 9"              { "Dinner", "9" }
"Dinner at 9:15pm"         { "Dinner", "9:15pm" }
"Dinner at 9:15"           { "Dinner", "9:15" }
"Dinner at9:15"            { "Dinner", "9:15" }
"Dinner at 9pm in Seattle" { "Dinner in Seattle", "9pm" }
"Dinner at9pmin Seattle"   { "Dinner in Seattle", "9pm" }
"Dinner at9in Seattle"     { "Dinner in Seattle", "9" }
"Dinner 9in Seattle"       { "Dinner 9in Seattle", "" }
"9pm Dinner"               { "Dinner", "9pm" }
"The 9pm Dinner was good"  { "The Dinner was good", "9pm" }
"Dinner at 9pmpm"          { "Dinner pm", "9pm" }
"Dinner at 9:15pmpm"       { "Dinner pm", "9:15pm" }

(Just for further clarification: a number without a ":" or an "am"/"pm" suffix must be preceded by "at", unless it is the first number listed. An "a" or "p" suffix must be followed by either "m" or a space, or, going by the test cases, the end of the input.)

Beyond the test cases, I don't understand the syntax for getting the values I need (the lists in braces above) back from the Regex object.

+4  A: 

A regex for doing this would be complicated and it also wouldn't return the results in the expected order in cases such as "9pm Dinner". If you're willing to spend a little time, it might be simpler to write a basic recursive-descent parser. Each word in the input would form a token, and you can easily come up with rules based on your requirements. For example:

event: "Dinner" time |
       "Dinner" location |
       "Dinner" time location |
       "Dinner" location time

time:  "at" number ":" number "am"/"pm"
       /* etc. */

You then write a small function for each non-terminal (event, time, location etc.) that will do its part and return the result.
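As a rough illustration of that structure, here is a minimal sketch with one function per non-terminal. It is not a full solution to the test cases in the question: it assumes whitespace-separated input (so "at 9pm" works but "at9pm" does not), extracts only the first time it finds, and accepts a bare number such as "9" only when it follows "at". All names (EventParser, ParseEvent, TryParseTime, TryParseClock, ReadDigits) are made up for the example.

```csharp
using System;
using System.Collections.Generic;

// Sketch only: all names here are hypothetical, and input is assumed to be
// whitespace-separated ("at 9pm" is handled, "at9pm" is not).
public static class EventParser
{
    // event: word* [time] word*   (only the first time found is extracted)
    public static (string Text, string Time) ParseEvent(string input)
    {
        string[] tokens = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var words = new List<string>();
        string time = "";
        int pos = 0;
        while (pos < tokens.Length)
        {
            if (time == "" && TryParseTime(tokens, ref pos, out string parsed))
                time = parsed;
            else
                words.Add(tokens[pos++]);
        }
        return (string.Join(" ", words), time);
    }

    // time: ["at"] clock   (a bare number such as "9" counts only after "at")
    private static bool TryParseTime(string[] tokens, ref int pos, out string time)
    {
        int next = pos;
        bool hasAt = tokens[next] == "at";
        if (hasAt) next++;
        if (next < tokens.Length && TryParseClock(tokens[next], hasAt, out time))
        {
            pos = next + 1;   // consume "at" (if any) and the clock token
            return true;
        }
        time = "";
        return false;         // pos is left untouched, so the caller keeps the word
    }

    // clock: digit{1,2} [":" digit digit] ["a" | "p" | "am" | "pm"]
    private static bool TryParseClock(string token, bool allowBare, out string time)
    {
        time = "";
        int i = 0;
        string hour = ReadDigits(token, ref i);
        if (hour.Length == 0) return false;

        string minutes = "";
        if (i < token.Length && token[i] == ':')
        {
            i++;
            minutes = ReadDigits(token, ref i);
            if (minutes.Length != 2) return false;
        }

        // Whatever follows the digits must be a recognised suffix or nothing.
        string suffix;
        switch (token.Substring(i).ToLowerInvariant())
        {
            case "": suffix = ""; break;
            case "a": case "am": suffix = "am"; break;
            case "p": case "pm": suffix = "pm"; break;
            default: return false;
        }

        if (minutes == "" && suffix == "" && !allowBare) return false;
        time = hour + (minutes == "" ? "" : ":" + minutes) + suffix;
        return true;
    }

    // Reads at most two consecutive ASCII digits starting at index i.
    private static string ReadDigits(string token, ref int i)
    {
        int start = i;
        while (i < token.Length && i - start < 2 && token[i] >= '0' && token[i] <= '9') i++;
        return token.Substring(start, i - start);
    }
}
```

With this sketch, EventParser.ParseEvent("Dinner at 9pm in Seattle") yields the text "Dinner in Seattle" and the time "9pm", while "Dinner 9pZ" comes back unchanged with an empty time. Handling run-together input like "at9pmin" would need a character-level tokenizer instead of the whitespace split used here.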

As you can see, your requirements already bring up so many possibilities that a regex, if one is possible at all, would be extremely confusing.
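On the narrower question of how to read values back from a Regex object, the mechanics themselves are short. The sketch below is only meant to show them: Regex.Match returns a Match, Match.Success says whether anything was found, Match.Groups gives the captured pieces, and Match.Index and Match.Length locate the whole match in the input. The pattern is a simplified variant of the one in the question with a named group added, and the names TimeExtractor and Extract are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Sketch only: TimeExtractor and Extract are hypothetical names, and the
// pattern is a simplified variant of the one in the question.
public static class TimeExtractor
{
    private static readonly Regex TimePattern = new Regex(
        @"(?:at )?(?<time>[0-9]{1,2}(?::[0-9]{2})?(?:[ap]m?)?)");

    public static (string Text, string Time) Extract(string input)
    {
        Match match = TimePattern.Match(input);      // first (leftmost) match only
        if (!match.Success) return (input, "");

        string time = match.Groups["time"].Value;    // the named group, without "at "
        string rest = input.Remove(match.Index, match.Length);  // cut out the whole match
        rest = Regex.Replace(rest, @"\s+", " ").Trim();          // tidy leftover spaces
        return (rest, time);
    }
}
```

For "Dinner will be at 9pm at your favorite restaurant" this returns the text "Dinner will be at your favorite restaurant" and the time "9pm". It does not satisfy all of the test cases, though: "Dinner 9in Seattle" would give up the "9" even though the question says it should not, which is the kind of rule that is awkward to express in the pattern alone.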

casablanca
+1 because regex isn't the solution to all problems
Falmarri
Darn it, I was hoping it was the be all end all... or at least viable.
Erik Philips
@Erik: In case you're wondering, the real "be all end all" is a [Turing machine](http://en.wikipedia.org/wiki/Turing_machine). Regular expressions are by far the most restricted in their applications, while most programming languages (as well as your own requirement) can be parsed by what is known as a [context-free grammar](http://en.wikipedia.org/wiki/Context-free_grammar).
casablanca