I need to parse strings containing time spans such as:

  • Thursday 6:30-7:30 AM
  • December 30, 2009 - January 1, 2010
  • 1/15/09, 7:30 to 8:30 PM
  • Thursday, from 6:30 to 7:30 AM
  • and others...

Added:

  • 6:30 to 7:30

as well as dates and times in most of the formats that Word's Insert -> Date can generate.

As I'd be extremely surprised if anything out there covers all the cases I need to cover, I'm looking for grammars to start from.

+5  A: 

OK, the following grammar parses everything in your examples:

DTExp        = Day, ['-', Day]
Day          = DayExp, [[','], ['from'], TimeRange]
DayExp       = WeekDay
             | [WeekDay], Month, DayNumber, [[','], YearNumber]
             | [WeekDay], MonthNumber, '/', DayNumber, ['/', YearNumber]
TimeRange    = Time, [('-' | 'to'), Time]
Time         = HourNumber, ':', MinuteNumber, ['AM' | 'PM']
WeekDay      = 'monday' | 'tuesday' | ...
Month        = MonthNumber | MonthName
MonthName    = 'january' | 'february' | ...
DayNumber    = Number
MonthNumber  = Number
YearNumber   = Number, ['AD'|'BC']
HourNumber   = Number
MinuteNumber = Number

There is a slight problem in the grammar. Once a DayExp has been read, followed by a Time and a '-', the next thing could be either another Time (continuing the TimeRange) or another DayExp (starting the second Day). This is solved by a lookahead: if it is a time, the number is followed by a ':'.
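
The lookahead check could look something like the following Python sketch. It assumes the input has already been split into a list of lowercase string tokens, and classify_after_dash is a made-up helper name, not part of any library.

```python
def classify_after_dash(tokens, pos):
    """Decide what follows a '-' that comes after a Time.

    pos is the index of the first token after the '-'.
    A number followed by ':' can only start a Time; anything else
    (a weekday, a month name, or a number followed by '/') starts a DayExp.
    """
    if (pos + 1 < len(tokens)
            and tokens[pos].isdigit()
            and tokens[pos + 1] == ":"):
        return "time"  # continue the current TimeRange
    return "day"       # close the first Day and start the second one

# "Thursday 6:30 - 7:30 AM": after the '-' comes "7", ":" so it is a time.
tokens_a = ["thursday", "6", ":", "30", "-", "7", ":", "30", "am"]
print(classify_after_dash(tokens_a, 5))  # time

# "1/2/09 6:30 - 3/16/10": after the '-' comes "3", "/" so it is a day.
tokens_b = ["1", "/", "2", "/", "09", "6", ":", "30", "-",
            "3", "/", "16", "/", "10"]
print(classify_after_dash(tokens_b, 9))  # day
```

Two tokens of lookahead are enough here because every Time in the grammar requires the ':' and MinuteNumber.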

Let's try to construct a parse tree:

Thursday  6    :  30     -   7    :   30    AM
    |     |        |         |         |     |
WeekDay Number : Number  - Number : Number   |
    |     -----|----         -----|----------- 
    |         Time       -       Time
    |           ---------|---------
 DayExp              TimeRange
    ----------|-----------
             Day
              |
            DTExp
Gamecat
Did you just generate that, or is it from some other source (link?)? Just curious.
BCS
No, I generated that. I almost dream in grammars ;-).
Gamecat
@Gamecat - you need a hobby! :P
warren
Lol, I already have enough. But this I can do as the kids are asleep (at least pretending to be asleep). The others make too much noise.
Gamecat
"1/2/09 from 6:30 to 7:30 - 3/16/10" Every grammar has its insanities.
BCS