Does anyone know of a library, ideally in Python, that can have a stab at pulling dates out of text?

"Shall we go to the library today" -> 21 Jan 10
"Starting on the 1st of January" -> 1 Jan 10
"Anytime between 3rd and 5th of Feb 2009" -> 3 Feb 09, 5 Feb 09

It's a tough problem, which is probably why I haven't found anything! I'm already using NLTK, by the way, if that helps.

+4  A: 

Looks like this module is what you are looking for: parsedatetime

Nadia Alramli
You'll probably have to tokenize your lines before passing them to the parser.
Adam Matan
I should have added this to the question: I have been trying this, but it is easily fooled, and then you have the problem of working out whether the result is valid! Thanks for the suggestion, though.
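For what it's worth, the validity problem raised in the comment above can be narrowed a little by only accepting fragments that form a real calendar date. What follows is a rough standard-library sketch, not parsedatetime's API and not the pyparsing approach mentioned elsewhere in this thread. The names `MONTHS`, `DATE_RE`, `extract_dates` and the `default_year` parameter are made up for illustration, and it only understands "day [of] month [year]" phrases in English.

```python
import re
from datetime import date

# Map full English month names and their 3-letter abbreviations to numbers.
MONTHS = {}
for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"],
    start=1,
):
    MONTHS[name] = number
    MONTHS[name[:3]] = number

# A day (optionally with st/nd/rd/th), an optional "of", a word that might
# be a month, and an optional 4-digit year.
DATE_RE = re.compile(
    r"\b(?P<day>\d{1,2})(?:st|nd|rd|th)?\s+(?:of\s+)?"
    r"(?P<month>[A-Za-z]{3,9})\b\.?(?:\s+(?P<year>\d{4})\b)?"
)


def extract_dates(text, default_year):
    """Return the valid dates found in text (illustrative sketch only)."""
    found = []
    for match in DATE_RE.finditer(text):
        month = MONTHS.get(match.group("month").lower())
        if month is None:
            continue  # the word after the number is not a month, e.g. "3 marbles"
        year = int(match.group("year")) if match.group("year") else default_year
        try:
            found.append(date(year, month, int(match.group("day"))))
        except ValueError:
            continue  # looks like a date but is not one, e.g. "31st of February"
    return found
```

Called with `default_year=2010`, this turns "Starting on the 1st of January" into 1 Jan 2010 and returns nothing for "31st of February". It is still easily fooled in other ways: "Anytime between 3rd and 5th of Feb 2009" yields only 5 Feb 2009, because the "3rd" has no month of its own, and relative words such as "today" are not handled at all.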
+1  A: 

Thanks for the contributions - in the end I followed up one of the comments, which led to pyparsing, which led to the beginnings of a solution. Many thanks, all.

I have posted the work in progress, two pyparsing snippets, here: http://pbjots.blogspot.com/2010/01/using-pyparsing-to-extract-dates-from.html in case they help anyone.

+1  A: 

The PyParsing site has a little bonus script for parsing time expressions. I would say that is worth a look for you!

Edit: I see you already ended up there as I was typing my suggestion. Good luck to you!

jathanism
Thanks - I'll mark this as the answer!
Well I'll be! Thanks for the selection!
jathanism