How can I extract the date from a string like "monkey 2010-07-10 love banana"? Thanks!

+5  A: 

If the date is given in a fixed form, you can simply use a regular expression to extract the date and "datetime.datetime.strptime" to parse the date:

import re
from datetime import datetime
match = re.search(r'\d{4}-\d{2}-\d{2}', text)  # text is the input string
date = datetime.strptime(match.group(), '%Y-%m-%d').date()

Otherwise, if the date is given in an arbitrary form, you can't extract it easily.
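
For the fixed-form case, the two steps can be wrapped in a small helper. This is only a sketch: the name `extract_date` and the choice to return `None` (rather than raise) when nothing usable is found are assumptions made for illustration, not part of the original answer.

```python
import re
from datetime import date, datetime
from typing import Optional

# Matches a YYYY-MM-DD candidate anywhere in the text.
DATE_PATTERN = re.compile(r'\d{4}-\d{2}-\d{2}')

def extract_date(text: str) -> Optional[date]:
    """Return the first YYYY-MM-DD match as a date, or None if absent or invalid."""
    match = DATE_PATTERN.search(text)
    if match is None:
        return None  # nothing in the string looks like a date
    try:
        return datetime.strptime(match.group(), '%Y-%m-%d').date()
    except ValueError:
        return None  # looked like a date but is out of range, e.g. 2010-07-32

print(extract_date("monkey 2010-07-10 love banana"))
```

Only the first match is examined; if it is out of range, the helper gives up instead of looking further along the string.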

lunaryorn
What if it is in European format, such as 20/01/1980 meaning "Jan 20 1980"? What if months/days/years fall outside a reasonable range?
Hamish Grubijan
My answer refers to the date format shown in the question, not to any European format or anything else. To parse other formats, the code must be changed accordingly. If the date falls outside the valid range, datetime.strptime will raise an exception.
lunaryorn
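
One way to "change the code accordingly", as the comment above puts it, is to keep a list of known formats and try each in turn. The following is a rough sketch using only the standard library; the `FORMATS` table and the name `extract_date_any` are hypothetical, and the day-first reading of `dd/mm/yyyy` is an assumption taken from the European example in the comments.

```python
import re
from datetime import date, datetime
from typing import Optional

# Each entry pairs a regex that finds a candidate with the strptime
# format used to parse it. Extend the list for other fixed formats.
FORMATS = [
    (re.compile(r'\d{4}-\d{2}-\d{2}'), '%Y-%m-%d'),  # 2010-07-10
    (re.compile(r'\d{2}/\d{2}/\d{4}'), '%d/%m/%Y'),  # 20/01/1980, day first
]

def extract_date_any(text: str) -> Optional[date]:
    """Try each known format in order; return the first date that parses."""
    for pattern, fmt in FORMATS:
        match = pattern.search(text)
        if match is None:
            continue
        try:
            return datetime.strptime(match.group(), fmt).date()
        except ValueError:
            continue  # out-of-range candidate, try the next format
    return None

print(extract_date_any("monkey 20/01/1980 love banana"))
```

Formats are tried in list order, not in the order they appear in the string, so the list order decides which date wins when a string contains more than one format.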
+4  A: 

Using python-dateutil:

In [1]: import dateutil.parser as dparser

In [18]: dparser.parse("monkey 2010-07-10 love banana",fuzzy=True)
Out[18]: datetime.datetime(2010, 7, 10, 0, 0)

Invalid dates raise a ValueError:

In [19]: dparser.parse("monkey 2010-07-32 love banana",fuzzy=True)
# ValueError: day is out of range for month

It can recognize dates in many formats:

In [20]: dparser.parse("monkey 20/01/1980 love banana",fuzzy=True)
Out[20]: datetime.datetime(1980, 1, 20, 0, 0)

Note that it makes a guess if the date is ambiguous:

In [23]: dparser.parse("monkey 10/01/1980 love banana",fuzzy=True)
Out[23]: datetime.datetime(1980, 10, 1, 0, 0)

But the way it parses ambiguous dates is customizable:

In [21]: dparser.parse("monkey 10/01/1980 love banana",fuzzy=True, dayfirst=True)
Out[21]: datetime.datetime(1980, 1, 10, 0, 0)
unutbu
Interesting, I didn't know that dateutil.parser was that powerful.
lunaryorn
Damn !!!!! What happens when there is more than one date in the string?
Hamish Grubijan
@Hamish: If there are two dates (as in the case of `"monkey 10/01/1980 love 7/10/2010 banana"`), it may raise a ValueError, or (as in the case of `"monkey 10/01/1980 love 2010-07-10 banana"`) it may misinterpret the second date as denoting hours, minutes, seconds or timezone. `fuzzy=True` gives it license to guess.
unutbu
License to guess, lol!
Hamish Grubijan
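
For the multiple-date case raised in the comments above, a stricter alternative to fuzzy parsing is to collect every candidate with a regular expression and parse each one separately. This sketch assumes all dates use the `YYYY-MM-DD` form from the question; the name `extract_all_dates` is hypothetical, and silently skipping invalid candidates is a design choice rather than the only option.

```python
import re
from datetime import date, datetime
from typing import List

def extract_all_dates(text: str) -> List[date]:
    """Return every valid YYYY-MM-DD date in text, in order of appearance."""
    dates = []
    for candidate in re.findall(r'\d{4}-\d{2}-\d{2}', text):
        try:
            dates.append(datetime.strptime(candidate, '%Y-%m-%d').date())
        except ValueError:
            pass  # skip candidates that are not real calendar dates
    return dates

print(extract_all_dates("monkey 2010-07-10 love 2012-12-25 banana"))
```

Unlike `fuzzy=True`, this never guesses: anything that does not match the pattern exactly is ignored.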