views:

813

answers:

4

Hi!

I'm trying to sort a CSV file and I want the items sorted by date in reverse order: newest first.

def SortCsvByField( filename, fieldNo, sep = ',' ):
   records = [line.split(sep) for line in file(filename)]

As far as that it's pretty easy, but how do I compare dates?

+2  A: 

What you show is pretty easy but also pretty fragile.

It is best to use Python's CSV library: http://docs.python.org/library/csv.html

About comparing dates (I'm assuming some dates are in a specific column on each row) you can use the datetime module: http://docs.python.org/library/datetime.html. You can use the standard comparison operators on date objects.

Vinko Vrsalovic
+2  A: 

If your dates are in ISO-8601 format (YYYY-MM-DD) then you can sort them as strings, otherwise you will have to parse them first (datetime.strptime).

Then you can sort using for example sorted(records, key=lambda a:a[1]), if the date is the second field.

Mark Byers
+2  A: 

I'd recommend installing the excellent dateutil module. (In Ubuntu/Debian, it is provided by the python-dateutil package).

dateutil can parse date strings into datetime objects: It can handle many different date formats without you having to lift a finger(*):

import dateutil.parser as dparser
date=dparser.parse("Mon May 7 1883 10:36:28")
print(date)
# 1883-05-07 10:36:28

date=dparser.parse("1685-3-21")
print(date)
# 1685-03-21 00:00:00

date=dparser.parse("12/17/1770")
print(date)
# 1770-12-17 00:00:00

Note that parse is interpretting "12/17/1770" as being of the form "MM/DD/YYYY". You can change this behavior using parse's dayfirst and yearfirst options. (See http://labix.org/python-dateutil)

print(type(date))
# <type 'datetime.datetime'>

datetime objects can be sorted easily:

dates=[dparser.parse("Mon May 7 1883 10:36:28"),dparser.parse("1685-3-21"),dparser.parse("12/17/1770"),]
dates.sort()
print(dates)
# [datetime.date(1685, 3, 21), datetime.date(1770, 12, 17), datetime.date(1833, 5, 7)]

If you prefer to not install the dateutil package, then you'll have to roll your own method of converting date strings into datetime objects. This requires more work since you'll have to define the format. Below, '%Y-%m-%d' defines the YYYY-MM-DD format. See http://au2.php.net/strftime (or the man page of strftime) for more information on available format codes.

For example,

dates=[datetime.datetime.strptime(date_str,'%Y-%m-%d') for date_str in
       ('1883-5-7','1685-3-21','1770-12-17',)]
print([str(date) for date in dates])
# ['1883-05-07 00:00:00', '1685-03-21 00:00:00', '1770-12-17 00:00:00']
dates.sort()
print([str(date) for date in dates])
# ['1685-03-21 00:00:00', '1770-12-17 00:00:00', '1883-05-07 00:00:00']

To control the format when converting datetime objects back into printable strings, you can use the datetime.datetime.strftime() method.

unutbu
(1) "If you are using linux"?? Is there something platform dependant about dateutil? It appears to work fine on Windows [but see point 3 below] (2) dates preceded by a `#` e.g. `# 2009-09-25 10:36:28`?? Output from print()? If so, they're wrong. Should be e.g. `1883-05-07 10:36:28` (3) It will interpret `"01/02/2009"` as the first day of February for those adhering to the `DD/MM/YYYY` convention without them lifting a finger, will it? It wouldn't do something nasty like silently taking `"31/12/2008"` as 31 December while also taking `"01/02/2009"` as the 2nd of January, would it?
John Machin
That's a good start, but you haven't addressed the issue of it cheerfully accepting mixed dd/mm and mm/dd dates by default with (AFAICT) no option to select "strict" behaviour.
John Machin
I don't understand why it is appending 10-19 by default to 'month,date'-less dates. For instance 1946 --> 1946-10-19 and 46 --> 2046-10-19.. Can we change the default behavior?
ThinkCode
@ThinkCode: It's appending 10-19 because today is Oct 19. What would you like `dparser.parse('1946')` to return?
unutbu
oops, how did I even miss that! well, I applied this to date of birth column and those that had just 1946 was showing as 1946-10-19. Thanks for clearing it up. Us specifying what default date/month should be applied will make it more awesome. It is just if else blocks I guess. Thanks misspelled Ubuntu!
ThinkCode
Glad you got that sorted, misspelled TechnoKid!
unutbu
+2  A: 

Assuming that you know the format of the dates, and that they're in column 1 of your CSV file:

>>> import csv
>>> from datetime import datetime
>>> def date_key(row):
        return datetime.strptime(row[1].strip(), "%m/%d/%Y")

>>> with open('c:\\temp\\test\\date_test.csv', 'rb') as f:
        data = list(csv.reader(f))

>>> data
[['foo', ' 3/11/2004'], ['bar', ' 2/15/2001'], ['baz', '11/15/2007'], ['bat', '10/13/2002']]
>>> data.sort(key=date_key)
>>> data
[['bar', ' 2/15/2001'], ['bat', '10/13/2002'], ['foo', ' 3/11/2004'], ['baz', '11/15/2007']]
Robert Rossney