I'm reading CSV data uploaded by users in my Ruby on Rails app. When a user specifies that a particular column contains dates (or times), I want to detect the format automatically. The values can be in American or British formats (any of dd/mm/yy, mm/dd/yy, yyyy-mm-dd, 12 Feb 2010, etc.).

I have tried parsedate in Ruby, but it doesn't handle both American and British dates unless you specify the format. Is there any way to really do this properly, or am I asking for too much? I don't mind calling a script in another language just for this one task. I'm wondering how it's handled in programs like Excel and Google Docs.

+1  A: 

Unless the application knows the user's locale, I don't know how you can determine this accurately.

What you do know however is that:

  1. There are only 12 months.
  2. Only years can be 4 digits long.
  3. If it contains text then it must be the month.

You could write your own parser with these rules to work it out. Without a known locale, however, it still cannot tell whether 05/10/2010 means 5th Oct 2010 (UK) or 10th May 2010 (US).
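As a rough illustration of those rules, here is a minimal sketch for purely numeric dates. The helper name `guess_field_order` and its return symbols are made up for this example, and it only looks at the "no more than 12 months" rule, so treat it as a starting point and not a complete parser.

```ruby
require 'date'

# Hypothetical helper: classify a numeric date such as "05/10/2010" using the
# rule that a month can never be greater than 12.
# Returns :day_first, :month_first, :ambiguous or :unknown.
def guess_field_order(text)
  match = text.strip.match(%r{\A(\d{1,2})[/.-](\d{1,2})[/.-](\d{2}|\d{4})\z})
  return :unknown unless match

  first = match[1].to_i
  second = match[2].to_i

  if first.between?(13, 31) && second.between?(1, 12)
    :day_first      # e.g. 16/01/2010 can only be 16 January
  elsif second.between?(13, 31) && first.between?(1, 12)
    :month_first    # e.g. 01/16/2010 can only be January 16
  elsif first.between?(1, 12) && second.between?(1, 12)
    :ambiguous      # e.g. 05/10/2010 fits both readings
  else
    :unknown
  end
end
```

Anything that comes back as `:ambiguous` still needs outside information (a locale, or other rows from the same file) before it can be read safely.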

tgandrews
The problem is that I'm accepting data from all over the world, which means multiple locales. Also, Chris points out the biggest problem I have, namely when day, month and year are all expressed as 2 digits.
Hrishi Mittal
+1  A: 

There is little that a program can do to magically determine which short date format a value is in.

If you give a program a date like 09/06/08, it could mean either:

  • 9th of June, 2008, or
  • 6th of September, 2008, or perhaps even
  • 8th of June, 2009.
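To make the ambiguity concrete, the sketch below parses the same string with three explicit `Date.strptime` formats. The method name `possible_readings` and the hash labels are only for illustration.

```ruby
require 'date'

# Hypothetical helper: parse one string under three explicit formats,
# one per possible reading of a two-digit day/month/year value.
def possible_readings(text)
  {
    "dd/mm/yy" => Date.strptime(text, "%d/%m/%y"),
    "mm/dd/yy" => Date.strptime(text, "%m/%d/%y"),
    "yy/mm/dd" => Date.strptime(text, "%y/%m/%d")
  }
end

possible_readings("09/06/08").each do |label, date|
  puts "#{label}: #{date.iso8601}"
end
# dd/mm/yy: 2008-06-09
# mm/dd/yy: 2008-09-06
# yy/mm/dd: 2009-06-08
```

All three calls succeed, which is exactly the problem: nothing in the string itself says which reading was intended.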

When Ruby parses a date from a string without an explicit format, it tries a set of built-in patterns to work out what format the date is in. See the Ruby DateTime class documentation for more info.

I think the best thing to do in your situation would be to try to arrange all of your records into groups, where each group has one particular date format. If you yourself can't tell the American and British dates apart by some criterion, unfortunately a program won't be able to either.
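One way to approximate the grouping idea is to treat each uploaded column as a group and keep only the formats that parse every value in it. This is a sketch under assumptions: the candidate list `CANDIDATE_FORMATS` and the method `formats_fitting_column` are hypothetical names, and the list would need extending for real data.

```ruby
require 'date'

# Hypothetical candidate list; extend it with whatever formats you expect.
CANDIDATE_FORMATS = ["%d/%m/%Y", "%m/%d/%Y", "%Y-%m-%d", "%d %b %Y"].freeze

# Returns the candidate formats that successfully parse every value.
def formats_fitting_column(values, formats = CANDIDATE_FORMATS)
  formats.select do |format|
    values.all? do |value|
      begin
        Date.strptime(value.strip, format)
        true
      rescue ArgumentError
        # strptime raises ArgumentError when the value does not fit the format
        false
      end
    end
  end
end

p formats_fitting_column(["05/10/2010", "16/01/2010"])
# => ["%d/%m/%Y"]
```

A single row with a day above 12 is enough to settle the whole column. If more than one format survives, the column really is ambiguous and you would have to ask the user.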

However... if each user is from a specific locale, and you can make the (rather large) assumption that every date they upload in a CSV conforms to their country's date format standards, you could make use of the internationalization API. It should be technically possible to grab that particular user's locale, and then load up the correct i18n data (with the appropriate date formatter), and parse the file using the formatter i18n provides you. Read the Rails Internationalization API guide to get an idea of how you can utilize the i18n API.
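As a sketch of the locale-driven approach, the example below looks up a `strptime` format by locale and parses with it. The table `LOCALE_DATE_FORMATS`, its entries and the method `parse_for_locale` are assumptions made for illustration. In a Rails app the format string could come from the locale files instead (for example via `I18n.t('date.formats.default')`), but whether those formats match what users actually upload is something you would need to check.

```ruby
require 'date'

# Hypothetical locale-to-format table standing in for real i18n data.
LOCALE_DATE_FORMATS = {
  "en-GB" => "%d/%m/%Y",
  "en-US" => "%m/%d/%Y"
}.freeze

# Parses text using the format registered for the given locale.
# Raises KeyError for an unknown locale and ArgumentError for a bad date.
def parse_for_locale(text, locale)
  format = LOCALE_DATE_FORMATS.fetch(locale)
  Date.strptime(text, format)
end

puts parse_for_locale("05/10/2010", "en-GB").iso8601  # 2010-10-05
puts parse_for_locale("05/10/2010", "en-US").iso8601  # 2010-05-10
```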

Chris
Thanks Chris. I looked through date/format.rb and it seems to do some heavy regex matching to try out various format options. Even so, DateTime.parse("16/01/2010") gives me an ArgumentError. I thought of the group comparison too; it seems to be the closest I will get to reading my users' minds!
Hrishi Mittal
Just updated my answer. The first time I read the question I missed the fact that you're using Rails. Have a look into the i18n API... If you can make the right assumptions, it might be able to do what you need.
Chris
Thanks Chris, i18n looks useful.
Hrishi Mittal