I'm trying to run some statistics over the Stack Overflow data dump, and for that I would like to know the time zone for each user. However, all I have to go on is the completely free-form "location" string.

I'll stress that I'm only looking for an approximation of the time zone; of course, in general this is an unsolvable problem. However, many people fill out their country, state and/or city, which should give a pretty good indication. It's okay if it fails for other cases. It doesn't have to be reliable, it doesn't have to be accurate, it doesn't have to cover all bases.

I don't want to waste too much time on this, so I'm wondering if there is some code out there that can make a reasonable guess. Any language, platform, API or library goes. Any ideas?

A: 

Check this discussion for information on how to get the lat/lon from an arbitrary location string.

Once you have the lat/lon, you can use the web services at GeoNames to retrieve the time zone.
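
For illustration, here is a rough Python sketch of that two-step lookup. It assumes the GeoNames `searchJSON` and `timezoneJSON` endpoints and their `lat`, `lng` and `timezoneId` response fields as I understand them, so check the GeoNames documentation before relying on it. The `USERNAME` value, the `guess_timezone` helper and the optional `cache` dictionary are my own naming, not part of GeoNames.

```python
import json
import urllib.parse
import urllib.request

GEONAMES = "http://api.geonames.org"
USERNAME = "demo"  # assumption: replace with your own GeoNames account name


def fetch_json(url):
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def guess_timezone(location, fetch=fetch_json, cache=None):
    """Return a time zone ID such as 'Europe/Amsterdam', or None if no guess."""
    # Normalise case and whitespace so equivalent strings share one lookup.
    key = " ".join(location.lower().split())
    if not key:
        return None
    if cache is not None and key in cache:
        return cache[key]

    # Step 1: free-form string -> best matching place (lat/lng).
    query = urllib.parse.urlencode(
        {"q": key, "maxRows": 1, "username": USERNAME})
    hits = fetch(f"{GEONAMES}/searchJSON?{query}").get("geonames") or []

    tz = None
    if hits:
        # Step 2: lat/lng -> time zone.
        query = urllib.parse.urlencode(
            {"lat": hits[0]["lat"], "lng": hits[0]["lng"],
             "username": USERNAME})
        tz = fetch(f"{GEONAMES}/timezoneJSON?{query}").get("timezoneId")

    if cache is not None:
        cache[key] = tz  # failed guesses are remembered too
    return tz
```

Passing the same `cache` dictionary to every call means each distinct location string costs at most two requests, however many users share it. Anything GeoNames cannot match simply comes back as `None`.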

ElectricDialect
Thanks! I hadn't found that yet. Looks like GeoNames.org (http://www.geonames.org/) can do fuzzy string to latitude/longitude, and latitude/longitude to time zone. That should be all I need. At 50,000 requests per day, it'd take a week to look up all users, but I can think of many ways to reduce the number of queries.
Thomas
No problem, hope that helps.
ElectricDialect