Because the open source geo-coders cannot begin to compare to Google's or even Yahoo's, I would like to start a project to create a good open source geo-coder. Just to clarify, a geo-coder takes some text (usually with some constraints) and returns one or more lat/lon pairs.

I realize that this is a difficult and gargantuan task, so I am wondering how you might get started. What would you read? What algorithms would you familiarize yourself with? What code would you review?

Also, assuming you were going to develop this in a very agile way, what would you want the first prototype to be able to do?

EDIT: Let's set aside the data question for now. I am going to use OpenStreetMap data, along with a database of waypoints that I have. I would later plan to include other data sets as well, and I realize the geo-coder would be inherently limited by the quality of the original data.

+1  A: 

Algorithms are easy. Good mapping data, however, is expensive. Very expensive.

Google drove their cars all over the world, collecting this data among other things.

Will
+3  A: 

The first (and probably blocking) problem would be: where do you get your data from? (unless you are willing to pay thousands of dollars for proprietary sets).

I guess you could build a geocoding API on top of OpenStreetMap (they publish their data in dumps on a regular basis), but that data was still very incomplete the last time I checked.
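
To give a feel for what building on an OSM dump might involve, here is a minimal sketch that indexes named nodes from a tiny hand-written sample in the OSM XML layout (nodes with lat/lon attributes and k/v tags). The sample data and the function names `build_name_index` and `geocode` are made up for illustration; a real dump is far larger and would need a streaming parser plus ways and relations, not just nodes.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the OSM XML layout. NOT real map data.
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="51.5000" lon="-0.1200">
    <tag k="name" v="Example Square"/>
  </node>
  <node id="2" lat="51.5100" lon="-0.1300"/>
  <node id="3" lat="48.8500" lon="2.3500">
    <tag k="name" v="Sample Street"/>
  </node>
</osm>
"""


def build_name_index(osm_xml):
    """Map each lowercased node name to its (lat, lon) pair."""
    index = {}
    root = ET.fromstring(osm_xml)
    for node in root.iter("node"):
        for tag in node.findall("tag"):
            if tag.get("k") == "name":
                coords = (float(node.get("lat")), float(node.get("lon")))
                index[tag.get("v").lower()] = coords
    return index


def geocode(index, text):
    """Exact-match lookup only; returns None when the name is unknown."""
    return index.get(text.strip().lower())


if __name__ == "__main__":
    idx = build_name_index(SAMPLE_OSM)
    print(geocode(idx, "Example Square"))
```

Exact matching is obviously the weak point here; the interesting work in a real geocoder is in parsing and fuzzy-matching the input text.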

ChristopheD
+1, just what I was going to write. Getting the data is the hardest part.
musicfreak
I was definitely going to use the OSM data. I also have a database of 10,000,000 waypoints collected from geonames.org, the USGS, and the internet at large.
Andrew Johnson
A: 
  • get my free raw data from somewhere like http://ipinfodb.com/ip%5Fdatabase.php
  • load it into a database, denormalizing for fast lookups
  • design my API
  • build it out as a RESTful web service
  • return results in varying formats: JSON, XML, CSV, raw text

The first prototype should accept a ZIP code and return lat/lon in raw text.
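
A rough sketch of that first prototype, assuming the data has already been loaded into a single denormalized table. The table name `zip_codes`, the helper names, and the sample rows are all placeholders I made up (the coordinates are not real ZIP code locations); SQLite is used only because it ships with Python.

```python
import sqlite3


def make_db(rows):
    """Create an in-memory table of (zip, lat, lon) rows for fast lookups."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE zip_codes (zip TEXT PRIMARY KEY, lat REAL, lon REAL)"
    )
    conn.executemany("INSERT INTO zip_codes VALUES (?, ?, ?)", rows)
    return conn


def lookup_zip(conn, zip_code):
    """Return 'lat,lon' as raw text, or 'NOT FOUND' for an unknown ZIP code."""
    row = conn.execute(
        "SELECT lat, lon FROM zip_codes WHERE zip = ?", (zip_code.strip(),)
    ).fetchone()
    if row is None:
        return "NOT FOUND"
    return f"{row[0]},{row[1]}"


if __name__ == "__main__":
    # Placeholder rows, not real ZIP code coordinates.
    db = make_db([("00001", 10.5, -20.25), ("00002", 11.0, -21.0)])
    print(lookup_zip(db, "00001"))
```

Wrapping `lookup_zip` in a web framework and adding JSON, XML, and CSV serializers would be the next steps from the list above.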

RedFilter
I think your first prototype is an excellent idea.
Andrew Johnson
A: 

From a .NET point of view these articles might be interesting for you:

Writing Your Own GPS Applications: Part I
Writing Your Own GPS Applications: Part 2
Writing GIS and Mapping Software for .NET

I've only glanced at the articles but they've been on CodeProject's 'Most Popular' list for a long time.

And maybe this CodePlex project which the author of the articles above made available.

Jay Riggs
A: 

I would start at the absolute beginning by figuring out how you're going to get the data that matches a street address with a geocode. Either Google had people going around with GPS units, OR they got the information from some existing source. That existing source may have been... (all guesses)

  • The Postal Service
  • Some existing maps (printed)
  • A bunch of enthusiastic users that were early adopters of GPS technology who were more than willing to enter in street addresses and GPS coordinates
  • Some government entity (or entities)
  • Their own satellites
  • etc

I guess what I'm getting at is the information was either imported from somewhere or was input by someone via some interface. As my starting point I would look at how to get that information. In an open source situation, you may be able to get a bunch of enthusiastic people to enter information.

So for my first prototype, boring as it would be, I would create a form for entering information.

Then you need to know the math for figuring out the closest distance (as the crow flies). From there, try to figure out how to include roads. (My guess is you would have to have a data point for each and every curve, where you hold the geocode location of the curve, and the angle of the road on a north/south and east/west vector. You'd probably need to take incline into account, too, to get accurate road measurements.)
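
For the as-the-crow-flies part, one common choice is the haversine formula, which gives the great-circle distance between two lat/lon points on a sphere. This is only a sketch: it assumes a spherical Earth with a mean radius of about 6371 km, so it ignores the ellipsoid, roads, and incline entirely. The function name is my own.

```python
import math

# Assumption: spherical Earth with this mean radius in kilometres.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    c = 2 * math.asin(min(1.0, math.sqrt(a)))
    return EARTH_RADIUS_KM * c


if __name__ == "__main__":
    # Equator to the North Pole along a meridian: a quarter of the circumference.
    print(haversine_km(0.0, 0.0, 90.0, 0.0))
```

Finding the closest waypoint is then a matter of taking the minimum of this distance over the candidates, though with millions of points you would want a spatial index rather than a linear scan.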

That's just where I'd start.

But in all honesty, I wouldn't even start on this. Other programmers have done it already, and I'm more interested in what hasn't already been done.

David Stratton