The location data is essentially in a tree structure. So when you ask someone for their address you would like to know what Area/Suburb, City/Town, State/Province and Country that they live in.

This data will hopefully be set up only once and need very little modification. Most E-commerce sites and a lot of others would need to store this type of information but I am struggling to find much about it. Maybe because I am not searching for the right term but I thought it would be a common problem that has already been solved.

This leads to another question. Where could I get my hands on this type of information like what areas belong in which city, which cities belong in which states and which states belong in which countries?

EDIT: To make things more complicated, I would like to provide a generic solution. For example, some countries don't use zip codes; they use postal codes, which are sort of the same thing but not quite. My big desire is to have Area/Suburb -> City/Town -> Region/Province/State -> Country. I might be attempting the impossible.

EDIT2: Sorry I might not have been clear enough but I don't need zip codes. Just the Suburb/Area NAME -> City........

A: 

This is more complicated than it first appears. Some towns have multiple zip codes. Some zip codes have multiple towns. Many addresses can use either the primary town or the name of the nearest city. Validating an address is not trivial.

If you want to try to store zip code data, there are zip code databases you can buy. But you'll probably find it's easier to validate against the USPS using their tool.

Jeremy Stein
A: 

If you're not interested in zip codes, perhaps you could scrape what you need from Wikipedia.

Jeremy Stein
A: 

First, there are two distinct things here: geographical information and political/organizational information. Zip codes can be shared, but geographically any object is located completely inside another object.

When we had to solve this problem, we just built a generic tree structure in which each geo node held a reference to its parent. Top-level nodes (countries, in our case) had no parent reference. What is more, different countries had different structures; the only requirement was that the structure be hierarchical.
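A minimal sketch of such a parent-reference tree, in Python. The `GeoNode` class, its `kind` field, and the helper names are hypothetical and only illustrate the idea; this is not the code we actually ran.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GeoNode:
    """One place in the hierarchy (hypothetical illustration type)."""
    name: str
    kind: str  # free-form label, e.g. "country", "state", "city", "suburb"
    parent: Optional["GeoNode"] = None  # None marks a top-level node


def ancestors(node: GeoNode) -> List[GeoNode]:
    """Return the path from `node` up to its top-level node, inclusive."""
    path = []
    current: Optional[GeoNode] = node
    while current is not None:
        path.append(current)
        current = current.parent
    return path


def is_inside(inner: GeoNode, outer: GeoNode) -> bool:
    """True if `outer` is a strict ancestor of `inner`."""
    return any(a is outer for a in ancestors(inner)[1:])


# A four-level country...
australia = GeoNode("Australia", "country")
nsw = GeoNode("New South Wales", "state", australia)
sydney = GeoNode("Sydney", "city", nsw)
newtown = GeoNode("Newtown", "suburb", sydney)

# ...and a two-level one; the depth may differ per country.
singapore = GeoNode("Singapore", "country")
bedok = GeoNode("Bedok", "area", singapore)

path_names = [n.name for n in ancestors(newtown)]
```

Because the level is just a label on the node, a country with fewer or more levels needs no schema change, only a different chain of parents.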

After that, we were able to speed up the geo queries by precomputing tree traversal information. We traversed the tree, computed the traversal order, and stored it in each geo node. A simple integer comparison in the query was then enough to check whether one geo is inside another, and so on.
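One common way to do this kind of precomputation is pre-order numbering: each node stores the number at which the traversal entered it and the highest number assigned inside its subtree. The sketch below assumes that scheme; the `PARENT` mapping and the function names are made up for illustration, and the exact numbering we used may have differed.

```python
from typing import Dict, List, Optional, Tuple

# child -> parent; None marks a top-level node (sample data for illustration)
PARENT: Dict[str, Optional[str]] = {
    "Australia": None,
    "New South Wales": "Australia",
    "Victoria": "Australia",
    "Sydney": "New South Wales",
    "Newtown": "Sydney",
    "Melbourne": "Victoria",
}


def number_tree(parent_of: Dict[str, Optional[str]]) -> Dict[str, Tuple[int, int]]:
    """Assign each node (enter, last): its pre-order number and the
    largest pre-order number found in its subtree."""
    children: Dict[str, List[str]] = {}
    roots: List[str] = []
    for name, parent in parent_of.items():
        if parent is None:
            roots.append(name)
        else:
            children.setdefault(parent, []).append(name)

    intervals: Dict[str, Tuple[int, int]] = {}
    counter = 0

    def visit(name: str) -> None:
        nonlocal counter
        enter = counter
        counter += 1
        for child in children.get(name, []):
            visit(child)
        intervals[name] = (enter, counter - 1)

    for root in roots:
        visit(root)
    return intervals


def is_inside(intervals: Dict[str, Tuple[int, int]], inner: str, outer: str) -> bool:
    """Containment check using only integer comparisons."""
    inner_enter, _ = intervals[inner]
    outer_enter, outer_last = intervals[outer]
    return outer_enter < inner_enter <= outer_last


intervals = number_tree(PARENT)
```

In a database, the two numbers become two integer columns on the geo node, and "everything inside X" turns into a range condition on the enter column. The trade-off is that the numbers have to be recomputed when the tree changes, which is acceptable for data that is rarely modified.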

At the same time, things like zip code information look like a parallel data structure here, to be stored in a separate table with references to the nodes in the geo structure.
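A rough sketch of that separation, using Python's built-in `sqlite3` module. The table names, column names, town names, and codes are all invented for the example. The link table allows one town to have several codes and one code to cover several towns.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE geo_node (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES geo_node(id)  -- NULL for top-level nodes
);
CREATE TABLE postal_code (
    code        TEXT NOT NULL,
    geo_node_id INTEGER NOT NULL REFERENCES geo_node(id),
    PRIMARY KEY (code, geo_node_id)            -- many-to-many link
);
""")

# Made-up hierarchy: one country with two towns.
conn.executemany(
    "INSERT INTO geo_node (id, name, parent_id) VALUES (?, ?, ?)",
    [(1, "Country X", None), (2, "Town A", 1), (3, "Town B", 1)],
)
# Made-up codes: Town A has two codes, and code "A200" is shared with Town B.
conn.executemany(
    "INSERT INTO postal_code (code, geo_node_id) VALUES (?, ?)",
    [("A100", 2), ("A200", 2), ("A200", 3)],
)


def codes_for(db: sqlite3.Connection, town: str) -> List[str]:
    """All postal codes attached to the named geo node."""
    rows = db.execute(
        "SELECT pc.code FROM postal_code pc "
        "JOIN geo_node g ON g.id = pc.geo_node_id "
        "WHERE g.name = ? ORDER BY pc.code",
        (town,),
    )
    return [row[0] for row in rows]


def towns_for(db: sqlite3.Connection, code: str) -> List[str]:
    """All geo nodes that a postal code is attached to."""
    rows = db.execute(
        "SELECT g.name FROM geo_node g "
        "JOIN postal_code pc ON pc.geo_node_id = g.id "
        "WHERE pc.code = ? ORDER BY g.name",
        (code,),
    )
    return [row[0] for row in rows]
```

Keeping the codes out of the geo tree means countries without postal codes simply have no rows in the second table.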

Slava Tutushkin
A: 

If you want to obtain US zip code information, have a look at TIGER. It is data published by the US Census Bureau and will provide you with all the data. The caveat is that the information isn't easy to extract, so it will take some time.

Gavin Miller