views:

436

answers:

8

I am designing a contact management system and have come across an interesting issue regarding modeling geographic locations in a consistent way. I would like to be able to record locations associated with a particular person (mailing address(es) for work, school, home, etc.) My thought is to create a table of locales such as the following:

Locales (ID, LocationName, ParentID) where autonomous locations (such as countries, e.g. USA) are parents of themselves. This way I can have an arbitrarily deep nesting of 'political units' (COUNTRY > STATE > CITY or COUNTRY > STATE > CITY > UNIVERSITY). Some queries will necessarily involve recursion.

I would appreciate any other recommendations or perhaps advice regarding predictable issues that I am likely to encounter with such a scheme.

+3  A: 

Sounds like a good approach to me. The one thing that I'm not clear on when reading you post is what "parents of themselves" means - if this is to indicate that the locale does not have a parent, you're better off using null than the ID of itself.

Kyle Cronin
You should definitely use a null value for the parent (root) level items rather than making them parents of themselves. Try to avoid putting in data unless it actually has meaning.
Dr8k
+4  A: 

You might want to have a look at Freebase.com as a site that's had some open discussion about what a "location" means and what it means when a location is included in another. These sorts of questions can generate a lot of discussion.

For example, there is the obvious "geographic nesting", but there are less obvious logical nestings. For example, in a strictly geographic sense, Vatican City is nested within Italy. But it's not nested politically. Similarly, if your user is located in a research center that belongs to a university, but isn't located on the University's property, do you model that relationship or not?

Paul Tomblin
+1  A: 

I think you might be overthinking this. There's a reason most systems just store addresses and maybe a table of countries. Here are some things to look out for:

  1. Would an address in the Bronx include the borough as a level in the hierarchy? Would an address in an unincorporated area eliminate the "city" level of the hierarchy? How do you model an address within a university vs an address that's not within one? You'll end up with a ragged hierarchy which will force you to traverse the tree every time you need to display an address in your application. If you have an "address book" page the performance hit could be significant.

  2. I'm not sure that you even have just one hierarchy. Brown University has facilities in Providence, RI and Bristol, RI. The only clean solution would be to have a double hierarchy with two campuses that each belong to their respective cities in one hierarchy but that both belong to Brown University on the other hierarchy. (A university is fundamentally unlike a political region. You shouldn't really mix them.)

  3. What about zip codes? Some zip codes encompass multiple towns, other times a city is broken into multiple zip codes. And (rarely) some zip codes even cross state lines. (According to Wikipedia, at least...)

  4. How will you enter the data? Building out the database by parsing conventionally-formatted addresses can be difficult when you take into account vanity addresses, alternate names for certain streets, different international formats, etc. And I think that entering every address hierarchically would be a PITA.

  5. It sounds like you're trying to model the entire world in your application. Do you really want or need to maintain a table that could conceivable contain every city, state, province, postal code, and country in the world? (Or at least every one where you know somebody?) The only thing I can think of that this scheme would buy you is proximity, but if that's what you want I'd just store state and country separately (and maybe the zip code) and add latitude and longitude data from Google.

Sorry for the extreme pessimism, but I've gone down that road myself. It's logically beautiful and elegant, but it doesn't work so well in practice.

JPLemme
+2  A: 

I would think carefully about this since it may not be a necessary feature. Why not just use a text field and let users type in an address?

Remember the KISS principle (Keep It Simple, Stupid).

andyuk
+2  A: 

Here's a suggestion for a pretty flexible schema. An immediate warning: it could be too flexible/complex for what you actually need

Location (LocationID, LocationName) -- Basic building block

LocationGroup (LocationGroupID, LocationGroupName, ParentLocationGroupID) -- This can effective encapsulate multiple hierarchies. You have one root node and then you can create multiple independent branches. E.g. you can split by state first and then create several sub-hierarchies e.g. ZIP/city/xxxx

LocationGroupLocation (LocationID, LocationGroupID) -- Here's how you link Location with one or more hierarchies. E.g. you can link your house to a ZIP, as well as a City... What you need to implement is a constraint that you should not be able to link up a location with any two hierarchies where one of them is a parent of the other (as the relationship is already implicit).

Andrew from NZSG
+1  A: 

I agree with the other posts that you need to be very careful here about your requirements. Location can become a tricky issue and this is why GIS systems are so complicted.

If you are sure you just need a basic heirarchy structure, I have the following suggestions:

  • I support the previous comment that root level items should not have themselves as the parent. Root level items should have a null value for the parent. Always be careful about putting data into a field that has no meaning (i.e. "special" value to represent no data). This practice is rarely necessarily and way overused in the devleoper community.
  • Consider XPath / XML. This is Something to consider for bother recording the heirarchy structure, and for processing / parsing the data at retrieval. If you are using MSSQL Server, the XPath expressions in select statements are perfect for tasks such as returning the full location/heirarchy path of a record as the code is simple and the results are fast.
Dr8k
+1  A: 

For Geographic locations you may wish to resolve an address to a Latitude, Longitude array (perhaps using Google maps etc.) to calculate proximities etc.. For Geopolitical nesting ... I'd go with the KISS response.

If you really want to model it, perhaps you need the types to be more generic ... Country -> State -> County -> Borough -> Locality -> City -> Suburb -> Street or PO Box -> Number -> -> Appartment etc. -> Institution (University or Employer) -> Division -> Subdivision-1 -> subdivision-n ... Are you sure you can't do KISS?

CAD bloke
A: 

I'm modeling an apps for global users and I have the same problems, but I think that this approach could already be in use in many enterprise. But why this problem don't have an universal solution? Or, has this problem one best solution that can be the start point or anybody in the world need think in a solution for it since beginnig? In IT, we are making the same things any times and in many places, unfortunately. For exemplo, who are not have made more than one user, customer or product's database? And the worst, all enterprise in the world has made it. I think that could have universal solutions for universal problems.

Marcelo