views:

39

answers:

2

Use case example: Client A comes to request sales information, enters their zip code and are directed to Representative X.

Since there is an effectively infinite number of zip codes there will not be an agent assigned to every single zip code which would then move out to the county level, and then so on into a region of counties and finally end at the state level.

What type of relationships would be best at modeling this scenario?

Is there any way to remove the need to specifically define regions that a zip code could be approximated to a nearest county assigned a rep, if so would it only be possible with a full geolocation look up to do distance comparisons to find the nearest representative or perhaps could the zip code itself be used to closely approximate the distance?

Clarification: The real intent of this question is how to solve the region approximation without needing to maintain a perfect association table of counties to regions or is the solution to this problem so complex that the cost of maintaining the relationships manually is less prohibitive?

+1  A: 

You can get cheap or free (Census.gov) Zip code information containing at least the State in which the Zip code lies. You might even be able to get Zip to County information if you require more granularity. At this point you could simply grab the representative that matches the state or county in which the zip code lies.

Using the TIGER data available from the Census Bureau, you could populate a spatial database with the zip code boundaries and your representative's known areas. A simple query could then retrieve the representative which is closest to the given zip code. I've had a favorable experience with PostGIS and I am aware of both Oracle and SqlServer spatial extensions.

sixlettervariables
Although, it appears the Census Bureau only has the latest zip code information in the TIGER database, which means you would have to extract it from the shapefiles. Not exactly as straightforward as I thought, but still free.
sixlettervariables
Yes a direct match is simple, it's the region approximation portion that is the heart of this question.
Chris Marisic
Then actually it IS better to grab the TIGER data. I've had great success with using spatial databases with the TIGER data, you could allow a program such as PostGIS to handle the query as to which representative is the closest. Moreover, it is trivial to import the shapefile information into PostGIS for querying.
sixlettervariables
+1  A: 

Well you can easily get the Lat/Long from the Google or Yahoo API (in my last project we had an issue with Google not finding a certain zipcode in NJ that Yahoo found, fyi).

So if you cache the Lat/Long for each zipcode with a representative (and I would probably recommend caching the Lat/Long for each zipcode entered by a customer too, so you don't have to hit a web service each time), then you can just get the Lat/Long of the user's zipcode and run through a distance calculation for each rep. This does have an order of n-time complexity (O(n) for n representatives)

statichippo
This does seem like a very fitting usage as I could easily setup in my service layer when it does a zip look up if it doesn't have the Lat/Long for it to poll it and save it to the database and do an initial load just for the collection of all of my representatives.
Chris Marisic
You might also think about caching the result for all or certain (maybe high-traffic) zipcodes. So if someone comes and types in a zip where you don't have a rep, you might have a result all ready for them. Don't cache for too long though, or when you add a closer rep they won't be found ;)
statichippo
I don't know where your application will run and I also live in Belgium which is a tiny country compared to most others, but isn't the number of zipcodes relatively limited? Like say less then 5k? You could run an initial population thread that just runs through all your zips in less then a day right?
borisCallens
The number of zip codes is around 47,000. My idea would be to do initial look ups on the Lat/Long of all my zips with reps in them and store that in the db then do a lookup for each access if it doesn't have the Lat/Long and store that and then just parse the list each time since there will only ever be a handful of results per state for a long time atleast.
Chris Marisic