views:

611

answers:

7

My application will need to reference addresses. Street info will be stored with my main objects but the rest needs to be stored seperately to reduce redundancy. How should I store/retrieve ZIPs, cities and states? Here are some of my ideas.

single table solution (cant do relationships)

[locations] locationID locationParent (FK for locationID - 0 for state entries) locationName (city, state) locationZIP


two tables (with relationships, FK constraints, ref integrity)

[state] stateID stateName

[city] cityID stateID (FK for state.stateID) cityName zipCode


three tables

[state] stateID stateName

[city] cityID stateID (FK for state.stateID) cityName

[zip] zipID cityID (FK for city.cityID) zipName


Then I read into ZIP codes amd how they are assigned. They aren't specifically related to cities. Some cities have more than one ZIP (ok will still work) but some ZIPs are in more than one city (oh snap) and some other ZIPs (very few) are in more than one state! Also some ZIPs are not even in the same state as the address they belong to at all. Seems ZIPs are made for carrier route identification and some remote places are best served by post offices in neighboring cities or states.

Does anybody know of a good (not perfect) solution that takes this into consideration to minimize discrepencies as the database grows?

+1  A: 

I don't know if you're internationalizing your app, but the general construct is this, with a one-to-many relationship with the following item:

Country
Region (state/province)
City

That's usually sufficient to be able to filter your data in a meaningful way. Trust me on this: you don't want to get into the technicalities of geographic land division.

For an address, store the data above plus street address, postal code (international version of zip code), etc. down to the resolution you need. I say resolution because you could split the address field into things like apartment number, street number, street name, street direction, etc. -- but that data may be dependent on the location, so I would avoid doing that if you are going to internationalize your app. Just a street address field is sufficient 99.99% of the time.

Jon Seigel
ryan a
+4  A: 

There is actually some database(with a single table) that the USPS puts out every year with ZIP codes and state and counties and state/county codes. I would look into it. I have an (outdated) copy of it. The schema is pretty simple:


ZIPCODE nvarchar(5) not null
CITY nvarchar(50) null
STATE nvarchar(2) null
STATECODE nvarchar(50) null
COUNTY nvarchar(50) null
COUNTYCODE nvarchar(50) null
(see below)

edit: Also, I would allow your users to add a new zipcode(with city and county and such) because zipcodes are being added all the time..

http://www.usps.com/ncsc/addressinfo/addressinfomenu.htm

edit: Actually, I guess I'm wrong. I don't have an official copy of their database.. I downloaded one of their sample files and their schema seems quite complex.

Earlz
thats a great resource. however right now i'm just concerned with getting a design going and not populating yet.
ryan a
Well, I mean the UPS gives it to you populated. This way the user can just type their zipcode and you'll automatically know the city and state and such.
Earlz
A: 

It depends on whether data integrity, normalization is more important, or performance.

However, for most applications, what you really want is a household. So, this information should be stored separately from your customer, so you can represent multiple customers living at the same household.

The household must have the street address, apt, city, state/province, country, zip/postal code.

I would not bother to normalize this by including only a reference to the city (which would be a key since there could be multiple cities with the same name), but you should store a separate table with cities, countries, postal codes just for validation and integrity purposes.

I would put these foreign keys as fields in the household.

I've worked with marketing databases, and developed an AI system for constructing customer keys and household keys for householding purposes for a bank, and this was one of the primary issues. For analysis purposes, we needed to aggregate accounts to the customer level, and customers to the household level. So your representation should support this for future analytical purposes.

Larry Watanabe
A: 

There's not a single right model for this need - there are dozens. To know which is best for you depends on some additional info, like:

  • performance & capacity - what's driving concern about redundancy?
  • functionality - what kind of data analysis will be performed?
  • historical data - do you have to maintain old data? note that zip codes change, and this invalidates some of the offered solutions
  • internationalization
  • language
  • do you have other kinds of locations? You may want a more abstract solution that can consolidate physical with electronic locations - if for example your user wants to pick preferred contact method, etc.
  • do you want to allow locations to be shared?
  • any other physical location info also kept or highly likely to be added? Like county, country, lat & long, etc?
KenFar
I thought my intention was pretty straightforward. I need to store location data for retrieval. Cities, states and ZIPs. My inquiry was in regards to the structural design of my tables. I said US only. I did not have any analysis or language requirements.Just Cities, states and ZIPs. One, two or three tables and why.
ryan a
Well, no. You didn't say US only, didn't say what kinds of queries you'll be doing (need a drop down list of cities for a given state?), nor what kind of performance requirements to try to meet, nor what languages to support, etc. There's a reason why there are many ways to model this - and it's because of these kinds of requirements.
KenFar
Sorry the US only was a comment to Jon Seigel. I didn't think normalization and good practice depended on exact queries - rather designing correctly for reliable storage and retrieval independant of how I want to display the data which of course should be handled by how I perform my app logic.The relation of Cities, States and Zips I thought would correspond to what they are in reality. What belongs in what and how my tables should reflect that.Sorry for the confusion.
ryan a
No problem. The kind of queries don't affect the logical model - but they can affect the physical model, depending on other performance factors. For example, if you have a primarily transactional database with a small surgical queries dipping in via the index for a single row - your physical model can look a lot like the logical. On the other hand, if you're doing a lot of reporting then a logical or transactional model will kill your performance and you'll want to denormalize.
KenFar
A: 

Thanks for all the replies. I wanted to give a review & my solution incase someone was interested. The question was "How should I store/retrieve ZIPs, cities and states?"

Jon Seigel gave me a fairly reassuring answer about using: Country Region (state/province) City with one to many relationships.

My reasons were redundancy and misspelling. Allowing any free-input of cities and states columns stored within the address records opens up a slew of issues with queries. Not having relational integrity could allow incorrect cities to states. I just wanted to store locations in a uniform way for users to be able to lookup.

For anybody interested my solution is this:

[state]; stateID; stateName

[location]; locationID; stateID (FK); cityName; zipID

The [location.stateID] is a foreign key relation with a one-to-many to [state.stateID]. I decided to keep the ZIP with the location table as unique ZIPs are not directly relational to a unique city. Also it seems ZIPs are not a basis for city/state boundary determination, rather they are for USPS purposes and actually indicate a carrier route and postal delivery zone which can span cities or even states. Another location record can be added with the same city name and the additional ZIP. This way ZIP searches can result in all cities & city searches can result in all zips if need be.

ryan a
A: 

Hi everyone. I am looking for a solution to my database Zip Code problem and thought someone might be able to help me out. I have an event system. Each event has a location (PK = LocationID). Each location has a Zip Code. Now, we know that each Zip Code could have multiple cities. Here is where the issue comes in. Right now, I have a ZipCodes table that has a city, state, zip in there. This does not work (multiple zips for a city issue again), so I added a ZipCodeID field to the ZipCodes table. In theory that solves the issue and I can just store the ZipCodeID in the Locations table (instead of ZipCode) and key off that, but that would be MY field that I created. The next time I reload the data, the keys are all going to be off. Is the ONLY way to do this by keying off of a double primary key of CityName and ZipCode? Seems a bit hokey to me and I must be missing something. Any help would be GREAT! Thanks in advance. Dennis

dennisg
A: 

Attempting to keep city/state/ZIP segregated from the address records is penny wise but pound foolish, primarily because city/state/ZIP can change independently of the address. Disk space is so cheap that the savings of a few bytes by using a lookup table will never match the costs of implementation and maintenance complexity, especially once two addresses that used to share the same city/state/ZIP no longer do.

It's best to save city/state/ZIP in the same record as the address. Puerto Rico addresses (yes, served by the USPS) should also have an "urbanization" field. ZIP is preferably ZIP+4. Search USPS.COM for "Publication 28" to find out about standard U.S. address formats.

If you're planning on doing any validating/standardization of the address using USPS CASS tools (see http://semaphorecorp.com for cheap examples) then your record should also include room for the "scrubbed" address/city/state/ZIP+4/urb as corrected to USPS standards. (Don't overwrite the old original address data with the new scrubbed versions; keep both around to allow investigating any anomalies or conversion errors.)

For example, 678 Alexander Ct, Rivervale NJ 07675 and 610 Abbe Ct, Westwood NJ 07675 share the same ZIP but different city names. Trying to segregate their city/state/ZIP into a separate table just creates headaches. Avoid it.

joe snyder