I am writing a web application that is US-specific, so the formats other countries use for postal codes are not important. I have a list of US zip codes that I am trying to load into a database table with the following columns:

  • 5-digit US zip code
  • latitude
  • longitude
  • USPS classification code
  • state code
  • city

The zip code is the primary key, as it is what I will be querying against. I started with MEDIUMINT(5), but that drops the leading zeros from the zip codes that have them.

I considered using CHAR(5) but am concerned about the performance hit of indexing on a character column.

So my question is: what is the best MySQL datatype for storing zip codes?

Note: I have seen this come up in several other questions about zip codes. I am only interested in 5-digit US zip codes, so there is no need to take other countries' postal code formats into consideration.

+6  A: 

CHAR(5) is the correct way to go. String indexing is quite fast, particularly on such a small data set.

You are correct in that you should never use an integer for a zip code, since it isn't truly numeric data.

Edit to add: Check out this for good reasons why you don't use numbers for non-numerically important data: http://stackoverflow.com/questions/893454/is-it-a-good-idea-to-use-an-integer-column-for-storing-us-zip-codes-in-a-database
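
A minimal sketch of the leading-zero problem, using Python's built-in sqlite3 module as a stand-in for MySQL. That substitution is an assumption made so the example runs anywhere, and SQLite's type rules differ from MySQL's in other respects. The table names and the sample row are hypothetical. An integer-typed key collapses 02134 to 2134, while a character-typed key keeps it intact:

```python
import sqlite3

# In-memory SQLite database; a stand-in for MySQL, not the asker's real setup.
conn = sqlite3.connect(":memory:")

# Hypothetical tables: one keyed by an integer type, one by a 5-character string.
conn.execute("CREATE TABLE zip_int (zip MEDIUMINT PRIMARY KEY, city TEXT)")
conn.execute("CREATE TABLE zip_char (zip CHAR(5) PRIMARY KEY, city TEXT)")

# Insert the same zip code, with a leading zero, into both tables.
conn.execute("INSERT INTO zip_int VALUES (?, ?)", ("02134", "Allston"))
conn.execute("INSERT INTO zip_char VALUES (?, ?)", ("02134", "Allston"))

int_zip = conn.execute("SELECT zip FROM zip_int").fetchone()[0]
char_zip = conn.execute("SELECT zip FROM zip_char").fetchone()[0]

print(repr(int_zip))   # the integer column has lost the leading zero
print(repr(char_zip))  # the character column returns the original string
```

With the character column, a lookup by the string the user typed matches directly, with no padding step in between.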

Erich
What non-numeric data is in a 5-digit US zip code?
KM
Erich: Why not an integer? I would think that storing as an integer would help with type checking, you can add leading zeroes in the client, a character can be a non-digit ... Just wondering what point I'm missing.
John at CashCommons
The data set could run into millions of rows. Most likely there will be other addresses in the system where a zip is stored, and they should all be the same type.
KM
Zip Codes are not numeric data. Numeric data is data on which it makes sense to do mathematical operations, which you would never do with a Zip Code. Otherwise, you are unfairly limiting your dataset. As far as the data set size goes, if the Zip Code is truly the PK, there are at most 100,000 possible values (00000 through 99999), a relatively small dataset.
Erich
If you have 10 million customers with zip codes, that is a big dataset.
KM
KM, 50 megabytes is not a big data set. An index with 10 million keys in it is relatively small in today's data management.
Walter Mitty
+2  A: 

Go with your MEDIUMINT(5) ZEROFILL; it should add the leading zeros for you. No need to impact the index and performance over a formatting issue.
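
ZEROFILL only changes how MySQL displays the stored integer; the value itself is still a number. As a rough sketch of the same padding done on the client side (Python here, with a hypothetical helper named `format_zip`), assuming the zip was stored as a plain integer:

```python
def format_zip(zip_number: int) -> str:
    """Left-pad an integer zip code with zeros to five digits."""
    if not 0 <= zip_number <= 99999:
        raise ValueError(f"not a 5-digit zip code: {zip_number}")
    return f"{zip_number:05d}"


print(format_zip(2134))   # 02134
print(format_zip(90210))  # 90210
```

Any code path that reads the integer without this step (or without ZEROFILL) will see 2134 rather than 02134, which is the data-integrity risk raised elsewhere in this thread.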

KM
As CHAR(5) it took .0007 seconds; as MEDIUMINT(5) ZEROFILL it took .0006 seconds. I think both are valid solutions, but I am going to go with CHAR(5) and take the slight performance hit for a little more peace of mind on the data integrity front.
Eric
A: 

If he makes it CHAR(6), then he can handle Canadian postal codes as well.

When you consider that there is a maximum of 100,000 5-digit Zip Codes and how little space the table would take up even if you made it entirely memory-resident, there's no reason not to.
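
As a back-of-the-envelope check on the sizes discussed in this thread, the sketch below counts only the raw key bytes, assuming one byte per character. Row overhead, the other columns, and index structure are ignored, so real figures will be somewhat larger.

```python
KEY_BYTES = 5  # one byte per digit, assuming a single-byte character set

zip_table_keys = 100_000 * KEY_BYTES    # every possible 5-digit zip code
customer_keys = 10_000_000 * KEY_BYTES  # 10 million rows, as raised in an earlier comment

print(zip_table_keys / 1_000_000, "MB")  # 0.5 MB
print(customer_keys / 1_000_000, "MB")   # 50.0 MB
```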

David
I have no need for Canadian postal codes though.
Eric
I saw that in the original post. I just figured I'd mention it in case anyone else found this question in the future looking for advice but had a situation where foreign postal codes MIGHT make a difference. My main point was that, in the age of gigabyte memory sticks, a zip code table is pretty small. (I dealt with these when memory was measured in KILObytes.)
David