In my MySQL InnoDB database, I have dirty zip code data that I want to clean up.

Clean zip code data has all 5 digits of the zip code (e.g. "90210").

But for some reason, I noticed in my database that for zipcodes that start with a "0", the 0 has been dropped.

So "Holtsville, New York" with zipcode "00544" is stored in my database as "544"

and

"Dedham, MA" with zipcode "02026" is stored in my database as "2026".

What SQL can I run to front pad "0" to any zipcode that is not 5 digits in length? Meaning, if the zipcode is 3 digits in length, front pad "00". If the zipcode is 4 digits in length, front pad just "0".

UPDATE:

I just changed the zipcode to be datatype VARCHAR(5)

A: 

Store your zipcodes as CHAR(5) instead of a numeric type, or have your application pad them with zeroes when it loads them from the DB. One way to do it in PHP:

echo sprintf("%05d", 205); // prints 00205
echo sprintf("%05d", 1492); // prints 01492

Or you could have MySQL pad it for you:

SELECT LPAD(zip, 5, '0') AS zipcode FROM `table`;

Here's a way to update and pad all rows:

ALTER TABLE `table` CHANGE `zip` `zip` CHAR(5); #changes type
UPDATE `table` SET `zip`=LPAD(`zip`, 5, '0'); #pads everything
quantumSoup
I would like to actually clean up my data in the database itself. Do you know the equivalent to do this with SQL?
TeddyR
@TeddyR Yes, check the updated answer ^
quantumSoup
I ran the following code that made it work "UPDATE tablename SET zip = LPAD(zip, 5, '0');"
TeddyR
A: 

First of all, your data type shouldn't be numeric. The following may not be the best solution, but it will work:

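-- In MySQL, + is numeric addition, so the zeros are prepended with CONCAT()
UPDATE TableName
SET zipcode =
CASE LENGTH(zipcode)
WHEN 0 THEN '00000'
WHEN 1 THEN CONCAT('0000', zipcode)
WHEN 2 THEN CONCAT('000', zipcode)
WHEN 3 THEN CONCAT('00', zipcode)
WHEN 4 THEN CONCAT('0', zipcode)
ELSE zipcode -- already 5 characters or longer: leave unchanged
END;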
hgulyan
LEN is not available in MySQL. Do you know what the equivalent MySQL function is? This looks promising.
TeddyR
I've edited my answer. It's LENGTH in MySQL; LEN is for SQL Server.
hgulyan
A: 

Ok, so you've switched the column from Number to VARCHAR(5). Now you need to update the zipcode field to be left-padded. The SQL to do that would be:

UPDATE MyTable
SET ZipCode = LPAD( ZipCode, 5, '0' );

This will pad all values in the ZipCode column to 5 characters, adding '0's on the left.
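
Before running the UPDATE, it may be worth previewing which rows would change. A minimal sketch, assuming the same MyTable and ZipCode names as above (the padded alias is only for illustration):

```sql
-- Preview only: show each short value next to its padded form.
-- Rows that already have 5 or more characters are filtered out.
SELECT ZipCode,
       LPAD(ZipCode, 5, '0') AS padded
FROM MyTable
WHERE CHAR_LENGTH(ZipCode) < 5;
```

Keep in mind that LPAD also shortens any value longer than 5 characters to its first 5. That cannot happen in a VARCHAR(5) column, but it would matter in a wider one.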

Of course, now that you've got all of your old data fixed, you need to make sure that any new data is also zero-padded. There are several schools of thought on the correct way to do that:

  • Handle it in the application's business logic. Advantages: database-independent solution, doesn't involve learning more about the database. Disadvantages: needs to be handled everywhere that writes to the database, in all applications.

  • Handle it with a stored procedure. Advantages: Stored procedures enforce business rules for all clients. Disadvantages: Stored procedures are more complicated than simple INSERT/UPDATE statements, and not as portable across databases. A bare INSERT/UPDATE can still insert non-zero-padded data.

  • Handle it with a trigger. Advantages: Will work for Stored Procedures and bare INSERT/UPDATE statements. Disadvantages: Least portable solution. Slowest solution. Triggers can be hard to get right.
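
For the trigger option, a minimal sketch might look like the following. It assumes the MyTable and ZipCode names from the UPDATE above. The trigger names are made up for illustration, and two triggers are used because each MySQL trigger fires for a single event.

```sql
-- Hypothetical trigger names; MyTable / ZipCode as in the UPDATE above.
-- A BEFORE trigger may change NEW.<column> before the row is written.
CREATE TRIGGER mytable_pad_zip_bi
BEFORE INSERT ON MyTable
FOR EACH ROW
SET NEW.ZipCode = LPAD(NEW.ZipCode, 5, '0');

CREATE TRIGGER mytable_pad_zip_bu
BEFORE UPDATE ON MyTable
FOR EACH ROW
SET NEW.ZipCode = LPAD(NEW.ZipCode, 5, '0');
```

Each trigger body is a single statement, so no BEGIN ... END block or DELIMITER change is needed. A NULL zip code stays NULL, since LPAD returns NULL for a NULL argument.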

In this case, I would handle it at the application level (if at all), and not the database level. After all, not all countries use a 5-digit Zipcode (not even the US -- our zipcodes are actually Zip+4+2: nnnnn-nnnn-nn) and some allow letters as well as digits. Better NOT to try to force a data format, and to accept the occasional data error, than to prevent someone from entering the correct value just because its format isn't quite what you expected.

Craig Trader
A: 

You need to decide the length of the zip code (which I believe should be 5 characters long). Then you need to tell MySQL to zero-fill the numbers.

Let's suppose your table is called mytable and the field in question is zipcode, type mediumint (an unsigned smallint tops out at 65535, which is too small for zip codes such as 90210). You need to issue the following query:

ALTER TABLE mytable CHANGE `zipcode` `zipcode`
    mediumint( 5 ) UNSIGNED ZEROFILL NOT NULL;

This method has several advantages: it leaves your data intact; there's no need for triggers on insert or update; there's no need to call functions when you SELECT the data; and you can always remove the extra zeros or increase the field length should you change your mind.
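
To see the effect without touching real data, here is a small sketch on a throwaway table (zip_demo is a made-up name). One caveat: MySQL 8.0.17 and later mark ZEROFILL and integer display widths as deprecated, so this may not be a long-term option.

```sql
-- zip_demo is a hypothetical scratch table.
-- MEDIUMINT rather than SMALLINT so that values above 65535 (e.g. 90210) fit.
CREATE TABLE zip_demo (
    zipcode MEDIUMINT(5) UNSIGNED ZEROFILL NOT NULL
);

INSERT INTO zip_demo (zipcode) VALUES (544), (2026), (90210);

-- The stored values are still integers; the zeros are added on display.
SELECT zipcode FROM zip_demo;
```

The SELECT should show 00544, 02026 and 90210.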

Anax