I would like to write a JavaScript function that validates a zip code, by checking if the zip code actually exists. Here is a list of all zip codes:

http://www.census.gov/tiger/tms/gazetteer/zips.txt (I only care about the 2nd column)


This is really a compression problem. I would like to do this for fun. OK, now that's out of the way, here is a list of optimizations over a straight hashtable that I can think of, feel free to add anything I have not thought of:

  • Break zipcode into 2 parts, first 2 digits and last 3 digits.
  • Make a giant if-else statement first checking the first 2 digits, then checking ranges within the last 3 digits.
  • Or, convert the zips into hex, and see if I can do the same thing using smaller groups.
  • Find out if within the range of all valid zip codes there are more valid zip codes vs invalid zip codes. Write the above code targeting the smaller group.
  • Break up the hash into separate files, and load them via Ajax as user types in the zipcode. So perhaps break into 2 parts, first for first 2 digits, second for last 3.
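As a rough illustration of the first two ideas in the list, the generated code could be a table keyed by the first 2 digits, holding ranges of the last 3 digits, instead of a literal if-else chain. This is only a sketch: `zipTable`, its contents, and `zipExists` are made-up names and values, not taken from the census list.

```javascript
// Hypothetical generated table: first two digits -> sorted [low, high]
// ranges of the last three digits. The values are invented for illustration.
var zipTable = {
    "10": [[0, 3], [10, 12]],
    "23": [[1, 3]],
    "36": [[1, 1]]
};

function zipExists(zip) {
    // Accept exactly five digits; the zip stays a string so leading zeros survive.
    if (!/^[0-9]{5}$/.test(zip)) {
        return false;
    }
    var prefix = zip.slice(0, 2);
    if (!Object.prototype.hasOwnProperty.call(zipTable, prefix)) {
        return false;
    }
    var ranges = zipTable[prefix];
    var suffix = parseInt(zip.slice(2), 10);
    // Linear scan over the (usually short) list of ranges for this prefix.
    for (var i = 0; i < ranges.length; i++) {
        if (suffix >= ranges[i][0] && suffix <= ranges[i][1]) {
            return true;
        }
    }
    return false;
}
```

Splitting by prefix also lines up with the Ajax idea: each prefix's ranges could live in a separate file and be fetched once the first 2 digits are typed.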

Lastly, I plan to generate the JavaScript files using another program, not by hand.

Edit: performance matters here. I do want to use this, if it doesn't suck. Performance of the JavaScript code execution + download time.

Edit 2: JavaScript only solutions please. I don't have access to the application server, plus, that would make this into a whole other problem =)

A: 

Assuming you've got the zips in a sorted array (seems fair if you're controlling the generation of the datastructure), see if a simple binary search is fast enough.
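A minimal sketch of that lookup, assuming the zips are stored as numbers in ascending order (the name `hasZip` and the sample data are only for illustration):

```javascript
// Binary search over a sorted array of numeric zip codes.
function hasZip(sortedZips, zip) {
    var lo = 0;
    var hi = sortedZips.length - 1;
    while (lo <= hi) {
        var mid = (lo + hi) >> 1;      // midpoint, rounded down
        if (sortedZips[mid] === zip) {
            return true;
        }
        if (sortedZips[mid] < zip) {
            lo = mid + 1;              // target is in the upper half
        } else {
            hi = mid - 1;              // target is in the lower half
        }
    }
    return false;
}

var sampleZips = [10000, 10001, 10002, 10003, 23001, 23002, 23003, 36001];
hasZip(sampleZips, 23002);
```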

Hank Gay
its fast enough, but the JS code will be HUGE, which will be bad. i didnt mention this, but its a requirement, yeah =)
mkoryak
If size is a concern, turn on gzipping and caching; it's a transparent way to shrink the download time without obfuscating your logic.
Hank Gay
+4  A: 

You could do the unthinkable and treat the code as a number (remember that it's not actually a number). Convert your list into a series of ranges, for example:

zips = [10000, 10001, 10002, 10003, 23001, 23002, 23003, 36001]
// becomes
zips = [[10000,10003], [23001,23003], [36001,36001]]
// make sure to keep this sorted

then to test:

function isValidZip(myzip) {
    // linear scan over the [low, high] pairs
    for (var i = 0, l = zips.length; i < l; ++i) {
        if (myzip >= zips[i][0] && myzip <= zips[i][1]) {
            return true;
        }
    }
    return false;
}
isValidZip(23002);  // true

This is just using a very naive linear search (O(n)). If you kept the list sorted and used binary searching, you could achieve O(log n).
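Here is one way the range building and the O(log n) lookup might look. It is a sketch, assuming the input is a sorted array of numeric zips; `toRanges` and `inRanges` are names made up for this example.

```javascript
// Collapse a sorted array of numeric zips into [low, high] pairs.
function toRanges(sortedZips) {
    var ranges = [];
    for (var i = 0; i < sortedZips.length; i++) {
        var last = ranges[ranges.length - 1];
        if (last && sortedZips[i] <= last[1] + 1) {
            // Adjacent to (or duplicate of) the current range: extend it.
            last[1] = Math.max(last[1], sortedZips[i]);
        } else {
            ranges.push([sortedZips[i], sortedZips[i]]);
        }
    }
    return ranges;
}

// Binary search over the sorted, non-overlapping ranges.
function inRanges(ranges, zip) {
    var lo = 0;
    var hi = ranges.length - 1;
    while (lo <= hi) {
        var mid = (lo + hi) >> 1;
        if (zip < ranges[mid][0]) {
            hi = mid - 1;
        } else if (zip > ranges[mid][1]) {
            lo = mid + 1;
        } else {
            return true;   // low <= zip <= high
        }
    }
    return false;
}

var zipRanges = toRanges([10000, 10001, 10002, 10003, 23001, 23002, 23003, 36001]);
// zipRanges is [[10000,10003], [23001,23003], [36001,36001]]
```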

nickf
+1 for giving big O specs.
Andrew
when you say [36001-36001] you dont actually mean subtraction right. threw me for a loop for a second
mkoryak
I think those are typos. I'm pretty sure he meant to put commas there, not dashes.
Andrew
I think he actually means those to be shorthand for a range (like Prototype's $R(23001, 23003).
eyelidlessness
ah yes, sorry. initially i hadn't meant to make it code... editing now.
nickf
@eyelid: no, it's not actually meant to hold the entire range, just the top and bottom values.
nickf
+1  A: 

I use Google Maps API to check whether a zipcode exists.

It's more accurate.

Luca Matteis
Yes, the census.gov list is badly outdated.
Miles
thanks, i'll look into this. might be good in the 'dont reinvent the wheel' sort of way
mkoryak
can you point me to the API which does this. i looked, and i cant see anything for this right away.
mkoryak
A: 

This might be useful:

PHP Zip Code Range and Distance Calculation

As well as List of postal codes.

Scott Evernden
+2  A: 

I would like to write a JavaScript function that validates a zip code

Might be more effort than it's worth, keeping it updated so that at no point someone's real valid ZIP code is rejected. You could also try an external service, or do what everyone else does and just accept any 5-digit number!
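For reference, the "accept any 5-digit number" approach is roughly a one-liner. A sketch (the function name is made up):

```javascript
// Format-only check: exactly five ASCII digits, nothing else.
function looksLikeZip(zip) {
    return /^[0-9]{5}$/.test(zip);
}
```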

here is a list of optimizations over a straight hashtable that I can think of

Sorry to spoil the potential Fun, but you're probably not going to manage much better actual performance than JavaScript's Object gives you when used as a hashtable. Object member access is one of the most common operations in JS and will be super-optimised; building your own data structures is unlikely to beat it even if they are potentially better structures from a computer science point of view. In particular, anything using ‘Array’ is not going to perform as well as you think because Array is actually implemented as an Object (hashtable) itself.

Having said that, a possible space compression tool if you only need to know 'valid or not' would be to use a 100000-bit bitfield, packed into a string. For example for a space of only 100 ZIP codes, where codes 032-043 are ‘valid’:

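var zipfield= '\x00\x00\x00\x00\xFF\x0F\x00\x00\x00\x00\x00\x00\x00';
function isvalid(zip) {
    // anchor the pattern so that only exactly three digits are accepted
    if (!/^[0-9]{3}$/.test(zip))
        return false;
    var z= parseInt(zip, 10);
    // bit (z%8) of character floor(z/8) is the flag for code z
    return !!( zipfield.charCodeAt(Math.floor(z/8)) & (1<<(z%8)) );
}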
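Since the data is meant to be generated by a program rather than by hand, a bitfield string like this could be produced along these lines. This is a sketch: `buildZipField` and its parameters are hypothetical, and it uses the same bit layout as the lookup (bit `z % 8` of character `floor(z / 8)`).

```javascript
// Build a packed bitfield string from a list of valid numeric codes.
// `size` is the number of possible codes (100 in the small example).
function buildZipField(validCodes, size) {
    var byteCount = Math.ceil(size / 8);
    var bytes = [];
    for (var i = 0; i < byteCount; i++) {
        bytes.push(0);
    }
    for (var j = 0; j < validCodes.length; j++) {
        var z = validCodes[j];
        bytes[Math.floor(z / 8)] |= 1 << (z % 8);   // set the flag for code z
    }
    // One character per byte, each with a char code in 0..255.
    var field = '';
    for (var k = 0; k < byteCount; k++) {
        field += String.fromCharCode(bytes[k]);
    }
    return field;
}

// Codes 32 to 43 valid, in a space of 100 codes.
var demoCodes = [];
for (var c = 32; c <= 43; c++) {
    demoCodes.push(c);
}
var demoField = buildZipField(demoCodes, 100);
```

With these inputs, `demoField` should be 13 characters long, with `\xFF` and `\x0F` at indexes 4 and 5 and zeros elsewhere.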

Now we just have to work out the most efficient way to get the bitfield to the script. The naive '\x00'-filled version above is pretty inefficient. Conventional approaches to reducing that would be eg. to base64-encode it:

var zipfield= atob('AAAAAP8PAAAAAAAAAA==');

That would get the 100000 flags down to 16.6kB. Unfortunately atob was Mozilla-only at the time of writing, so an additional base64 decoder would be needed for browsers that lack it. (It's not too hard, but it's a bit more startup time to decode.) It might also be possible to use an AJAX request to transfer a direct binary string (encoded in ISO-8859-1 text to responseText). That would get it down to 12.5kB.

But in reality probably anything, even the naive version, would do as long as you served the script using mod_deflate, which would compress away a lot of that redundancy, and also the repetition of '\x00' for all the long ranges of ‘invalid’ codes.

bobince
Yes. Don't violate the first rule of programming: Don't reinvent the wheel!
Loren Pechtel
i like this solution best because its pretty clever, fast, and fun. thanks
mkoryak
A: 

So... You're doing client side validation and want to optimize for file size? You probably cannot beat general compression. Fortunately, most browsers support gzip, so you get that much for free.

How about a simple JSON-encoded dict or list with the zip codes in sorted order, and a lookup on the dict? It'll compress well, since it's a predictable sequence; it'll import easily, since it's JSON and can use the browser's built-in parser; and lookup will probably be very fast too, since that's a JavaScript primitive.
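A small sketch of the dict variant, assuming the generator emits a JSON object whose keys are the zip strings (the sample payload and the names `zipJson`, `zipSet`, and `isKnownZip` are invented here):

```javascript
// Hypothetical payload as the generator might emit it: zip -> 1.
var zipJson = '{"10001":1,"10002":1,"23001":1,"36001":1}';

// Parsed once with the built-in JSON parser.
var zipSet = JSON.parse(zipJson);

function isKnownZip(zip) {
    // hasOwnProperty avoids false positives on inherited keys such as "toString".
    return Object.prototype.hasOwnProperty.call(zipSet, zip);
}
```

Keeping the keys as strings also preserves leading zeros, which a numeric array would lose.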

TokenMacGuy
+1  A: 

Your list of ZIP codes is obsolete. Don't bother "validating" ZIP codes of 5 digits anyway, they're obsolete as explained at http://semaphorecorp.com/cgi/zip5.html

Correct validating of addresses and ZIP+4 codes is done with CASS software against USPS databases.

joe snyder