What is the most flexible design for a table of physical addresses in some variety of SQL? I assume there is something better than { street address, building address, city, state/province/region, country, zipcode }. Also are there standard names for various components of addresses, especially international standard names? Further, what sort of provision should be made for the same physical location having multiple addresses? In what circumstances could this occur?
views:
60answers:
5One possibility is, a text field could be used so that the address could be formatted in any way needed. However, then the text would need to be parsed for the parts of the address useful for providing links to maps and directions. Perhaps there should be a text full_address field and then a separate set of columns that will contain parsed (automatically or manually) address pieces useful for links to maps and directions. If the address could not be parsed, a flag would be set saying it couldn't be recognized. (Perhaps this means the address is in a country that the parser doesn't know the address format for, or that the address was entered improperly.)
The most generic ADDRESS table I constructed included the following columns:
- ADDRESS_1 - c/o line
- ADDRESS_2 - RR/PO Box
- ADDRESS_3 - suite/apt
- ADDRESS_4 - street address
- ADDRESS_TYPE_CODE - foreign key to ADDRESS_TYPE_CODES table
- CITY
- STATE_PROVINCE
- POSTAL_CODE
- COUNTRY
The ADDRESS_TYPE_CODES would be business, home, mailing/shipping, etc.
There's no way to know what address details you're going to get, so you have to make it flexible.
According to the United States Postal Service:
The components of the delivery address [line] are the primary address number, street name, secondary address identifier, and secondary address range.
Here's more than most people ever wanted to know about the delivery address line for United States addresses.
You would have to look up similar documents for any other countries that you're interested in.
Are you sure you need to be very flexilbe? I usually try and get the simplest design (ie least amount of columns) for the dataset I'm currently working with, then be agile.
http://www.upu.int has the format standards for international addresses. Publication 28 at http://usps.com has the U.S. format standards. CASS software like http://semaphorecorp.com validates and standardizes U.S. addresses.
The USPS wants the following unpunctuated address components concatenated on a single line:
- house number
- predirectional (N, SE, etc)
- street
- suffix (AVE, BLVD, etc)
- postdirectional (SW, E, etc)
- unit (APT, STE, etc)
- apartment/suite number
Eg, 102 N MAIN ST SE APT B.
If you keep the entire address line as a single field in your database, input and editing is easy, but searches can be more difficult (eg, in the case SOUTH EAST LANE is the street EAST as in S EAST LN or is it LANE as in SE LANE ST?).
If you keep the address parsed into separate fields, searches for components like street name or apartments become easier, but you have to append everything together for output, you need CASS software to parse correctly, and PO boxes, rural route addresses, and APO/FPO addresses have special parsings.
A physical location with multiple addresses at that location is either a multiunit building, in which case letters/numbers after units like APT and STE designate the address, or it's a Commercial Mail Receiving Agency (eg, UPS store) and a maildrop/private mailbox number is appended (like 100 MAIN ST STE B PMB 102), or it's a business with one USPS delivery point and mail is routed after USPS delivery (which usually requires a separate mailstop field which the company might need but the USPS won't want on the address line).
A contact with more than one physical address is usually a business or person with a street address and a PO box. Note that it's common for each address to have a different ZIP code.
It's quite typical that one business transaction might have a shipping address and a billing address (again, with different ZIP codes). The information I keep for EACH address is:
- name prefix (DR, MS, etc)
- first name and initial
- last name
- name suffix (III, PHD, etc)
- mail stop
- company name
- address (one line only per Pub 28 for USA)
- city
- state/province
- ZIP/postal code
- country
I typically print mail stops somewhere between the person's name and company because the country contains the state/ZIP which contains the city which contains the address which contains the company which contains the mail stop which contains the person. I use CASS software to validate and standardize addresses when entered or edited.