views:

791

answers:

6

Given the task of storing international geographic addresses in a relational table, what is the most flexible schema? Should every part of the address be broken out into their own fields, or should it be more like free text?

Is there any sense in separating differently formatted address into different tables? For example, have a table for USAAddress, CanadianAddress, UKAddress...?

+1  A: 

The only way is to split them to:

Name varchar,
Title varchar,
StreetAddress varchar,
StreetAddressLine2 varchar,
zipCode varchar,
City varchar,
Province varchar,
Country lookup

since almost every country has it's own standard for having address data, and evey country has a different format of zipcodes.
You can have a small sample of problems in my post from a similiar question.

This should not make sense to separate addresses for every country, since there are countries where you have few address conventions. Some popular conventions include not having streets in small villages, only village name and number, while streets are in larger cities’ addresses. I have learned that in Hungary’s capital – Budapest, there are few streets having the same name (you distinct them by city’s district number), while other cities does not have such addresses (someone from Hungary may actually confirm if this is true). So the total number of address formats will be numer_of_countries multiplied by number of address formats in this country… Can be done with different tables, but it will be horrible work to do.

smok1
How come you've used Province but ZipCode? Also, the StreetAddress and StreetAddressLine2 are generic enough for display purposes but if you've gotta do EDI or parse the addresses programmatically for QA (or for any other reason) you're going to be up a gum tree.
BenAlabaster
It simply depends what you need those data for. For sending post to customers worldwide my solution will be ok. For worldwide EDI you will probably need something like your answer to this question. However, for navigation purposes you will need additional data structures containing GIS data and links between them (so you will know that adress1 is located in the same place as adress2, even though that have got different street name, etc).So it is hard to say which solution is ok (not too complicated and accurate enough) without knowing the context.
smok1
+1  A: 

That depends on what you want to do with it.

I've found it always easier to use addresses for other purposes (such as verification against USPS data or getting shipping rates from UPS/FEDEX) if they're separated.

Here's what I typically use for addresses:

  • Address Line 1
  • Address Line 2
  • Address Line 3
  • City
  • Region
  • Postal Code
  • County
  • Country

In Response to the edit: For most situations I don't see the use. The table I listed above has enough fields (and is generic enough) for most country's addresses.

Stephen Wrighton
Address Line 1, 2 and 3 are definitely generic enough but when it comes to parsing addresses programmatically, you're going to be hosed. Parsing addresses programmatically is not a trivial task when considering international address formats.
BenAlabaster
+16  A: 

I will summarize my thoughts from my blog post - A lesson in address storage.

On my current project [I work for a logistics company] we're storing international addresses. I've done research on addresses all over the world in the design of this portion of the database. There's a lot of different formats. In the Western world we tend to use a fairly uniform format - a few differences but they're mostly:

  • Street Number - Numeric
  • House or Building Name - [VarChar - in the UK some houses/buildings are identified by name, not by number]
  • Street Number Suffix [VarChar, although in most cases, Char(1) would suffice]
    • A, B etc
  • Street Name [VarChar]
  • Street Type [VarChar or Int if you have a StreetTypes table]
    • So far, I've found 262 unique types in the English speaking world, there are likely more, and don't forget other languages i.e. Strasse, Rue etc.
  • Street Direction [VarChar(2)]
    • N, E, S, W, NE, SE, NW, SW
  • Address Type [VarChar or Int if you have an AddressTypes table]
    • PO Box
    • Apartment
    • Building
    • Floor
    • Office
    • Suite
    • etc...
  • Address Type Identifier [VarChar]
    • i.e. Box Number, Apartment Number, Floor Number remember apartment numbers and offices sometimes have alphanumeric info - like 1A
  • Local Municipality [VarChar or Int if you have a Municipalities table]
    • For instance, if your hamlet/village appears in the address before the town.
  • City/Town [VarChar or Int if you have a Cities table]
  • Governing District [VarChar or Int if you have a Districts table]
    • State (U.S.)
    • Province (Canada)
    • Federal District (Mexico)
    • County (U.K.)
    • etc...
  • Postal Area [VarChar]
    • Zip (U.S.)
    • Postal Code (Canada, Mexico)
    • Postcode (U.K.)
  • Country [VarChar or Int if you have a Countries table]

This appears to cover most countries but the ordering of the fields may be displayed differently. You can find a list of display formats at http://www.bitboost.com/ref/international-address-formats.html#Formats

For instance, in many countries, the postal code falls before the city name and the street number falls after the street name. In Canada, U.S. and the U.K. the street number precedes the street name and the postal code (or ZIP) comes after the city name.

In answer to your question about separation of the addresses into different countries, I wouldn't suggest it, it will just make life harder in other areas - for instance reporting. The format I've provided covers all the addresses in our logistics database which covers USA, Canada, Mexico and the UK without any problems. It also covers all of our European, Chinese, Japanese and Malaysian addresses. I can't speak for other countries but I haven't yet had to store an address from a country that these fields won't support.

I don't suggest going with the Address1, Address2, Address3 format suggested by others and seen in many databases because parsing address information out of an alphanumeric string isn't as simple as it might first seem - especially if data isn't entered correctly, due to misinformation, typo, misspelling etc. If you separate your fields you can use distance algorithms to check for likely meaning, use probability to check street name against postal code and street number or to check province and city against street name etc. Try doing any of that when you've got a string denoting your whole street address. It's not a trivial matter by any stretch of the imagination.

QA on an address database is a headache, period. The easiest way to simplify your life in this area is to make sure all the fields hold only a single piece of information that can be automatically verified as correct at entry time. Probability, distance algorithms and regular expressions can check for validity of entry and provide feedback to the user as to what their mistake was and suggest suitable corrections.

One caveat to be aware of is roads with names that are also street types - if you're covering Canada you need to be aware of "Avenue Road" in Toronto which will trip you up big time if you're using the Address1, 2, 3 format. This likely occurs in other places too, although I'm not aware of them - this single instance was enough for me to scream WTF?!

BenAlabaster
262 types of streets? Might I ask how you obtained this information?
Thomas Owens
@Thomas - Lots and lots of searching and list keeping. Australia, UK, Ireland, Canada, USA, Channel Islands, France. It was an arduous task without getting each country's postal database.
BenAlabaster
@Thomas - Don't forget that in the English parts of the world, we often steal names from other countries - i.e. the US uses Spanish names in a lot of places and Canada uses French too.
BenAlabaster
How do you handle requesting this information from users? Do you really have this many fields for them to read and decipher? I understand the need for it, but there is also a need for keeping data entry extremely simple for users, especially when tricky forms can lose you a sale or a new member.
Syntax Error
@Syntax Error: You've got a few choices, provide separate fields requiring the user move between them, or you parse the data and store it in the correct fields on the way in, or you parse it on the way out when you're reporting on it. Depending on how the system is used determines the most appropriate method. If it's heavy on reporting, parse at entry time, if it's heavy on entry, and light on reporting, parse at report time.
BenAlabaster
A: 

Comment of Ben Alabaster's Answer: To format addresses based on country, you could use a formatting table that has the ordering of the columns for each country as separate rows.

  • AddressFormat(CountryCode, FieldName, FieldOrder)

The field order can be coded to use complex grid layouts also.

There is no point in separating addresses by country. This will be chaotic as the number of countries increases and you will land in trouble if you want to find all the addresses of say, an international client. Having an Address Type suggested by Ben could also lead to ambiguities when you have an address that has both a building number and an apartment number. I could be in an apartment complex where each building has a different name. This is very common in India.

bkm
+3  A: 

Be careful not to over-analyze address formats. When you do, you're quite likely to end up with a specification most users will need to work around, effectively forcing them to use the wrong fields, or only filling the primary fields and ignoring the extra fields.

Keep things simple.

A StreetType like mentioned by BenAlabaster will cause problems when you start working with languages different from isolating languages like English or Spanish.

To show you how bad things can get in the wild: the "Henriette Roland Holststraat" in Amsterdam, built up from "Henriette" + "Roland Holst" + "straat", which can be abbreviated as the "Roland Holststraat", or "Roland Holststr.", or misspelled as "H.R.Holststr." or "Henriette Roland-Holst straat", depending on the weather. Unless you've got an up-to-date street register for each country on earth, you'll be going nowhere.

And finally, be careful that in some multilingual countries, names can be different from one language to another! For instance in Brussels where many streets have both a French and a Dutch name: "Avenu du Port" and "Havenlaan", depending on the addressee's preferred language. (Google Maps shows both names alternately, just to be on the safe side.)

You can try to devise all kinds of clever tricks here, but are the sales reps. going to understand this?

Ruben
You do raise a good point that I hadn't covered in my response. This is certainly something that should be considered when accounting for Dutch, German and other non-isolating languages.
BenAlabaster
A: 

I know this is an extremely old topic that is already answered, but I thought that I'd throw my two cents in as well. It all depends on what your project goals and how you expect your target users to enter addresses. Ben's suggestion will allow you to parse addresses accurately, but on the other hand could make for a longer (and possibly more frustrating) user data entry process. Stephen Wrighton's suggestion is simpler, and could be easier for users to enter addresses as a result.

I've also seen some models that simply had an "Address" column that would capture a typical street number, type, street name, unit / apartment number, etc. all in one column, while keeping City, Country, Region, etc. within other columns. Similar to Stephen's model, except Address1, Address2, and Address3 all consolidated into one column.

My opinion is that the most flexible models tend to be those that are least restrictive, depending on your interpretation of flexible.

Shan Plourde