views:

218

answers:

4

Currently we store our address data like so:

string suiteNumber (ie. unit number)
string streetNumber (building number)
string streetName
string streetDirection (N/NW/S/etc.)
string streetType    (rd/st/ave/etc.)
// ... etc. (postal code/city/province/state/country

But I'm running into the (common from what I can tell) problem of parsing out the first 5 address parts when dealing with and importing addresses.

I'm thinking that all of this would be significantly easier if the street address were just a string (varchar in the db).

There are 2 arguments that I've been given for why we should keep it as is: 1. Searching is easier when you can search against JUST the street name or number etc. but I'm thinking that a sql script along the lines of SELECT x FROM Address WHERE streetAddress LIKE "%INPUT%"; Sure it's not as fast, but it would work (and the dataset for that search is only on customers is incredibly smaller than the set of all addresses we have stored).

  1. Currently we have a system that flags apartments - if you find that 1 person at address A is an apartment, we flag them, and it will search for all other people at that streetnumber/streetname and flag them as well (this is a sometimes important business need)

I already store them all as strings because of the myriad exceptions in addresses.

So I ask, are there particular reasons for needing/wanting to store the street address parts separately?

+4  A: 

I wrote a whole blog post about this a while back. There are very good reasons to store each piece of data in a separate field. Not least for validation of address data.

Of course, it depends what industry you're in and what the information is being used for. If invalid address data isn't costing your company anything, then by all means store invalid data. Be aware though that down the road you may wish to use this data for mailings, demographic reports etc. If the data is invalid, it's not trivial to fix it after the fact.

Here's my blog post:

http://www.endswithsaurus.com/2009/07/lesson-in-address-storage.html

Also, in reference to searching "Where StreetAddress Like '%whatever%'". This is all well and good if you're doing a quick search for your own benefit, but when you come to attempt to automate parts of your system that rely on address data or even attempt to drop duplicates, provide users with auto-suggest etc etc, performance is degraded to a point that it will become unusable the larger the address table.

If invalid addresses aren't a worry that is going to cost the company real cash, then it's not an issue - but then, if you're not using the addresses for anything that is beneficial financially (or likely to be in the future), then why are you storing that information in the first place?

@Snorfus Ah, you must be in the Prairies. I had overlooked including posting about land descriptions in my blog post but it's something I'm considering for a later post.

Legal subdivisions (LSDs) are used primarly in Oil & Gas and other primary resource industries in Alberta, Saskatchewan and Manitoba (although they are found in parts of B.C. too, they're not in such prevalent use). They all take the same format: Section, Township, Range, Meridian. For example:

SE 28-12-17-W5

This is the South Eastern corner of Section 28, Township 12, Range 17, West of the 5th Meridian.

You could simply use a single field and parse it with regular expressions or break it out into separate fields containing the breakdown of the LSD. Running regexes in SQL Server can be a pain when it comes to performance. My take on it is the same as that of address data in general, that because each piece of data is a separate unique piece of data that they should be stored in separate fields. However, given that the large majority of this type of address data is not used by the general public in lieu of a street address, I might recommend designing something that would allow this information to be separated from (but linked to) your main address data. Given however that the land description/LSD is also part of every Canadian address, I might be tempted to store it in my main address table depending on the target audience of the database.

Here's a post about the breakdown of the Alberta Land Resource System:

http://www1.agric.gov.ab.ca/%24department/deptdocs.nsf/all/agdex10302

One thing you will often find in Oil & Gas at least (which is where the bulk of my experience comes from) is that workers will often refer to only the first two parts of the LSD - i.e. 28 of 12, or 43 of 16. The remainder of the LSD is implied by the locality of the address - i.e. Grand Prairie, Fox Creek, Wolf Lake etc.

BenAlabaster
Very interesting blog post. How do you deal with Land Data (in Canada at least)? Like landquarter, landsection, landmeridianeast/west, landmeridianline etc.?
SnOrfus
+1 for very interesting blog post. How the heck does Google maps figure out what I mean via free text entry? I can throw just about any sort of address format at it and it works well.
I82Much
I'm hypothesising, but this is what I've got: They've got a heap of commodity computers to throw at parsing the information you enter. It's then checked against data that's been validated and stored properly to make rapid scans extremely quick. It's doubtful that each machine will have to parse millions of addresses to validate that data. A dozen machines could parse and validate against only 100,000 unique addresses for Canada which would be far faster than a single machine validating against a list of 1.2 million.
BenAlabaster
I once tried writing some code to parse addresses. The thing is, I never even got close to knowing about or properly handling all variations of an address format. I was working with U.S. addresses only. To suggest that every application that deals with addresses on any level should require the expertise required to properly parse and catalog every possible address format applicable to that application is an unnecessary complication, IMO. Besides, any business that needs to ship in bulk is likely to invest in address standardization software to do the bulk of that work anyway.
Chris
@Chris Vann: I've done some serious research on the address standardisation software and to be honest, if you're handling international address information it all sucks. You need to be an expert in the domain of any data that you're handling in order to do the job well - that's just a fact of life. If you don't understand the data you're handling then how on earth can you expect to handle it properly?
BenAlabaster
I wish I could +1 again. Thank you for the edit about land descriptions. I am from SK, but newly. I just came here from ON so it was all news to me when I got here and there was a large number of rural addresses with that kind of address info.
SnOrfus
@SnOrfus - I lived in Calgary for 2 years and spent the majority of that time in the oil patch so for a while I had to live and die by land descriptions to find my way around. Some of the oil fields are hundreds of kilometres from the nearest finished road. If you need any more info, give me a shout and I'll see what I can hook you up with ben at afsinc dot ca.
BenAlabaster
A: 

In Europe, the street address is usually a name plus a "number" (where number can be something like "3a"). I've seen databases which store them separately for a single reason: You can look up the street names in an official database to verify them (for example to protect against typos). So for this use case, it makes sense to keep the verifiable and the non-verifiable parts in different columns.

I doubt that you can find a reason to break it down further except for a fuzzy fear that you might lose information.

Aaron Digulla
A: 

Is a benefit if you are following an Objected Oriented approach for modeling your whole domain. Your question reminds me this blog title March is not a number as an answer. Something analogue could be say about streets and addresses ("a street is not a string"). SnOrfus points out a valid problem on his comment.

JuanZe
+2  A: 

I used to think that was a good idea, until my applications were deployed and a constant stream of requests came in for changes. At the time, I lived in Ontario, Canada and I thought that I knew what a standard address looked like. Until some customer had an address that combined the P.O. Box and the street address into one. Then the Alberta customers started coming in with their structured codes mentioned in another answer. Then British Columbia adresses where there was no street or street number, just a Site and Compartment and Rural Route. C4,S16 RR7 Mountainville. And then with American suppliers, the postal code rules went out the window. And then the occasional British customer appeared in the database and everything that you thought you knew about addresses goes out the window. A building name with no street number, two street names, two town names all in one address!

Bright House,
Waverly Crescent off Oxford Road,
Seething-under-Norton, Banbury,
Oxfordshire
OB7 3VT
United Kingdom

That is a made up example, but they do exist. The British manage to get by because every local company has an up to date national address database and all they need is the postcode and house name or number. The rest is filled in from the database.

In the case of that address, there is probably another Waverly Crescent in Seething-under-Norton, which is why the second street name. And Seething-under-Norton was a village which long became incorporated into the town of Banbury, so both names are in the address. In British addresses you often get municipalities which do not exist. They are considered postal towns in that they exist only within the postal system. There is usually a historical basis for the name. Lots of London addresses are like that with people writing London one time, and Leyton or South Ruislip or Hillingdon another time. The letters all do get delivered promptly.

So unless a feature of your software is that it prevents foreign address entries into the system, don't do this!

By the way, you mentioned identifying all the people on the same street by the street name. Have you checked out Denver Colorado where there are street names which end and pick up again, a mile further away. I once got lost in Littleton (Denver suburb) trying to find a certain address only to be told that I needed another such-and-such street which was elsewhere. Then there is the British practice of using two or more names for every road. For instance, there will be a Homerton Road which is then named Marsh Hill and then Homerton High Street and then Urswick Road and then Lower Clapton Road all in the space of a kilometer or two. More commonly, in the village of Wick there will be a Norton Road. If you follow it, after a mile or two you will note that your are now on Wick Road, entering the village of Norton.

Michael Dillon