views:

47

answers:

1

Hi guys,

I have this setup: an output table where my results will end up consisting of

[FirstName]
[LastName]
[Address1]
[Address2]
[Address3]
[ZipCode]
[City]
[Country]

The original data consists mostly of line feed split address lines, some are split by commas, and there is one name field.

My original data tables look simply like this:

[NAME ]
[ADDRESS]

How would I translate these into one another using T-SQL?

Thanks for your time.

Matt

+1  A: 

Your real question here is, how do I parse the contents of [ADDRESS] into separate attributes? I would forget about TSQL for now and investigate the quality of the source data to see how easy (or not) it is to parse in theory. Then find a practical solution. Some random comments:

  • Address data is notoriously 'dirty' and parsing it is usually difficult, especially international data.
  • TSQL is not a good language for text manipulation in general; if you eventually decide to parse the data yourself, then you should almost certainly do it in another language outside the database (C#, Perl, Python, whatever).
  • Even if you already have very clean data, there is no SPLIT() function in TSQL. It's easy to write one but String.Split() or an equivalent in your preferred language will probably be faster and easier. So even if you do have the luxury of clean data, consider an external program or possibly a .NET stored procedure.
  • If you have a large amount of data, look into companies who can parse and validate address data for you; typically, they offer bulk solutions and on-demand web services. You might think it's expensive, but they can do things you probably can't (e.g. fuzzy matching addresses against postal databases from different countries) and compared to the cost of your time now plus future maintenance you might find it's actually cheaper.
  • If you absolutely have to parse the data yourself, make sure you develop an extensive set of unit tests.
Pondlife