ansaurus

Question

I need help splitting addresses (number, addition, etc)

Answer 1

+4 A:

SQL Server and T-SQL are rather limited in their string-processing prowess - if you're really serious about heavy lifting with regexes etc., your best bet is probably to create an assembly in C# or VB.NET that does all that tricky regex business, deploy it into SQL-CLR, and then use its functions from T-SQL.

"Pure" T-SQL cannot really handle much string manipulation beyond SUBSTRING and CHARINDEX - that's about it.

marc_s 2010-07-21 15:27:53

+1: But you forgot about [PATINDEX](http://msdn.microsoft.com/en-us/library/ms188395.aspx), which [has limited pattern matching](http://msdn.microsoft.com/en-us/library/ms187489.aspx)

OMG Ponies 2010-07-21 15:34:34

@OMG Ponies: ok, I give you "very limited" pattern matching :-)

marc_s 2010-07-21 15:39:24

Yeah, this seems to be the best option indeed... Thanks, I think I will try to do something like this.

Lex 2010-07-21 15:45:59

@Lex: Separate from how you implement your solution, you might be able to find more information on address-parsing rules. You will find out that many have had this problem, and maybe they're sharing their knowledge.

bobs 2010-07-21 16:23:13

Answer 2

A:

This sounds like the common "take a piece of complex text that could look like anything and make it look like what we now want it to look like" problem. These tend to be very hard to do using only T-SQL (which does not have native regex functionality). You will probably have to work with complex code outside of the database to solve this problem.

Philip Kelley 2010-07-21 15:28:16

Answer 3

A:

This should help. Cheers!

Hal 2010-07-21 15:29:38

YIKES! Instantiating an external COM object - my skin crawls ....

marc_s 2010-07-21 15:31:33

Got any better ideas? SQL Server's rather limited in this matter

Hal 2010-07-21 15:32:30

@Hal: yes - SQL-CLR! As of SQL Server 2005, that's definitely the best and easiest way to extend SQL Server - it beats external COM stuff hands down, every time.

marc_s 2010-07-21 15:39:54

Thanks, Hal, for your answer. I will take a closer look at it tomorrow, but I think this is way too complicated. For me it would probably be faster to create some external script.

Lex 2010-07-21 15:47:42

Answer 4

+1 A:

In answer to your "Is there any way to use regexes in a query?": yes, there is, but it needs a little .NET knowledge. Create a CLR assembly with a user-defined function that does your regex work. Visual Studio 2008 has a template project for this. Deploy it to your SQL Server instance and call it from your query.

Tim 2010-07-21 15:30:56

+1: Why was this marked down? It's correct - on SQL Server 2005+, SQLCLR is a means of getting regex functionality that TSQL will never natively support.

OMG Ponies 2010-07-21 15:46:16

Answer 5

+1 A:

Name and Address parsing and standardization is probably one of the most difficult problems we can encounter as programmers for precisely the reasons you've mentioned.

I assume that address parsing is not the main business of whoever you work for. My advice is to buy a solution rather than build one of your own.

I am familiar with this company. Your address examples appear to be from outside the US and Canada, so I don't know if their products would be useful, but they may be able to point you to another vendor.

Other than being a user of their products, I am not affiliated with them in any way.

TGnat 2010-07-21 15:33:20

Answer 6

+3 A:

Something like this maybe?

-- Split at the first digit: the text before it is the street, the rest is number + addition.
SELECT
   substring([address_field], 1, patindex('%[0-9]%', [address_field])-1) as [STREET],
   substring([address_field], patindex('%[0-9]%', [address_field]), len([address_field])) as [NUMBER_ADDITION]
FROM
   [table]

It relies on the assumption that the [street] part will not contain any digits, and that the [number_addition] part will begin with a digit.
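For anyone who prefers an external script over T-SQL, here is a minimal sketch of the same "split at the first digit" idea in Python. The function name `split_address` and the sample addresses are invented for illustration and are not from the question. Unlike the query, the sketch trims surrounding whitespace and handles the case where no digit is found at all; in T-SQL, PATINDEX returns 0 in that case, so the first SUBSTRING would be given a length of -1 and raise an error.

```python
import re

# Hypothetical helper: splits an address at its first digit,
# under the same assumption as the query (no digits in the street name).
def split_address(address):
    """Return (street, number_addition) for a single address string."""
    match = re.search(r"[0-9]", address)
    if match is None:
        # No digit anywhere: treat the whole value as the street.
        return address.strip(), ""
    street = address[:match.start()].strip()
    number_addition = address[match.start():].strip()
    return street, number_addition

# Made-up sample values, only to show the behaviour.
for sample in ["Dorpsstraat 12a", "Kerkweg 7 bis", "Postbus"]:
    print(split_address(sample))
```

A street name that itself starts with a digit (for example "1e Straat 5") breaks the assumption: the whole value ends up in the number/addition part, so such rows would still need manual review.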

dave 2010-07-21 15:53:55

Thanks, it looks promising, I will try it tomorrow!

Lex 2010-07-21 15:56:57

This works pretty well!! I think I will be able to manage with this. Please note that you have syntax errors in the query: both substrings have one ')' too many at the end.

Lex 2010-07-22 07:33:46

Answer 7

A:

TGnat is correct. Address standardization is complicated.

I've encountered this problem before.

If your customer doesn't want to spring for the custom software, develop a simple GUI that allows a person to take an address and split it manually. You'd delete the address row with the old format and insert the row with the new address format.

It wouldn't take long for typists familiar with your addresses to manually make 100,000 changes. Of course, it's up to the customer if he wants to spend the money on custom software or typists.

But you shouldn't be stuck with the data cleaning bill, either.

Gilbert Le Blanc 2010-07-21 16:13:22
