views:

33

answers:

4

Hey,

I have a table that has about 400,000+ rows. I am writing some pattern matching code but need to clean up a column before I do so. This boils down to doing a replace-like operation.

I tried listing them all out one at a time...

 Update T_ADDRESS set ADDR_LINEONE = REPLACE(ADDR_LINEONE,' southeast ',' se ')
 Update T_ADDRESS set ADDR_LINEONE = REPLACE(ADDR_LINEONE,' southwest ',' sw ')

Since I have over 500 of these...it took too long.

Now I am trying to nest them...

 Update T_ADDRESS set ADDR_LINEONE = REPLACE(REPLACE(ADDR_LINEONE,' southwest ',' sw '),' southeast ',' se ')

But this is still painfully slow. I need to make this code work on tables of all sizes (1 record to 5 million records).

Anyone have any advice? I am using SQL Server by the way.

+1  A: 

Write a CLR procedure. TSQL is not great at (or designed for) handling large numbers of string manipulations.

Regular Expressions Make Pattern Matching And Data Extraction Easier

Mitch Wheat
A: 

Create #TEMP table with new values then inner join Something like this

create table #TempValues
(oldAres varchar(12),newadres varchar(2))

insert into #TempValues
select 'southeast','se'
union all 
select 'southwest','sw'

update T_ADDRESS
set addr_lineone=t.newadres
from T_ADDRESS inner join #TempValues t on T_ADDRESS.addr_lineone=t.oldAdres
adopilot
+4  A: 

You have to always scan the table end-to-end no matter how fancy you do the REPLACE. This is what is killing performance, and it cannot be changed since you have to way of indexing the ADDR_LINEONE field in any sensible manner.

Since this should be a one-time only operation, the long time should not matter.

If this is a repeated operation, then your problem is not here, is in how you load the data into the table: do the transformation before you save the data, otherwise you stand no chance.

Remus Rusanu
+1: This is the tax you pay for allowing bad data
OMG Ponies
A: 

You could write the update statement to use a CASE statement in place of the multiple REPLACEs although I've no idea whether this will execute any faster for you.

Chris W