views:

104

answers:

2

I've a large (1TB) table in SQL Server 2008 that looks something like this:

ID int | Flag BIT | Notes NTEXT

I need to search every row and set the Flag bit to 1 where Notes contains the word flip

Is

UPDATE Table SET Flag = 1
WHERE Notes LIKE '%flip%'

the 'best' way to do it?

I'm thinking that this could take days to run on such a large table. I have tried running a

SELECT TOP (10) * FROM Table
WHERE Notes LIKE '%flip%'

and it is still running after 10 minutes - so performance isn't looking good.

Would creating a C# app to read/update each row be a better method. At least I could then do the change in bits without locking up the table for days.

Are there any other methods I should consider?

+2  A: 

Did you try full text indexing ?

Dmitry
Thanks Dmitry. But I would like a quick solution to this particular problem. Adding Full Text Indexing will take forever on a table this size, and it probably overkill.
Mr. Flibble
I don't think there is a quick solution because queries like this can't make use of any index, so they have to scan the whole table and match each row against your pattern. Your table is 1TB big, so it takes time. You hardly can do anything better processing it on the client side, unless you fetch portions of your table on different machines and process them in parallel (which is probably overkill too).
Dmitry
See above as well, but fulltext search won't help for suffix or leading-wildcard searches within a given term/word. http://stackoverflow.com/questions/1758393/is-there-an-efficient-way-to-index-inside-words-in-sql-server-2008/1758480#1758480
chadhoc
+3  A: 

I would recommend running a select like you have above (without the top 10 clause) to pull out the IDs of the records you want to update (assuming Id is indexed) into a temp staging table. Your select will scan the entire table, but that's better than doing so within an update statement. If you can make use of Read Committed Snapshot Isolation, that will be even better for concurrency. If you need to use a looping method, ie checking a small set of records iteratively, use the ID (or some seekable column) as your primary filter to get a chunk of records to pattern match and continue until finished. Ie something like this:

Select  top x id
From    Table
where   Id between 1 and 100000
And     Textcolumn like('%blah%')

Then continue iterating until you hit all the ID ranges.

Then, once you have the IDs to update, run your update against those instead.

Like Dmitry says, top 10 selects will still scan the entire table for this type of query, so that will simply slow you down. Once you have the IDs, or something that would make a query seekable, you could then include a top x clause to reduce the concurrency impact, but that will likely only make sense if you have a very large number of records to affect.

chadhoc
Thanks chadhoc. That worked well.
Mr. Flibble