views: 3757

answers: 5

I have a database with a large number of fields that are currently NTEXT.

Having upgraded to SQL 2005, we have run some performance tests on converting these to NVARCHAR(MAX).

If you read this article:

http://geekswithblogs.net/johnsPerfBlog/archive/2008/04/16/ntext-vs-nvarcharmax-in-sql-2005.aspx

This explains that a simple ALTER COLUMN does not re-organise the data into rows.

I have experienced this with my data. We actually get much worse performance in some areas if we just run the ALTER COLUMN. However, if I then run an UPDATE Table SET Column = Column for each of these fields, we get a huge performance increase.
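
For illustration, the pattern we are testing looks something like this (MyTable and MyColumn are just placeholder names):

    -- Change the type in place; this alone does not rewrite the existing
    -- NTEXT data, so the old LOB storage layout is kept.
    ALTER TABLE MyTable ALTER COLUMN MyColumn NVARCHAR(MAX) NULL

    -- Rewriting each value forces it to be stored as NVARCHAR(MAX)
    -- (in-row where it fits), which is where the performance gain comes from.
    UPDATE MyTable SET MyColumn = MyColumn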

The problem I have is that the database consists of hundreds of these columns with millions of records. In a simple test (on a low-performance virtual machine), a table with a single NTEXT column containing 7 million records took 5 hours to update.

Can anybody offer any suggestions as to how I can update the data in a more efficient way that minimises downtime and locks?

EDIT: My fallback plan is to update the data in blocks over time. However, with our data this means worse performance until all the records have been updated, and the shorter that window the better, so I'm still looking for a quicker way to do the update.

+3  A: 

How about running the update in batches - say, 1,000 rows at a time?

You would use a WHILE loop that increments a counter corresponding to the IDs of the rows to be updated in each iteration of the update query. This may not reduce the total time it takes to update all 7 million records, but it should make it much less likely that users will experience an error due to record locking.
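
A rough sketch of the idea, assuming an integer primary key called Id (all names here are placeholders):

    DECLARE @BatchSize INT
    DECLARE @MinId INT
    DECLARE @MaxId INT

    SET @BatchSize = 1000
    SELECT @MinId = MIN(Id), @MaxId = MAX(Id) FROM MyTable

    WHILE @MinId <= @MaxId
    BEGIN
        -- Rewrite one range of rows at a time so locks are short-lived
        UPDATE MyTable
        SET    MyColumn = MyColumn
        WHERE  Id >= @MinId
          AND  Id <  @MinId + @BatchSize

        SET @MinId = @MinId + @BatchSize
    END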

Yaakov Ellis
Yes, this is my current plan. It will take a lot of organisation to get it all working, but it should be achievable. I will also only update one field at a time so that I don't get degraded performance until I'm ready to update that column. I'm just hoping there is some magic solution out there that I don't know about.
Robin Day
Running in batches this way generally does improve the time it takes to do the update when you are talking about this many records.
HLGEM
A: 

Running a database test on a low-performance virtual machine is not really indicative of production performance; the heavy I/O involved requires a fast disk array, which virtualisation will throttle.

ck
+4  A: 

If you can't get scheduled downtime...

Create two new columns: an NVARCHAR(MAX) column to hold the converted data, and a processedflag INT DEFAULT 0.

Create a nonclustered index on the processedflag column.

UPDATE TOP is available to you (you want to update the top N rows, ordered by the primary key).

Simply set processedflag to 1 during the update, so that the next update only touches rows where processedflag is still 0.

You can check @@ROWCOUNT after each update to see when you can exit the loop.

I suggest a WAITFOR DELAY of a few seconds after each update to give other queries a chance to acquire locks on the table and to avoid overloading the disks.
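
A rough sketch of the loop (table and column names are placeholders, and the 5-second delay is just an example):

    WHILE 1 = 1
    BEGIN
        -- Copy the NTEXT data into the new NVARCHAR(MAX) column, 1,000 rows at a time
        UPDATE TOP (1000) MyTable
        SET    NewColumn     = OldNtextColumn,
               processedflag = 1
        WHERE  processedflag = 0

        -- Nothing left to update: we are done
        IF @@ROWCOUNT = 0
            BREAK

        -- Let other queries acquire locks and give the disks a breather
        WAITFOR DELAY '00:00:05'
    END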

John
+3  A: 

If you can get scheduled downtime:

  1. Back up the database
  2. Change recovery model to simple
  3. Remove all indexes from the table you are updating
  4. Add a maintenanceflag column (INT DEFAULT 0) with a nonclustered index on it
  5. Run: UPDATE TOP (1000) tablename SET nvarcharcolumn = ntextcolumn, maintenanceflag = 1 WHERE maintenanceflag = 0

Repeat step 5 as many times as required, within a loop with a delay (a sketch is shown below).

Once complete, take another backup, change the recovery model back to what it was originally, and recreate the old indexes.

Remember that every index or trigger on that table causes extra disk I/O, and that the simple recovery model minimises log file I/O.
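
Put together, the downtime pass could look roughly like this (database, table, index and column names are all placeholders, and the batch loop is the same idea as in the answer above):

    -- 2. Switch to simple recovery to minimise log I/O
    ALTER DATABASE MyDatabase SET RECOVERY SIMPLE
    GO

    -- 3. Remove the nonclustered indexes on the table (repeat per index)
    DROP INDEX IX_MyTable_SomeIndex ON MyTable
    GO

    -- 4. Add the maintenance flag and index it
    ALTER TABLE MyTable ADD maintenanceflag INT NOT NULL DEFAULT 0
    GO
    CREATE NONCLUSTERED INDEX IX_MyTable_maintenanceflag ON MyTable (maintenanceflag)
    GO

    -- 5. Batch the conversion until nothing is left to do
    WHILE 1 = 1
    BEGIN
        UPDATE TOP (1000) MyTable
        SET    NewColumn       = OldNtextColumn,
               maintenanceflag = 1
        WHERE  maintenanceflag = 0

        IF @@ROWCOUNT = 0
            BREAK

        WAITFOR DELAY '00:00:02'
    END
    GO

    -- Afterwards: take another backup, then restore the original recovery model and indexes
    ALTER DATABASE MyDatabase SET RECOVERY FULL
    GO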

John
A: 

You might also test whether an SSIS package could do this more efficiently.

Whatever you do, make it an automated process that can be scheduled and run during off hours. The fewer users trying to access the data, the faster everything will go. If at all possible, pick out the three or four most critical columns to change, take the database down for maintenance (during a normally quiet time), and do them in single-user mode. Once you have done the most critical ones, the others can be scheduled one or two a night.

HLGEM