I have a very large set of data (~3 million records) that needs to be merged with updates and new records on a daily schedule. I have a sproc that breaks the recordset into 1000-record chunks and uses the MERGE command with temp tables in an attempt to avoid locking the live table while the data is updating. The problem is that it doesn't really help. The table still "locks up" and our website that uses the data receives timeouts when attempting to access it. I tried splitting it into 100-record chunks, and even added a WAITFOR DELAY '00:00:05' to pause between merging the chunks. It's still rather sluggish.
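
The loop is shaped roughly like this (a simplified sketch; the staging table, key, and column names here stand in for the real ones):

    DECLARE @BatchStart INT = 1, @BatchSize INT = 1000, @MaxRow INT;
    SELECT @MaxRow = MAX(RowNum) FROM #Staging; -- staging rows are pre-numbered

    WHILE @BatchStart <= @MaxRow
    BEGIN
        MERGE dbo.LiveTable AS target
        USING (SELECT * FROM #Staging
               WHERE RowNum BETWEEN @BatchStart AND @BatchStart + @BatchSize - 1) AS source
            ON target.Id = source.Id
        WHEN MATCHED THEN
            UPDATE SET target.Col1 = source.Col1, target.Col2 = source.Col2
        WHEN NOT MATCHED THEN
            INSERT (Id, Col1, Col2) VALUES (source.Id, source.Col1, source.Col2);

        SET @BatchStart += @BatchSize;
        WAITFOR DELAY '00:00:05'; -- pause between chunks
    END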

I'm looking for any suggestions, best practices, or examples on how to merge large sets of data without locking the tables.

Thanks

+2  A: 

Change your front end to use NOLOCK or READ UNCOMMITTED when doing the selects.

You can't use NOLOCK on a MERGE, INSERT, or UPDATE, as the records must be locked in order to perform the modification. However, you can NOLOCK the SELECTs.
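
For example (the table name here is just a placeholder):

    -- Hint an individual table:
    SELECT Col1, Col2
    FROM dbo.LiveTable WITH (NOLOCK)
    WHERE Id = @Id;

    -- Or set it for the whole session, which has the same effect:
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
    SELECT Col1, Col2 FROM dbo.LiveTable WHERE Id = @Id;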

Note that you should use this with caution. If dirty reads are okay, then go ahead. However, if the reads require the updated data then you need to go down a different path and figure out exactly why merging 3M records is causing an issue.

I'd be willing to bet that most of the time is spent reading data from disk during the MERGE and/or working around low-memory conditions. You might be better off simply putting more RAM into your database server.

Ideally, you'd have enough RAM to pull the whole database into memory as needed. For example, if you have a 4 GB database, make sure you have 8 GB of RAM, on an x64 server of course.

Chris Lively
NOLOCK is not an option for the selects. The code uses NHibernate, and even after setting NHibernate to use NOLOCK it still times out. We have added more RAM to the server, and we ran the SQL profiler and applied its tuning recommendations, which will supposedly increase efficiency by 41%. Thanks for your suggestions, Chris.
Josh
@Josh: The only reason for timeouts while using NOLOCK is that the server is simply hammered for resources and doing a lot of disk access. In other words, it's IO-bound due to lack of memory. The three things to do are: profile and fix the queries, add RAM (you've done those), and potentially move to much faster drives. The biggest impact, however, is going to come from the index updates and RAM (one way to blunt the index cost is sketched below).
Chris Lively
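
One common way to blunt that index-update cost during a bulk load is to disable nonclustered indexes for the duration and rebuild them afterward; this wasn't tried above, but as an illustration it looks like this (index and table names are placeholders):

    -- Disable a nonclustered index before the bulk merge.
    -- A disabled nonclustered index is simply ignored by the optimizer;
    -- do NOT disable the clustered index, which makes the table inaccessible.
    ALTER INDEX IX_LiveTable_Col1 ON dbo.LiveTable DISABLE;

    -- ... run the chunked merge/update here ...

    -- Rebuild afterward, which also re-enables the index:
    ALTER INDEX IX_LiveTable_Col1 ON dbo.LiveTable REBUILD;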
@Chris: I ended up dumping the MERGE statement in favor of explicit updates and inserts, roughly the pattern sketched below. Combined with splitting the work into 1000 records at a time, it's both faster and doesn't lock the data pages the way MERGE does. Thanks again for your suggestions. They did help us!
Josh
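
The per-chunk update/insert pair looks roughly like this (table and column names are placeholders):

    -- For each 1000-row chunk: update the rows that already exist...
    UPDATE t
    SET t.Col1 = s.Col1, t.Col2 = s.Col2
    FROM dbo.LiveTable AS t
    JOIN #Chunk AS s ON s.Id = t.Id;

    -- ...then insert the rows that don't exist yet.
    INSERT INTO dbo.LiveTable (Id, Col1, Col2)
    SELECT s.Id, s.Col1, s.Col2
    FROM #Chunk AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.LiveTable AS t WHERE t.Id = s.Id);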
@Josh: Glad it helped lead you to a solution. I'm going to have to check our own merge usage to see if we have any similar issues on the mass updates we do. I haven't seen any issues yet, but one of our products is about to be scaled up by a factor of 10 so it might be good practice to test.
Chris Lively