views:

47

answers:

2

I have a job with around 100K records to process. I have got many suggestions to split this job in chunks and then process it.

What are the benefits of processing smaller chunks of data compared to processing all 100K records at once?

What is the standard way of doing it? E.g. picking 10K records into a temp table and processing one batch at a time?

A: 

Personally, I have never heard of this as an optimization. If the division into chunks of 10K is completely arbitrary, then I think it would be less effective to run the job ten times than to run it across the whole set once: dealing with temp tables here would only add overhead, and if you do it all in one pass, you give the database a fair chance to get an accurate idea of what you want to do and to select a proper execution plan based upon that.

If the 10-or-so-K records are not arbitrarily selected, however, but are actually logically divisible into a couple of different groups (say you have a huge table 'images', which could actually be divided into 'gallery photos', 'profile photos', 'cms images', 'screenshots', or whatever), and if your process makes that distinction at some point, then you may help the selection out by always storing these records in distinct tables. The separate tables would help the database find the interesting rows, much the way an index does. But that's rather beside the point, I guess...

If you want performance, though, make sure that you refresh your statistics every 24 hours or so, to give the database an accurate idea of what it's up against.

David Hedlund
+1  A: 

I've just finished a project that did exactly this - purging records from a table in multiple batches instead of deleting all of the records at once.

The issue is speed versus concurrency.

Deleting all of the records at one time is the fastest way. However, it creates the most locks and is most likely to block other processes.

Deleting in batches is much slower, but if the batch size is chosen correctly, each batch runs fast enough that concurrency is not an issue.
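A minimal sketch of the batched-delete loop described above, using Python's sqlite3 so it can run anywhere. The table name `records`, the `purge` flag column, and the batch size are hypothetical stand-ins; the point is the pattern of deleting a bounded number of rows and committing between batches so locks are released.

```python
import sqlite3

# Hypothetical setup: 100 rows, half of them flagged for purging.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, purge INTEGER)")
conn.executemany("INSERT INTO records (purge) VALUES (?)",
                 [(i % 2,) for i in range(100)])
conn.commit()

BATCH_SIZE = 10  # tune so each batch finishes fast enough not to block others
total_deleted = 0
while True:
    # Delete at most BATCH_SIZE matching rows per pass.
    cur = conn.execute(
        "DELETE FROM records WHERE id IN "
        "(SELECT id FROM records WHERE purge = 1 LIMIT ?)",
        (BATCH_SIZE,))
    conn.commit()  # commit each batch so locks are held only briefly
    if cur.rowcount == 0:
        break  # nothing left to purge
    total_deleted += cur.rowcount

print(total_deleted)  # 50 flagged rows removed, in 10-row batches
```

Each iteration is its own short transaction, which is the concurrency trade-off: slower overall than one big DELETE, but other sessions get a chance to acquire locks between batches.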

The one critical point for my project was that there was no data-consistency issue to worry about if the records were not all deleted at once.

Darryl Peterson