I'm creating a batch process to move some information from my database to external files. This task is manageable, but there's a lot of data that needs to be processed and it will probably take about a month.

During this month, new information will be continuously uploaded. Is there any way I can do my batch processing and then come back to the new records and process them afterwards? (We cannot turn off our upload system while the batch processing takes place.)

I was thinking of doing the majority of the batch processing up to a certain cut-off date, which would cover about 95% of the total, and then processing the remaining 5% (the records entered after that date) in a second pass. Any thoughts?

A: 

Why don't you rely on a "journal" table to track which rows have been processed through your "download" process? If new "entries" come in, they'll just be picked up later, no?

Of course, you would process your batch in order by using an indexed "creation/modified" date field.

NOTE that I am not sure why you need to treat the current records differently from the new ones being written during your "month". Please clarify. If you need to make a distinction, then just start your "download" process with a specific "stop" date.
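
A minimal sketch of that idea, assuming a hypothetical "records" table with an indexed "modified_at" column, a separate "export_journal" table, and a stand-in write_to_external_file step (all names invented for illustration; SQLite is used only to keep the sketch self-contained):

    import sqlite3
    from pathlib import Path

    EXPORT_DIR = Path("exports")
    BATCH_SIZE = 1000

    def write_to_external_file(record_id: int, payload: str) -> None:
        # Stand-in for the real export step: one file per record.
        EXPORT_DIR.mkdir(exist_ok=True)
        (EXPORT_DIR / f"{record_id}.txt").write_text(payload)

    def export_batch(conn: sqlite3.Connection, stop_date: str) -> int:
        # Export one batch of rows modified before stop_date that are not yet
        # journalled, oldest first. Returns the number of rows exported.
        rows = conn.execute(
            """
            SELECT r.id, r.payload
            FROM records AS r
            LEFT JOIN export_journal AS j ON j.record_id = r.id
            WHERE j.record_id IS NULL          -- not yet exported
              AND r.modified_at < ?            -- the optional "stop" date
            ORDER BY r.modified_at             -- cheap if modified_at is indexed
            LIMIT ?
            """,
            (stop_date, BATCH_SIZE),
        ).fetchall()

        for record_id, payload in rows:
            write_to_external_file(record_id, payload)
            conn.execute(
                "INSERT INTO export_journal (record_id, exported_at) "
                "VALUES (?, datetime('now'))",
                (record_id,),
            )
        conn.commit()
        return len(rows)

Run it in a loop until it returns 0; anything uploaded later is simply absent from the journal and gets picked up when you re-run with a later (or no) cut-off date.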

jldupont
A: 

If you are using SQL Server 2008 you can use change tracking to query for changes in your database since your last batch.
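
A rough sketch of how that query could look, continuing in Python (here via pyodbc); the "dbo.Records" table, its columns, and the write_to_external_file helper from the sketch above are assumptions, and change tracking must first be enabled on the database and the table (ALTER DATABASE ... SET CHANGE_TRACKING = ON, then ALTER TABLE ... ENABLE CHANGE_TRACKING):

    import pyodbc

    def export_changes_since(conn: pyodbc.Connection, last_sync_version: int) -> int:
        # Pull every row changed since the previous batch and return the
        # version number to use as the starting point next time.
        cursor = conn.cursor()

        # Capture the version *before* reading, so nothing slips between runs.
        current_version = cursor.execute(
            "SELECT CHANGE_TRACKING_CURRENT_VERSION()"
        ).fetchone()[0]

        changed = cursor.execute(
            """
            SELECT ct.RecordId, ct.SYS_CHANGE_OPERATION, r.Payload
            FROM CHANGETABLE(CHANGES dbo.Records, ?) AS ct
            LEFT JOIN dbo.Records AS r ON r.RecordId = ct.RecordId
            """,
            last_sync_version,
        ).fetchall()

        for record_id, operation, payload in changed:
            if operation in ("I", "U"):      # insert or update: (re)export the file
                write_to_external_file(record_id, payload)
            # "D" (delete) could remove the external file instead.

        return current_version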

Nestor
A: 

+1 on the journal idea, but have you considered proxying?

You don't indicate whether this application gives you the freedom to modify your own schema or your information upload routines. If you can keep track of the last modified time and the last backup time of each record, you can easily query for the records modified since the last export. Alternatively, you might scan your journal for records that were modified during the previous backup procedure and perform a catch-up pass at the end.
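
With those two timestamps in place, the "recently modified" query is just a comparison of the columns; a small sketch with invented column names (modified_at, last_exported_at):

    import sqlite3

    def rows_needing_export(conn: sqlite3.Connection):
        # Rows that changed after their last export, or were never exported.
        return conn.execute(
            """
            SELECT id, payload
            FROM records
            WHERE last_exported_at IS NULL
               OR modified_at > last_exported_at
            ORDER BY modified_at
            """
        ).fetchall()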

Depending on how frequently your table is modified, it might make sense to jump ahead in the flow and proxy the input process to the database. Your input proxy would immediately save to the external file, then update the database as expected. This would also give you leeway to export those records that do not already have files.
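
A small sketch of such a proxy, reusing the invented table and the write_to_external_file helper from above; the point is only the ordering, file first, then the database write your upload routine already performs:

    import sqlite3

    def save_record(conn: sqlite3.Connection, record_id: int, payload: str) -> None:
        # Proxy for the existing upload path: export the external file first,
        # then store the record in the database exactly as before, so records
        # created during the migration never need a catch-up pass.
        write_to_external_file(record_id, payload)
        conn.execute(
            "INSERT INTO records (id, payload, modified_at) "
            "VALUES (?, ?, datetime('now'))",
            (record_id, payload),
        )
        conn.commit()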

memnoch_proxy