I am working on a SQL Job which involves 5 procs, a few while loops and a lot of Inserts and Updates.

This job processes around 75000 records.

Now, the job works fine for the first 10,000-20,000 records, at a speed of around 500 records per minute. After around 20,000 records, execution slows to a crawl: it loads only about 3,000 records every 30 minutes and stays at that speed.

I was suspecting the network, but I don't know for sure. These kinds of queries are difficult to analyze through SQL Performance Monitor, and I'm not very sure where to start.

Also, there is a single cursor in one of the procs, which runs over only a very small number of records.

Any suggestions on how to speed this process up on the full-size data set?

+2  A: 

I would check whether your updates are running inside a single transaction. If they are, it could explain why the job dies after a certain amount of "modified" data. You might check how large your tempdb gets as an indicator.

I have also seen cases where the database would die during long-running transactions when there are other "usages" at the same time, again because of transactionality and improper isolation levels.

If you can split your job into independent, non-overlapping chunks, you might want to do so: for example, doing the job in chunks by dates, by ID ranges of the "root" objects, etc.
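For illustration, a minimal sketch of that chunking idea, assuming the staging table has an integer key; the TempStaging table, TempID column, and ProcessCustomerBatch procedure are made-up names:

    DECLARE @BatchSize INT, @FromID INT, @MaxID INT;
    SET @BatchSize = 10000;

    SELECT @FromID = MIN(TempID), @MaxID = MAX(TempID)
    FROM   dbo.TempStaging;

    WHILE @FromID <= @MaxID
    BEGIN
        BEGIN TRANSACTION;

        -- Process one slice per transaction so locks and log space
        -- are released between batches.
        EXEC dbo.ProcessCustomerBatch
             @FromID = @FromID,
             @ToID   = @FromID + @BatchSize - 1;

        COMMIT TRANSACTION;

        SET @FromID = @FromID + @BatchSize;
    END;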

van
Thanks, van... that is something I was looking for... I will check and get back to you. I believe the problem is that I am using a single transaction to process all 75K records... I should probably change it to smaller chunks...
BinaryHacker
@van: I ran EXEC sp_helpdb 'tempdb' and found that db_size was 1587.88 MB while the job was running at around 35K records (low performance). I killed the transaction, rolled back the changes, and then checked again; the space used by tempdb was still the same. I am not sure I know what change I am looking for.
BinaryHacker
That is absolutely not a problem size for tempdb, so you do not seem to have an issue with other transactions; your own process seems to be the problem. Try to split your big job into chunks and see if it is faster (like 10K rows, then 20K, then 50K; see whether the performance is non-linear). But please provide more info on what you are doing: there might be a much better solution to your task.
van
I guess splitting should help, since performance is good for the first 10K to 20K records, but I would rather process everything in one go... I have tried explaining my process below; I posted it as a separate answer for formatting reasons.
BinaryHacker
Is it the tempdb size or the transaction log size that I should be checking? Could the transaction log size affect performance? I couldn't find anywhere on the web how a long transaction would affect the performance of a query.
BinaryHacker
IMHO, as long as you do not have other transactions working on your database at the same time, you should be fine with a long-running transaction. If you do, that is when your tempdb will get rather large, which is not a good sign. You might also want to check whether any object locks are being held while your job is running.
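Two quick checks that can be run from a second connection while the job is going, assuming SQL Server 2005 or later for the DMVs:

    -- Sessions that are currently blocked, and what they are waiting on
    SELECT r.session_id, r.blocking_session_id, r.wait_type, r.wait_time, t.text
    FROM   sys.dm_exec_requests AS r
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
    WHERE  r.blocking_session_id <> 0;

    -- Transaction log size and percentage used, per database
    DBCC SQLPERF(LOGSPACE);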
van
I have used NOLOCK with all SELECT statements, and for the last few days I have been testing on a DB dedicated to this testing, so there shouldn't be any other queries running or locking any objects. Also, I wanted to know how to check whether it's the network that is killing the job; I'm opening a separate question for that. Thanks for the help!
BinaryHacker
+1  A: 

I suspect your whole process is flawed. I import a data file that contains 20,000,000 records, hits many more tables, and does some very complex processing in less time than you are describing for 75,000 records. Remember, looping is every bit as bad as using cursors.

I think if you set this up as an SSIS package you might be surprised to find the whole thing can run in just a few minutes.

With your current set-up, consider whether you are running out of room in tempdb, or whether it is trying to grow and can't grow fast enough. Also consider whether, at the time the slowdown starts, some other job is running that might be causing blocking. And get rid of the loops and process things in a set-based manner.
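One quick way to check whether tempdb is short on room or auto-growing while the job runs (size and growth are reported in 8 KB pages unless growth is a percentage):

    -- Current file sizes and growth settings for tempdb
    SELECT name, size, growth, is_percent_growth
    FROM   tempdb.sys.database_files;

    -- Allocated vs. free space in tempdb
    EXEC tempdb..sp_spaceused;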

HLGEM
Thanks, HLGEM! Getting rid of the loops was the first thing I tried, but it's not possible in this case. For every record I need to get data from a lot of existing tables, perform validations, insert or update depending on whether the record exists, and update the status back to the input table. It may be easier to build this with SSIS, but that won't get rid of the loops. I have executed the job at various times, including weekends, and mostly there were no additional jobs running.
BinaryHacker
A: 

Okay...so here's what I am doing in steps:

  1. Load a file into a TEMP table, just as an intermediary.
  2. Do some validations on all records using set-based statements.
  3. Actual processing starts NOW.

TRANSACTION BEGINS HERE......

LOOP STARTS HERE

a. Pick a record based on the TEMP table's PK (say, Customer A).

b. Retrieve data from existing tables (e.g. employer information)

c. Validate information received/retrieved.

d. Check if the record already exists: UPDATE if it does, else INSERT. (THIS HAPPENS IN A SEPARATE PROCEDURE; see the set-based sketch below.)

e. Find ALL of Customer A's family members (PROCESS THEM ALL IN ANOTHER **LOOP** - SEPARATE PROC).

f. Update the status for Customer A and his family members.

LOOP ENDS HERE

TRANSACTION ENDS HERE
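
As a rough illustration of how step d could be done once for the whole batch instead of row by row, a set-based upsert sketch; the Staging and Customer tables and their columns (CustomerID, Name, Email) are made-up names:

    -- Update the customers that already exist
    UPDATE c
    SET    c.Name  = s.Name,
           c.Email = s.Email
    FROM   dbo.Customer AS c
    JOIN   dbo.Staging  AS s ON s.CustomerID = c.CustomerID;

    -- Insert the ones that do not exist yet
    INSERT INTO dbo.Customer (CustomerID, Name, Email)
    SELECT s.CustomerID, s.Name, s.Email
    FROM   dbo.Staging AS s
    WHERE  NOT EXISTS (SELECT 1
                       FROM   dbo.Customer AS c
                       WHERE  c.CustomerID = s.CustomerID);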

BinaryHacker
You said that getting rid of loops is not possible. However, I would challenge that statement. It is a very rare thing in SQL that some kind of improvement away from loops toward more set-based processing is not possible. Honestly, the scenario you described sounds very possible to do with far less looping. Why does step e involve looping? If looping is required due to some hierarchical data structure, why can't it be done for an entire set of customers at once instead of many times, once for each customer? That is just one example where changes could be made.
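For instance, a sketch of steps e and f done for the entire staged set at once rather than once per customer; the Staging and FamilyMember tables and their columns are hypothetical:

    -- Step e: gather every staged customer's family members in one pass
    SELECT s.CustomerID, f.FamilyMemberID
    INTO   #FamilyWork
    FROM   dbo.Staging      AS s
    JOIN   dbo.FamilyMember AS f ON f.CustomerID = s.CustomerID;

    -- Step f: one status update covering all customers' family members
    UPDATE f
    SET    f.Status = 'Processed'
    FROM   dbo.FamilyMember AS f
    JOIN   #FamilyWork      AS w ON w.FamilyMemberID = f.FamilyMemberID;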
Emtucifor