Hi there,
I have been working on a stored procedure performance problem for over a week now and is related to my other post on Stackoverflow here. Let me give you some background information.
We have a nightly process which runs and is started by a stored procedure which calls many many many other stored procedures. Lots of the called stored procedures call others, etc. I have looked at some of the called procs and there is all sorts of frightnening complicated stuff in there such as XML string processing, unnecessary over-use of cursors, NOLOCK hints over-used, rare use of set-based processing, etc - the list goes on, it's quite horrendous.
This nightly process in our production environment takes on average 1:15 to run. It sometimes takes 2 hours to run which is unacceptable. I have created a test environment on identical hardware to production and run the proc. It took 45 minutes the first time I ran it. If I restore the database to the exact same point and run it again, it takes longer: indeed, if I repeat this action several times (restoring and re-running), the proc takes progressively longer until it plateaus at around 2 hours. This really puzzles me because I restore the database to the exact same point every time. There are no other user databases on the server.
I thought of two lines of investigation to pursue:
- Query plans and parameter spoofing
- Tempdb
As a test, I restarted SQL Server to clear out both the cache and tempdb and re-ran the proc with the same database restore. The proc took 45 minutes. I repeated this several times to ensure that it was repeatable - again it took 45 minutes each time. I then embarked on several tests to try and isolate the puzzling increase in run times when SQL Server does not get restarted:
- Run the initial stored procedure WITH RECOMPILE
- Before running the procedure, executre DBCC FREEPROCCACHE to clear out the procedure cache
- Before running the procedure, execute CHECKPOINT followed by DBCC DROPCLEANBUFFERS to ensure that the cache was empty and clean
Executed the following script to ensure all stored procedures were marked for recompilation:
DECLARE @proc_schema SYSNAME DECLARE @proc_name SYSNAME DECLARE prcCsr CURSOR local FOR SELECT specific_schema, specific_name FROM INFORMATION_SCHEMA.routines WHERE routine_type = 'PROCEDURE' OPEN prcCsr FETCH NEXT FROM prcCsr INTO @proc_schema, @proc_name DECLARE @stmt NVARCHAR(MAX) WHILE @@FETCH_STATUS = 0 BEGIN SET @stmt = N'exec sp_recompile ''[' + @proc_schema + '].[' + @proc_name + ']''' -- PRINT @stmt -- DEBUG EXEC ( @stmt )
FETCH NEXT FROM prcCsr INTO @proc_schema, @proc_name END
In all the above tests, the procedure takes longer and longer to run with the same database restore. I am really at a loss now as to what to try. Looking into the code at this point is an option but realistically its going to take 3-6 months to get that optimised as there is lots of room for improvement there. What I am really interested in getting to the bottom of, is why does the proc execution time get longer each time when a database restore has been performed even when the procedure and buffer caches have been cleaned?
I did also investigate tempdb, and try and clear out old tables in there as described in my other stackoverflow post, but I am unable to manually clear out temp tables that were created from table variables manually and they don't seem to want to disappear on their own (even after leaving them for 24 hours).
Any insight or suggestions for further testing would be greatly appreciated. I am running SQL Server 2005 SP3 64-bit Enterprise edition on a Windows 2003 R2 Ent. edition cluster.
Regards, Mark.