My first post here, please be gentle. =)

I work for a company that inherited the maintenance of a bespoke system used by one of our customers. The previous developer (no longer with us) encrypted all the database objects (WITH ENCRYPTION).

The system has been plagued with various timeout issues well before we took ownership of it, and we want to get to the bottom of these.

The database is on SQL Server 2005 Express in production. We want to run Profiler against it, but because the various objects are encrypted, most stored procedure calls etc. show up as '-- Encrypted Text'.

Not very useful. I've written a little console app in C# to decrypt all the database objects, which works perfectly as far as I can tell.

It finds all encrypted objects in the database and, for each one, decrypts it, removes the WITH ENCRYPTION clause, drops the original and recreates it using the new 'without encryption' text.
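
(In case it's useful to anyone, the first step of the tool amounts to something along these lines - encrypted modules show up with a NULL definition in sys.sql_modules on SQL Server 2005; the actual C# code does a bit more than this.)

-- Find every module whose source is hidden by WITH ENCRYPTION
SELECT o.name, o.type_desc
FROM sys.objects o
JOIN sys.sql_modules m ON m.object_id = o.object_id
WHERE m.definition IS NULL
ORDER BY o.type_desc, o.name;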

Some computed columns depend on functions that need decrypting, so those columns get dropped before the functions are decrypted and are recreated afterwards.

What I'm finding is that once everything is decrypted, I can't get into the system because the stored procedures etc. take far too long to run on their first call. Execution plans are being compiled for the first time, so some delay is understandable, but we're talking a minute or more. After 30 seconds the command timeout is hit, so the plans never get compiled.

I also have the same issue if I drop and recreate the database objects using their original scripts (keeping the WITH ENCRYPTION clause in).

So there's some consistency there. However, what absolutely mystifies me is that if I drop the execution plans from the original copy of the database (which was created from a backup of the production database), the same stored procedures are much faster - around 10 seconds for the first call. As far as I can tell, the stored procedures, functions etc. are the same.

From my testing, I don't think it's a particular procedure or function that is causing the problem. The delay seems to be cumulative: the more objects I drop and recreate, the slower things get.

I've taken a few random stabs in the dark, rebuilding indexes and updating stats - this has had no effect at all.

We could write something to execute all 540 functions, triggers, stored procedures etc. to pre-empt the first real call from a user; however, once SQL Server is restarted (and our client does restart their server from time to time) the execution plans will be dropped and we'd need to run the same tool again. That doesn't seem a viable option to me (neither does increasing the CommandTimeout property) - I want to know why I'm seeing this behaviour.

I've been using sys.dm_exec_query_plan and sys.dm_exec_sql_text to look at the execution plans, and using DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE as part of my testing.
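
(For reference, the plan cache query I'm using is nothing special - just the standard DMVs joined together:)

-- Cached plans with their statement text and XML showplan
SELECT cp.objtype, cp.usecounts, st.text, qp.query_plan
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp;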

I'm totally stumped, please help me before I jump out the office window.

Thanks in advance,

Andy.

--EDIT--

I don't quite know how I missed it, but Activity Monitor is showing a session being blocked by a recompile of a table-valued function. It takes far too long to compile and the blocked query hits the timeout.

I don't understand why, in the original version of the database (restored from a backup taken at the customer site), the compilation takes around 10 seconds, but after dropping and recreating these objects in the same database, the table-valued function takes almost a minute to compile.

I've tried truncating the log, which didn't have any effect. I still need to have a look at the file sizes.

-- Another edit --

The TVF returns a temporary table, and has 12 outer joins in the query, all on either sys.server_principals or sys.database_role_members.

I seem to remember reading something about recompiles and temporary tables, which I'll have to check again..

+7  A: 

First recommendation:

Don't jump out the window unless you have an inflatable slide and a few beers!

Second recommendation:
You said yourself that (computed) columns were dropped. Is it possible that other stuff was manipulated in the tables? If so, you will probably want to reindex your tables, (which will update the tables' statistics as well) using a command such as:

EXEC sp_MSforeachtable @command1 = 'DBCC DBREINDEX("?")'

...though it sounds like you've done something like this. Either way, I recommend doing it once you make such a big change to all of those objects.
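
If you want to refresh the statistics on their own as well (for example after the reindex), something as simple as this will do it:

-- Update out-of-date statistics across the whole database
EXEC sp_updatestats;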

Third recommendation: While you are waiting for your procs to execute, run an sp_who2 on the database to make sure nothing is blocking your queries. It's quite possible that you might have some sort of long-lived transaction happening that you haven't accounted for.
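
If sp_who2's output is awkward to read, the same information is available from the DMVs; for example, this shows anything that's currently blocked and what's blocking it:

-- Requests that are currently blocked, and the session blocking them
SELECT session_id, blocking_session_id, wait_type, wait_time, command
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;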

Fourth recommendation: Make sure your server has enough memory. Make sure your transaction log files and datafiles aren't auto-growing after all of those big index and object updates. That can take FOREVER to happen, especially on under-spec'ed hardware like you may have running SQL Express.
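
A quick way to check the file sizes and growth settings (so you can see whether an auto-grow is likely to kick in) is something like:

-- Current size in MB and growth setting for each file of the current database
SELECT name,
       size / 128 AS size_mb,
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS varchar(10)) + ' percent'
            ELSE CAST(growth / 128 AS varchar(10)) + ' MB'
       END AS growth_setting
FROM sys.database_files;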

Fifth recommendation: Run a SQL Server Profiler trace against the database and look at what statements are starting specifically, and which are timing out. "Zoom in" on those and analyze them piece by piece and see what's up. This will likely just take a lot of good ol' hard work to fully understand.

In summary, the act of dropping and recreating procs itself shouldn't cause this slowdown if the statistics and indexes they were initially built against are sufficiently similar to what they are now. It's likely that you will find that there's Something Else happening which isn't necessarily directly related to changing the proc definitions themselves.

Dave Markle
Thanks (to all) for having a look at this. I've done the rebuild indexes dance, that didn't have any effect at all. I then did the update statistics samba, which did nothing either. I didn't use sp_who2, but I did look at the Activity Monitor to see if there was anything blocking, and nothing showed up. I noticed that the connection was more CPU intensive than I/O, which I guess figures if it's a compilation thing rather than volume of data being the issue. I have used the profiler as stated in recommendation 5, thought I had it all figured out, but did some tests and still had the same problem - I tried not dropping the item I thought was the issue, but other slowness came up. I totally agree on the something else! My brain is trying to crawl out of my ear, so I'm going to pick this up tomorrow :)
Andy Sweetman
Make sure you check your file sizes next!
Dave Markle
+3  A: 

Another shot in the dark: Were the computed columns which you had to drop originally persisted (and not persisted after recreation) or vice versa?

If the functions called in the computation are complex or expensive, persisted columns are very advantageous and might be responsible for the behavior you are seeing.
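
To illustrate (the table and function names here are invented), the only difference is the PERSISTED keyword on the computed column definition:

-- Non-persisted: dbo.fn_CalcPrice runs every time the column is read.
-- Persisted: the value is stored and only recomputed when Qty or UnitPrice change.
-- (Persisting a UDF-based column requires the function to be deterministic and schema-bound.)
ALTER TABLE dbo.Orders ADD TotalPrice AS dbo.fn_CalcPrice(Qty, UnitPrice) PERSISTED;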

Cade Roux
Nope, not persisted - good shout though.
Andy Sweetman
A: 

It turns out that if I assign the TVF's parameter to a local variable, then use that variable wherever the original parameter was used, normal service is resumed (the query takes less than a second instead of a minute!).

Some kind of parameter-sniffing shenanigans going on; I don't really understand why, though - at the point I'm calling the function, no query plans exist, good or bad.
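
To illustrate the change (the function, table and column names here are made up - the real TVF is far bigger), the fix inside the multi-statement TVF amounts to this:

-- Hypothetical multi-statement TVF showing the one-line fix: copy the parameter
-- into a local variable and reference the variable in the query instead.
CREATE FUNCTION dbo.fn_UserRoles_Example (@LoginName sysname)
RETURNS @Result TABLE (PrincipalName sysname NULL, RoleName sysname NULL)
AS
BEGIN
    DECLARE @LocalLoginName sysname;
    SET @LocalLoginName = @LoginName;   -- the fix

    INSERT INTO @Result (PrincipalName, RoleName)
    SELECT dp.name, rp.name
    FROM sys.database_principals dp
    LEFT OUTER JOIN sys.database_role_members drm
        ON drm.member_principal_id = dp.principal_id
    LEFT OUTER JOIN sys.database_principals rp
        ON rp.principal_id = drm.role_principal_id
    WHERE dp.name = @LocalLoginName;    -- previously this referenced @LoginName directly

    RETURN;
END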

I'm in contact with Microsoft on this one (first time I've ever used my MSDN support entitlement) so hopefully we'll find out more and I'll post what I've discovered.

Thanks all for your help, we're getting there!

Andy Sweetman