views:

53

answers:

1

On our dev boxes, our database resides entirely in the PRIMARY filegroup, and everything works fine.

On one of our production servers, recently upgraded from 2005 to 2008, we noticed it was performing slower than it should. On this machine, there are two filegroups - PRIMARY and INDEXES. Both filegroups contain 1 file per logical volume, one logical volume per CPU. The filegroups are sharing logical volumes (both of them are spread out across all disks).

We isolated a few queries that were performing fast on the dev boxes and slow (up to 40x slower) on the production machine. Turned out these queries were using the non-clustered indexes that resided in the INDEXES filegroup. Tweaking some of the queries to only use clustered indexes that were in the PRIMARY filegroup dropped their times back to normal.

As a final confirmation, we redeployed the same database on the same machine to have everything in PRIMARY, and things went back to normal!

Here's the statistics output of one of the queries, run on the machine in question (table names changed to protect the innocent):

FAST (everything in PRIMARY filegroup):

(3 row(s) affected)
Table '0'. Scan count 2, logical reads 14, ...
Table '1'. Scan count 0, logical reads 0, ...
Table '1'. Scan count 0, logical reads 0, ...
Table '2'. Scan count 2, logical reads 7, ...
Table '3'. Scan count 2, logical reads 1012, ...
Table '4'. Scan count 1, logical reads 3, ...

 SQL Server Execution Times:
   CPU time = 437 ms,  elapsed time = 445 ms.

SLOW (indexes split into their own filegroup):

(3 row(s) affected)
Table '0'. Scan count 209, logical reads 428, ...
Table '1'. Scan count 0, logical reads 0,...
Table '2'. Scan count 1021, logical reads 9043,....
Table '3'. Scan count 209, logical reads 105754, ....
Table '4'. Scan count 0, logical reads 0, ....
Table '5'. Scan count 1, logical reads 695, ...
**Table '#46DA8CA9'. Scan count 205, logical reads 205, ...**
Table '6'. Scan count 6, logical reads 436, ...
Table '7'. Scan count 1, logical reads 12,....

 SQL Server Execution Times:
   CPU time = 17581 ms,  elapsed time = 17595 ms.

It's pretty clear that having a second file group is making SQL Server batty with choosing an execution plan. Looking at the corresponding execution plans for the query in each configuration verifies this, they are quite different.

What's going on? Any insights much appreciated.

+2  A: 

Random thoughts

  • The statistics etc weren't disable on rebuild into new filegroup?
  • Maintenance tasks do not look at the filegroup?
  • Heavily fragmented disk?
  • Try MAXDOP 1
  • Exactly same fillfactor etc?

The IO looks way off so I wonder if stats or fragmentation is causing an issue

gbn
Thanks for the suggestions. I truncated the stats in the post but there are 0 physical reads in all cases (memory is handling the queries in question), and performance monitor shows zero disk queuing, so we're pretty sure it's a "logical" issue. So the fragmentation suggestion is unlikely - I'll investigate the others!
womp
Womp - forget physical reads. That's unrelated - it just has to do with the use of cache. Only troubleshoot logical reads. I think gbn's suggestions are dead on - the dev box has different statistics, different indexes, fragmented pages, heaps, different sp_configure options, etc.
Brent Ozar