views:

99

answers:

2

I have a table myTable with myGuid (uniqueidentifier), myValues (float), myGroup (integer) and a bunch of other fields which are not important right now. I want to do something as simple as:

SELECT SUM(myValues)
  FROM myTable
 WHERE myGuid IN (SELECT * FROM ##test)
 GROUP BY myGroup

##test is just a temporary table with a single field (guid_filter) containing a bunch of uniqueidentifiers.

Now here's the strange thing:

  • When I create myTable with myGuid as the Primary Key (which seems like the obvious thing to do), the query is slow (<EDIT>8-12s</EDIT>).

  • When I create myTable with myAutoInc, an integer auto-increment field, as the Primary Key, the query is fast (~2s), even though the WHERE clause still filters by myGuid. (myGuid has just a "normal" non-clustered index in this scenario.)

Is there any logical explanation to this? My (naive) assumption was that the first option is faster, since SQL Server can use the guid to look up myValues rather than having to go through guid -> myAutoInc -> myValues. So, the result was very surprising for me.

Here's the SHOWPLAN_TEXT output. Slow scenario (XML query plan): (EDIT: Updated, thanks to Remus for noticing that there was an unnecessary additional non-clustered index on myGuid)

|--Compute Scalar(DEFINE:([Expr1007]=CASE WHEN [Expr1015]=(0) THEN NULL ELSE [Expr1016] END))
     |--Stream Aggregate(GROUP BY:([myDB].[dbo].[myTable].[myGroup]) DEFINE:([Expr1015]=COUNT_BIG([myDB].[dbo].[myTable].[myValues]), [Expr1016]=SUM([myDB].[dbo].[myTable].[myValues])))
          |--Sort(DISTINCT ORDER BY:([myDB].[dbo].[myTable].[myGroup] ASC, [myDB].[dbo].[myTable].[myGuid] ASC))
               |--Nested Loops(Inner Join, OUTER REFERENCES:([tempdb].[dbo].[##test].[guid_filter], [Expr1014]) OPTIMIZED WITH UNORDERED PREFETCH)
                    |--Table Scan(OBJECT:([tempdb].[dbo].[##test]))
                    |--Clustered Index Seek(OBJECT:([myDB].[dbo].[myTable].[PK__myTable__2334397B]), SEEK:([myDB].[dbo].[myTable].[myGuid]=[tempdb].[dbo].[##test].[guid_filter]) ORDERED FORWARD)

Table 'myTable'. Scan count 0, logical reads 38046, physical reads 1, read-ahead reads 6914, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '##test'. Scan count 1, logical reads 23, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Fast scenario (XML query plan):

|--Compute Scalar(DEFINE:([Expr1007]=CASE WHEN [globalagg1009]=(0) THEN NULL ELSE [globalagg1011] END))
     |--Stream Aggregate(GROUP BY:([myDB].[dbo].[myTable].[myGroup]) DEFINE:([globalagg1009]=SUM([partialagg1008]), [globalagg1011]=SUM([partialagg1010])))
          |--Parallelism(Gather Streams, ORDER BY:([myDB].[dbo].[myTable].[myGroup] ASC))
               |--Stream Aggregate(GROUP BY:([myDB].[dbo].[myTable].[myGroup]) DEFINE:([partialagg1008]=COUNT_BIG([myDB].[dbo].[myTable].[myValues]), [partialagg1010]=SUM([myDB].[dbo].[myTable].[myValues])))
                    |--Sort(DISTINCT ORDER BY:([myDB].[dbo].[myTable].[myGroup] ASC, [myDB].[dbo].[myTable].[myAutoInc] ASC))
                         |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([myDB].[dbo].[myTable].[myGroup], [myDB].[dbo].[myTable].[myAutoInc]))
                              |--Nested Loops(Inner Join, OUTER REFERENCES:([myDB].[dbo].[myTable].[myAutoInc], [Expr1017]) OPTIMIZED WITH UNORDERED PREFETCH)
                                   |--Nested Loops(Inner Join, OUTER REFERENCES:([tempdb].[dbo].[##test].[guid_filter], [Expr1016]) OPTIMIZED WITH UNORDERED PREFETCH)
                                   |    |--Table Scan(OBJECT:([tempdb].[dbo].[##test]))
                                   |    |--Index Seek(OBJECT:([myDB].[dbo].[myTable].[myGuid]), SEEK:([myDB].[dbo].[myTable].[myGuid]=[tempdb].[dbo].[##test].[guid_filter]) ORDERED FORWARD)
                                   |--Clustered Index Seek(OBJECT:([myDB].[dbo].[myTable].[PK__myTable__2334397B]), SEEK:([myDB].[dbo].[myTable].[myAutoInc]=[myDB].[dbo].[myTable].[myAutoInc]) LOOKUP ORDERED FORWARD)

Table 'myTable'. Scan count 0, logical reads 66988, physical reads 48, read-ahead reads 2515, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '##test'. Scan count 5, logical reads 23, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
+3  A: 

A GUID is a poor choice of clustered index, simply because it's so big. Using an INTEGER field allows the database to pack much much more information into a 'page', so fewer pages need to be fetched from disk for any given query.

Also note that the cluster key(s) are stored in every non-clustered index (since that's what's used to locate the data), which compounds the problem.

Gary McGill
@GUID PK: Isn't it a sorting issue? Primary Keys are clustered indexes by default. Guids are not growing. So you should have a fragmented clustered index.
Arthur
@Arthur: you're right that clustered indexes can allow more efficient sorting (and range filtering), but in this case since there's no sorting going on I don't think it'd make a difference. SUM() doesn't need sorted data.
Gary McGill
@Sum():Yes, but the where clause would benefit from a clustered index.
Arthur
@Arthur: I'm not sure why you think that this particular WHERE clause would benefit from the PK being sorted? Unless the GUIDs in ##test happen to be in a contiguous range (so that the sorting means all the hits are from the same page), I wouldn't expect there to be any benefit.
Gary McGill
OK - it's late here in Klosterneuburg. You are totally right. What I wanted to point out, that when you create a primary key (clustered index = sorted by default) and your PK column is a guid then you should get fragmented pages. the col used in the where clause should be index - but that's another discussion. Sorry - I didn't get to the point. It has nothing to do with sorting but with page fragmentation.
Arthur
@Gary: Your idea makes sense. However, I'm still hesitant to accept that the key size can make *that much* of a difference...
Heinzi
@Heinzi: Yes; while I stand by my advice not to use a GUID as a cluster key, I think Remus' answer is more relevant to your problem.
Gary McGill
+3  A: 

Looking at the plan in the 'slow' case it shows that the query does a seek on the index [myDB].[dbo].[myTable].[myGuid] followed by a clustered index seek on [myDB].[dbo].[myTable].[PK__myTable__2334397B]. This only makes sense if you have created both a non-clustered index on myTable(myGuid) and also declared myTable(myGuid) as clustered index key (it appears so, judging from the typical 'PRIMARY KEY' declaration auto-generated name naming convention of the clustered index object 'PK_...').

Other than that, the plans are very similar and they're both quite bad in that they include a SORT. The difference in width of the autoInc column vs. the GUID involved in the potential larger width of the non-clustered index in the first case may explain the difference, but I doubt is the full story.

Please redo the test making sure that you have a clustered key on myGuid and that you do NOT have also a non-clustered index on the same key. The plan should include only one single seek on myTable, using the clustered index, to compare exactly the cases you wanted to compare.

Also, obviously, make sure you compare the same ##test content and the buffer pool cache is warmed up in both case identically. Run DBCC FREESYSTEMCACHE('All') before each test then run the query at least 5 times, negleting the first run (it will be the run that warms up the buffer pool).

Also, as Arthur already noted, having an order guarantee on ##test (ie. a clustered key) could speed up things as the nested loops can be replaced with a merge join, if ##test content is large enough. If the ##temp has only few rows, then the nested loop is better and order makes no difference.

Remus Rusanu
Thanks for noticing the extra index; I have removed it and updated the query plan. Unfortunately, no change in performance. Thanks also for the `FREESYSTEMCACHE` hint (still no change, though).
Heinzi
@Heinzi: can you post the new plans? Also, pleas make sure you post the **actual** execution plan, not the estimate. Afaik `SET SHOWPLAN_TEXT` shows the *estimate* plan. Please use `SET STATISTICS IO ON` instead, is more relevant. Also, if possible post the actual execution plan XML (captured from SSMS 'show execution plan' or from Profiler), just make sure is actual not estimate.
Remus Rusanu
@Remus: Thanks a lot for looking into this. I've added the `STATISTICS IO` output and links to the `STATISTICS XML` output -- you should be able to open the latter with SSMS.
Heinzi
The 'slow' one has 38k logical reads, a simpler plan and the .sqlplan file shows a compile time of 59ms. The 'fast' plan by comparison has 67k logical reads, a more complex plan and a compile time of 284ms. There is one critical difference though: the 'slow' plan has degree of parallelism 1, the 'fast' one has DOP **4**. So although the GUID index makes a better plan, it does not get paralelized. I'm not sure why yet.
Remus Rusanu
I see. Thanks for the analysis; I'm quite surprised that the parallelism makes that much of a difference (in particular, since I usually expect the HDDs or the RAM to be the bottleneck rather than the CPU). I guess I'll have to read up on SQL Server's Parallel Query Processing...
Heinzi