I have a parent table and child table where the columns that join them together are the UNIQUEIDENTIFIER type.

The child table has a clustered index on the column that joins it to the parent table's PK, which is also clustered.

I have created a copy of both of these tables with the relationship columns changed to INT, and have rebuilt the indexes so that the copies have essentially the same structure and can be queried in the same way.

When I query for a known 20 records from the parent table, pulling in all the related records from the child tables, I get identical query costs across both, i.e. 50/50 cost for the batches.

If this is true, then my giant project to change all of the tables like this appears to be pointless, other than speeding up inserts. Can anyone shed any light on the situation?


EDIT:

The question is not about which is more efficient, but why is the query execution plan showing both queries as having the same cost?

+4  A: 

Much more efficient.

Int is much smaller. This means you get much smaller indices, which means you get much better memory use and load time for index access. It depends a lot, though, on how large your tables are and what you do with them.

TomTom
Doesn't really answer the question, I want to know why the execution doesn't support this theory.
ck
It does. You don't really describe any execution. "When I query for a known 20 records from the parent table". That is ridiculously small: 20 records, what do you expect to see? How large is the parent table?
TomTom
@TomTom - fair enough, I'll try and bump up the volumes. I was just hoping the execution plan would show that the theory of using ints would mean less work, even with smaller volumes. Thanks for the info.
ck
A: 

Int is definitely more efficient than uniqueidentifier (GUID).

  1. A GUID is a combination of uppercase characters and numeric values

  2. Size increases as table size increases, e.g. very large tables with > 10000 records

  3. GUID is not optimized for ORDER BY and GROUP BY.

  4. A GUID, i.e. a unique value, can be a hindrance to performance

Edit

After the further explanation in the question, this answer is no longer valid; it concentrates only on efficiency :)

uniqueidentifier explained at MSDN

http://msdn.microsoft.com/en-us/library/ms187942.aspx

Saar
Oh god. A GUID is not uppercase chars and numeric values. A GUID is a 128-bit number. It gets parsed once, then binary compared.
TomTom
@TomTom: maybe reading the info at the added link will help you ;)
Saar
@Saar: perhaps you should read that link a bit better. The GUID, when represented as a string, will only be parsed once. Then it will be referenced as the 128-bit number that it is. If it is already stored as a UNIQUEIDENTIFIER type in the DB, there will be no parse step at all, and it will be referenced as the 128-bit number.
Brian Rudolph
Also, whatever do you mean by the size increases as table size increases? The size of a GUID will always be 128 bits.
Brian Rudolph
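To illustrate the point made in the comments above, here is a minimal Python sketch using the standard `uuid` module. It is not SQL Server code, and the GUID value is just a sample string chosen for illustration. It only shows that the text form is parsed once into a fixed 16-byte value and that the letter case of the text is irrelevant after parsing.

```python
import uuid

# Sample value for illustration only; any well-formed GUID string would do.
text = "6F9619FF-8B86-D011-B42D-00C04FC964FF"

# The string representation is parsed once into a 128-bit value.
guid = uuid.UUID(text)
size_in_bytes = len(guid.bytes)

# The same GUID written in lowercase parses to the same number,
# so comparisons happen on the value, not on the characters.
same = uuid.UUID(text.lower())

print(size_in_bytes, guid == same)
```

The size is the same for every GUID, regardless of how many rows the table holds.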
A: 

I suspect that if your filter down to 20 rows is on, say, a datetime column or a status column, then whether the lookup for those 20 rows is on an int or a GUID is irrelevant.

I'd like to see an XML query plan please.

gbn
The 20 rows were from a static list of known data points in another column. I've bumped it up to 2000, but am still seeing the same results. It's probably because my dev database is only 650k records.
ck
+4  A: 

Seeking a key in a clustered index is basically the same for a 4-byte key, a 16-byte key, or a 160-byte key. The cost of comparing the slots with the predicate is just noise in the overall cost of the query (execution preparation, preparing the execution context, opening the rowsets, locating the pages, etc.), even when no IO is involved.

While no one will argue that GUIDs and INTs are on an equal footing, comparing just 20 seeks will not reveal the differences. One thing you can measure immediately is space: a saving of 12 bytes per row at both the leaf and non-leaf levels of the clustered index, plus 12 bytes in every leaf row of each non-clustered index (which carries the clustered key), will add up over millions of rows and tens of tables and indexes. Less space means less IO, better memory cache performance, and better goodness overall. That can be measured, but you need to measure real loads, not a puny 20-row seek.

Under lab conditions you will be able to measure the difference in raw speed between seeking an INT and seeking a GUID, but that shouldn't be your focus. The argument of INT vs. GUID is not driven by something like a 5% performance gain in a seek; it is driven by space savings and by GUID randomness leading to fragmentation. Both are very easy-to-measure metrics that make a solid case for INT on their own grounds, with no need to bring in the seek performance argument.
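As a rough back-of-the-envelope illustration of the space argument, the sketch below multiplies out the 12-byte difference. The row count and index count are made-up figures, and the function deliberately ignores non-leaf levels, row overhead, and page fill, so treat the result as a lower-bound estimate rather than what SQL Server would actually report.

```python
INT_BYTES = 4
GUID_BYTES = 16


def key_bytes_saved(rows, nonclustered_indexes):
    """Estimate bytes saved at the leaf level by switching the key from GUID to INT.

    Assumes one copy of the key per row in the clustered index and one more
    copy per row in each non-clustered index.
    """
    saved_per_copy = GUID_BYTES - INT_BYTES  # 12 bytes
    copies_per_row = 1 + nonclustered_indexes
    return rows * copies_per_row * saved_per_copy


# Hypothetical table: 10 million rows, 3 non-clustered indexes.
saved = key_bytes_saved(10_000_000, 3)
print(f"{saved / 1024 ** 2:.0f} MiB saved for this one table")
```

Repeating the estimate for each child table that stores the key as a foreign key gives a first-order figure for the whole schema.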

Remus Rusanu
Thanks, that covers a lot of useful information. I've been able to estimate the space savings, but in putting together a business case I need to show how the performance will improve, as disk and even memory are relatively cheap. I was hoping to show the execution of one query against the other would indicate x% improvement.
ck
+1  A: 

On top of what Remus said, using GUIDs for clustered indexes is going to lead to tremendous fragmentation in most cases, which affects query performance in terms of IO. This happens when you don't use sequentially generated GUIDs, which I suppose is mostly the case when an application generates the GUID outside of the database. To create a sequential GUID ('bigger' than the one previously generated in the database) you have to use the NEWSEQUENTIALID() function.

Comparing the cost of two plans in one batch is not accurate in all cases. The cost is estimated based on, among other things, the number of IO operations needed to execute the query. In small databases, the difference between INT and GUID will not change the IO significantly enough to show up in the execution plans.
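The fragmentation effect can be sketched with a toy model in Python. This is not how SQL Server's storage engine is implemented: the page capacity, the 50/50 split rule, and the "append a fresh page at the end" rule are simplifying assumptions, and random 128-bit integers stand in for non-sequential GUIDs.

```python
import bisect
import random

PAGE_CAPACITY = 100  # hypothetical number of rows that fit on one page


def simulate(keys, capacity=PAGE_CAPACITY):
    """Insert keys into fixed-size sorted pages; return (page_splits, avg_fill)."""
    pages = [[]]  # pages in logical key order, each a sorted list of keys
    lows = []     # lows[i] is the smallest key that belongs on pages[i + 1]
    splits = 0
    for key in keys:
        i = bisect.bisect_right(lows, key)  # index of the page that owns this key
        page = pages[i]
        if len(page) < capacity:
            bisect.insort(page, key)
            continue
        if i == len(pages) - 1 and key > page[-1]:
            # Ever-increasing key at the end of the index: start a new page, no split.
            pages.append([key])
            lows.append(key)
            continue
        # Insert into the middle of a full page: move the upper half to a new page.
        mid = capacity // 2
        upper = page[mid:]
        del page[mid:]
        pages.insert(i + 1, upper)
        lows.insert(i, upper[0])
        splits += 1
        bisect.insort(upper if key >= upper[0] else page, key)
    fill = sum(len(p) for p in pages) / (len(pages) * capacity)
    return splits, fill


n = 20_000
rng = random.Random(42)
sequential_keys = range(n)                             # stands in for INT IDENTITY
random_keys = [rng.getrandbits(128) for _ in range(n)]  # stands in for random GUIDs

seq_splits, seq_fill = simulate(sequential_keys)
rand_splits, rand_fill = simulate(random_keys)
print(f"sequential: {seq_splits} splits, {seq_fill:.0%} average page fill")
print(f"random:     {rand_splits} splits, {rand_fill:.0%} average page fill")
```

In this model, sequential keys never split a page and leave every page full, while random keys split pages repeatedly and leave them partly empty, which is the extra space and IO the paragraph above refers to.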

Piotr Rodak