I have a table in SQL Server 2008 Express that contains 18 million records. The structure looks something like this (simplified):

Id, GroupId, Value, Created

Id is the primary key with a clustered index
GroupId has a non-clustered index

In this case, every 10 rows get a new GroupId, meaning that records 1-10 have GroupId 1, records 11-20 have GroupId 2, and so on.

Test 1: This query takes 23 seconds to run and returns 99 records:

DECLARE @Start INT
SET @Start = 1050
select * from FieldValues where GroupId between @Start and @Start + 10

Test 2: This query takes 0 seconds to run and returns 99 records:

DECLARE @Start INT
SET @Start = 1050
select * from FieldValues where GroupId = @Start union
select * from FieldValues where GroupId = @Start + 1 union
select * from FieldValues where GroupId = @Start + 2 union
select * from FieldValues where GroupId = @Start + 3 union
select * from FieldValues where GroupId = @Start + 4 union
select * from FieldValues where GroupId = @Start + 5 union
select * from FieldValues where GroupId = @Start + 6 union
select * from FieldValues where GroupId = @Start + 7 union
select * from FieldValues where GroupId = @Start + 8 union
select * from FieldValues where GroupId = @Start + 9 union
select * from FieldValues where GroupId = @Start + 10


Note: Since results can get cached, I always change the @Start variable between tests to get non-cached timings.

Why do these multiple selects (which look like something a beginner thought up) run so much faster than the more elegant query in Test 1?

+6  A: 

Try "Show actual execution plan" in the query analyser. You will probably see that the second query gets its results with an index seek, whereas the first (slower) query cannot do this: because the index it is using is non-clustered, it doesn't know that the records are sequential.

Bernhard Hofmann
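One way to check the suggestion above is to run both query shapes with I/O and timing statistics switched on, alongside the actual execution plan ("Include Actual Execution Plan" in Management Studio). This is only a sketch: it reuses the FieldValues table and @Start variable from the question, shortens Test 2 to two branches, and makes no claim about what the numbers will be on any particular server.

```sql
-- Sketch: compare the two query shapes from the question.
SET STATISTICS IO ON;    -- report logical/physical reads per table
SET STATISTICS TIME ON;  -- report CPU and elapsed time per statement

DECLARE @Start INT;
SET @Start = 1050;

-- Test 1: a single range predicate
SELECT * FROM FieldValues WHERE GroupId BETWEEN @Start AND @Start + 10;

-- Test 2 (shortened to two branches): one equality predicate per branch
SELECT * FROM FieldValues WHERE GroupId = @Start
UNION
SELECT * FROM FieldValues WHERE GroupId = @Start + 1;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The Messages tab then shows the reads and elapsed time for each statement, which can be set against the scan or seek operators in the plan.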
Very interesting, you seem to be correct. The only time I would need this is when I display a paginated GridView with 10 records; otherwise I only look up a single GroupId to display its data. Would you suggest I keep the GroupId index as non-clustered, or modify the table structure in some other way?
CodeSpeaker
It's VERY difficult to say what indexes you should have/keep without profiling your application. I can only suggest you optimise for the most common cases, and try to find acceptable solutions for the less common ones. It seems your not-so-pretty collection of UNIONs might be acceptable. But I am outside your environment and application domain, so my advice could be completely wrong on this point.
Bernhard Hofmann
Just remember that if you change the GroupId index to clustered, you may slow down inserts. All of these sorts of database changes need to be tested, as what improves performance in one area may degrade it in another. Only you can say what will work best in your particular environment.
HLGEM
Thanks for your input. I am currently testing different approaches to see what fits my needs best. I got around the ugly multiple selects by using select * from FieldValues where GroupId in (1, 2, 3...) etc. instead, which executes just as fast.
CodeSpeaker
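The IN-list form from the last comment, written out against the same @Start variable, might look like the sketch below. The comment only shows literal values, so using @Start expressions inside the list is an assumption here; it is worth confirming in the execution plan that this variant is still resolved with index seeks.

```sql
-- Sketch: IN-list equivalent of the eleven UNIONed selects in Test 2.
-- Assumption: expressions on @Start behave like the literal list in the comment.
DECLARE @Start INT;
SET @Start = 1050;

SELECT *
FROM FieldValues
WHERE GroupId IN (@Start,     @Start + 1, @Start + 2, @Start + 3,
                  @Start + 4, @Start + 5, @Start + 6, @Start + 7,
                  @Start + 8, @Start + 9, @Start + 10);
```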
A: 

Since the statements in the unions appear to be mutually exclusive, I would suggest that UNION ALL is a better choice than UNION. UNION has to remove duplicate rows from the combined result; UNION ALL skips that step, which means less work for the server.

HLGEM
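As a sketch of that suggestion, here is Test 2 from the question with UNION replaced by UNION ALL, shortened to three branches (the remaining branches follow the same pattern). Each branch filters on a different GroupId, so no row can appear in more than one branch and the result is the same set of rows.

```sql
-- Sketch: Test 2 with UNION ALL, so no duplicate-removal step is needed.
DECLARE @Start INT;
SET @Start = 1050;

SELECT * FROM FieldValues WHERE GroupId = @Start
UNION ALL
SELECT * FROM FieldValues WHERE GroupId = @Start + 1
UNION ALL
SELECT * FROM FieldValues WHERE GroupId = @Start + 2;
```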