ansaurus

Question

Creating an appropriate index for a frequently used query in SQL Server

Answer 1

+3 A:

Extending Remus' (edit: now deleted) answer...

if @p2 is varchar(15) then you can't compare against FieldB reliably, it's varchar(20)
if @p2 is varchar(20) then FieldC will be converted to varchar(20) and not use an index (or at best scan it)
if @p1 only has 2, 3, 4 values then why not tinyint and reduce table/index size?

I wouldn't bother with indexes until you resolve this datatype precedence issue: this is on top of the OR clause issue.

Finally, a column is unique or non-unique: there is no in between. Statistics help here with selectivity, but it's irrelevant.

I would reverse the indexes from Remus' answer to be FieldB, FieldA (and unique) and FieldC, FieldA because of FieldA's selectivity

Edit, after comments: you can't compare the use of @p2 against the use of constant strings.

gbn 2010-06-13 18:58:35

With "almost unique" I meant actually "has high selectivity" (sorry, didn't find the correct term). But I have problems to understand your first two points: 1) Having a where clause like `where FieldB='1234567890'` what do you mean when you say that I "can't compare against FieldB reliable"? 2) Having a query like `where FieldC='12345678901234567890'` why can't the index be used? Doesn't the index on a varchar(15) column know that the result set must be empty since my input parameter is longer than any value in the column can be at all?

Slauma 2010-06-13 19:21:43

See datatype precedence: http://msdn.microsoft.com/en-us/library/ms190309.aspx. Data types must match or implicit conversions occur. varchar(20) <> varchar(15). Also see conversions here http://msdn.microsoft.com/en-us/library/ms187928.aspx.

gbn 2010-06-13 19:32:24

I believe that datatypes must match and implicit conversions happen. But that's not the point. What is worrying me is that your first two statements seem to imply that either an index doesn't work "reliable" (whatever this means) or isn't used at all if the length of my parameter doesn't match exactly the field length. I have used `varchar(x)` fields and not `char(x)` since the length of the values in the columns can be different. If a user enters a string of 1 character length or of 25 character length I'd expect that in all cases the indexes are used. Isn't that the case?

Slauma 2010-06-13 20:10:10

I've created the indexes in the meantime and tested the queries with strings of different length (also longer than the field length): Looking at the query execution plan I cannot see a difference at all. No matter how long the string is, the indexes are used ("Index seek").

Slauma 2010-06-13 20:13:41

@slauma: please show the queries you're using *exactly*... then read this http://stackoverflow.com/questions/1211287 Note the use of variables vs constants in the query plan. Are you using constants in your queries that demonstrate I'm wrong? And this http://stackoverflow.com/questions/1392307

gbn 2010-06-13 20:14:46

@gbn: Aaah, now I see the problem! I have actually tested with constants not expecting that it would make a difference to parameters. Yes, when I use parameters, the index isn't used. (I'm testing in SQL Server Management Studio.) But currently I have the problem that the index is even not used when I match the datatype: It seems to depend on the field in my select clause; the index is only used when I select one of the index fields, but not if I select another field or *. I'll open a new question tomorrow. Thanks so far, I've learned a lot, especially how much I still don't understand ;)

Slauma 2010-06-13 21:17:05

@gbn: I believe that much of the confusion was due to a too small db I tested. Possibly SQLServer decides not to use an index when the table only has a few records? In the meantime I've created a larger test db and the index is always used now, interestingly also when the length of my parameters does not meet the database field length, which is not expected according to your answers here and in the 2 other questions you linked in one of your comments. Your answers there refer to SQLServer 2000, is it possible that this behaviour changed in SQL 2008? Take a look at my own answer.

Slauma 2010-06-29 17:50:38

Slauma: behaviour maybe did change. I have seen length mess things uo

gbn 2010-06-29 18:19:05

Answer 2

A:

I add my own answer after some tests with a larger database (in SQL Server 2008):

First, I have decided for the second option, means, I have created two indexes:

CREATE UNIQUE NONCLUSTERED INDEX [IX_B] ON [dbo].[MyTable] 
(
    [FieldB] ASC,
    [FieldA] ASC
)
CREATE NONCLUSTERED INDEX [IX_C] ON [dbo].[MyTable] 
(
    [FieldC] ASC,
    [FieldA] ASC
)

I've tested then two queries:

declare @p1 int = 1;
declare @p2 varchar(20) = '12345678';

select * from MyTable
where FieldA=@p1 and (FieldB=@p2 or FieldC=@p2);

Performing this query I get the following query plan (ID is the primary key of the table, PK_MyTable the clustered index on the primary key):

|--Nested Loops(Inner Join, OUTER REFERENCES:([MyDb].[dbo].[MyTable].[ID]))
   |--Stream Aggregate(GROUP BY:([MyDb].[dbo].[MyTable].[ID]) DEFINE:([MyDb].[dbo].[MyTable].[FieldA]=ANY([MyDb].[dbo].[MyTable].[FieldA])))
   |  |--Merge Join(Concatenation)
   |     |--Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[IX_B]), SEEK:([MyDb].[dbo].[MyTable].[FieldB]=[@p2] AND [MyDb].[dbo].[MyTable].[FieldA]=[@p1]) ORDERED FORWARD)
   |     |--Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[IX_C]), SEEK:([MyDb].[dbo].[MyTable].[FieldC]=[@p2] AND [MyDb].[dbo].[MyTable].[FieldA]=[@p1]) ORDERED FORWARD)
   |--Clustered Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[PK_MyTable]), SEEK:([MyDb].[dbo].[MyTable].[ID]=[MyDb].[dbo].[MyTable].[ID]) LOOKUP ORDERED FORWARD)

So it seems both indexes are used ("Index Seek").

Elapsed time for the query: 00:00:00.2220127

The second query I tested was using a JOIN to avoid the OR operator (see the "Edit" in my question):

declare @p1 int = 1;
declare @p2 varchar(20) = '12345678';

select * from MyTable where FieldA=@p1 and FieldB=@p2
union
select * from MyTable where FieldA=@p1 and FieldC=@p2;

This query has the following query plan:

|--Merge Join(Union)
   |--Nested Loops(Inner Join, OUTER REFERENCES:([MyDb].[dbo].[MyTable].[ID]))
   |  |--Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[IX_B]), SEEK:([MyDb].[dbo].[MyTable].[FieldB]=[@p2] AND [MyDb].[dbo].[MyTable].[FieldA]=[@p1]) ORDERED FORWARD)
   |  |--Clustered Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[PK_MyTable]), SEEK:([MyDb].[dbo].[MyTable].[ID]=[MyDb].[dbo].[MyTable].[ID]) LOOKUP ORDERED FORWARD)
   |--Nested Loops(Inner Join, OUTER REFERENCES:([MyDb].[dbo].[MyTable].[ID]))
      |--Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[IX_C]), SEEK:([MyDb].[dbo].[MyTable].[FieldC]=[@p2] AND [MyDb].[dbo].[MyTable].[FieldA]=[@p1]) ORDERED FORWARD)
      |--Clustered Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[PK_MyTable]), SEEK:([MyDb].[dbo].[MyTable].[ID]=[MyDb].[dbo].[MyTable].[ID]) LOOKUP ORDERED FORWARD)

Again both indexes are used ("Index seek").

Elapsed time for the query: 00:00:00.3710212

Note: For both queries it doesn't matter which length I declare @p2 with: Using varchar(8) or varchar(20) or varchar(30) gives the same results and query plans.

Following these results I will stay with using the OR operator instead of the UNION, since both queries use the indexes but the first one is faster.

Slauma 2010-06-29 17:44:02

ansaurus

tags:

views:

answers:

Creating an appropriate index for a frequently used query in SQL Server

related questions