views:

97

answers:

2

In my application I have two queries which will be quite frequently used. The Where clauses of these queries are the following:

WHERE FieldA = @P1 AND (FieldB = @P2 OR FieldC = @P2)

and

WHERE FieldA = @P1 AND FieldB = @P2

P1 and P2 are parameters entered in the UI or coming from external datasources.

  • FieldA is an int and highly non-unique, means: only two, three, four different values in a table with say 20000 rows
  • FieldB is a varchar(20) and is "almost" unique, there will be only very few rows where FieldB might have the same value
  • FieldC is a varchar(15) and also highly distinct, but not as much as FieldB
  • FieldA and FieldB together are unique (but do not form my primary key, which is a simple auto-incrementing identity column with a clustered index)

I'm wondering now what's the best way to define an index to speed up specifically these two queries. Shall I define one index with...

FieldB (or better FieldC here?)
FieldC (or better FieldB here?)
FieldA

... or better two indices:

FieldB
FieldA

and

FieldC
FieldA

Or are there even other and better options? What's the best way and why?

Thank you for suggestions in advance!

Edit:

Just as an info to other readers: Here was another answer which has been deleted now. Actually the answer seemed very useful to me. The recommendation was to create two indices (according to my second option above) and to reformulate the first query by using a UNION of two select statements (one with WHERE FieldA = @P1 AND FieldB = @P2 and one with WHERE FieldA = @P1 AND FieldC = @P2) instead of OR to benefit from both indices (which wouldn't be the case with the OR-operator).

Edit2:

The statement that with OR the indexes are not used and that a UNION is preferable seems to be wrong - at least according to my own tests (see my own answer below).

+3  A: 

Extending Remus' (edit: now deleted) answer...

  • if @p2 is varchar(15) then you can't compare against FieldB reliably, it's varchar(20)
  • if @p2 is varchar(20) then FieldC will be converted to varchar(20) and not use an index (or at best scan it)
  • if @p1 only has 2, 3, 4 values then why not tinyint and reduce table/index size?

I wouldn't bother with indexes until you resolve this datatype precedence issue: this is on top of the OR clause issue.

Finally, a column is unique or non-unique: there is no in between. Statistics help here with selectivity, but it's irrelevant.

I would reverse the indexes from Remus' answer to be FieldB, FieldA (and unique) and FieldC, FieldA because of FieldA's selectivity

Edit, after comments: you can't compare the use of @p2 against the use of constant strings.

gbn
With "almost unique" I meant actually "has high selectivity" (sorry, didn't find the correct term). But I have problems to understand your first two points: 1) Having a where clause like `where FieldB='1234567890'` what do you mean when you say that I "can't compare against FieldB reliable"? 2) Having a query like `where FieldC='12345678901234567890'` why can't the index be used? Doesn't the index on a varchar(15) column know that the result set must be empty since my input parameter is longer than any value in the column can be at all?
Slauma
See datatype precedence: http://msdn.microsoft.com/en-us/library/ms190309.aspx. Data types must match or implicit conversions occur. varchar(20) <> varchar(15). Also see conversions here http://msdn.microsoft.com/en-us/library/ms187928.aspx.
gbn
I believe that datatypes must match and implicit conversions happen. But that's not the point. What is worrying me is that your first two statements seem to imply that either an index doesn't work "reliable" (whatever this means) or isn't used at all if the length of my parameter doesn't match exactly the field length. I have used `varchar(x)` fields and not `char(x)` since the length of the values in the columns can be different. If a user enters a string of 1 character length or of 25 character length I'd expect that in all cases the indexes are used. Isn't that the case?
Slauma
I've created the indexes in the meantime and tested the queries with strings of different length (also longer than the field length): Looking at the query execution plan I cannot see a difference at all. No matter how long the string is, the indexes are used ("Index seek").
Slauma
@slauma: please show the queries you're using *exactly*... then read this http://stackoverflow.com/questions/1211287 Note the use of variables vs constants in the query plan. Are you using constants in your queries that demonstrate I'm wrong? And this http://stackoverflow.com/questions/1392307
gbn
@gbn: Aaah, now I see the problem! I have actually tested with constants not expecting that it would make a difference to parameters. Yes, when I use parameters, the index isn't used. (I'm testing in SQL Server Management Studio.) But currently I have the problem that the index is even not used when I match the datatype: It seems to depend on the field in my select clause; the index is only used when I select one of the index fields, but not if I select another field or *. I'll open a new question tomorrow. Thanks so far, I've learned a lot, especially how much I still don't understand ;)
Slauma
@gbn: I believe that much of the confusion was due to a too small db I tested. Possibly SQLServer decides not to use an index when the table only has a few records? In the meantime I've created a larger test db and the index is always used now, interestingly also when the length of my parameters does not meet the database field length, which is not expected according to your answers here and in the 2 other questions you linked in one of your comments. Your answers there refer to SQLServer 2000, is it possible that this behaviour changed in SQL 2008? Take a look at my own answer.
Slauma
Slauma: behaviour maybe did change. I have seen length mess things uo
gbn
A: 

I add my own answer after some tests with a larger database (in SQL Server 2008):

First, I have decided for the second option, means, I have created two indexes:

CREATE UNIQUE NONCLUSTERED INDEX [IX_B] ON [dbo].[MyTable] 
(
    [FieldB] ASC,
    [FieldA] ASC
)
CREATE NONCLUSTERED INDEX [IX_C] ON [dbo].[MyTable] 
(
    [FieldC] ASC,
    [FieldA] ASC
)

I've tested then two queries:

declare @p1 int = 1;
declare @p2 varchar(20) = '12345678';

select * from MyTable
where FieldA=@p1 and (FieldB=@p2 or FieldC=@p2);

Performing this query I get the following query plan (ID is the primary key of the table, PK_MyTable the clustered index on the primary key):

|--Nested Loops(Inner Join, OUTER REFERENCES:([MyDb].[dbo].[MyTable].[ID]))
   |--Stream Aggregate(GROUP BY:([MyDb].[dbo].[MyTable].[ID]) DEFINE:([MyDb].[dbo].[MyTable].[FieldA]=ANY([MyDb].[dbo].[MyTable].[FieldA])))
   |  |--Merge Join(Concatenation)
   |     |--Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[IX_B]), SEEK:([MyDb].[dbo].[MyTable].[FieldB]=[@p2] AND [MyDb].[dbo].[MyTable].[FieldA]=[@p1]) ORDERED FORWARD)
   |     |--Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[IX_C]), SEEK:([MyDb].[dbo].[MyTable].[FieldC]=[@p2] AND [MyDb].[dbo].[MyTable].[FieldA]=[@p1]) ORDERED FORWARD)
   |--Clustered Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[PK_MyTable]), SEEK:([MyDb].[dbo].[MyTable].[ID]=[MyDb].[dbo].[MyTable].[ID]) LOOKUP ORDERED FORWARD)

So it seems both indexes are used ("Index Seek").

Elapsed time for the query: 00:00:00.2220127

The second query I tested was using a JOIN to avoid the OR operator (see the "Edit" in my question):

declare @p1 int = 1;
declare @p2 varchar(20) = '12345678';

select * from MyTable where FieldA=@p1 and FieldB=@p2
union
select * from MyTable where FieldA=@p1 and FieldC=@p2;

This query has the following query plan:

|--Merge Join(Union)
   |--Nested Loops(Inner Join, OUTER REFERENCES:([MyDb].[dbo].[MyTable].[ID]))
   |  |--Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[IX_B]), SEEK:([MyDb].[dbo].[MyTable].[FieldB]=[@p2] AND [MyDb].[dbo].[MyTable].[FieldA]=[@p1]) ORDERED FORWARD)
   |  |--Clustered Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[PK_MyTable]), SEEK:([MyDb].[dbo].[MyTable].[ID]=[MyDb].[dbo].[MyTable].[ID]) LOOKUP ORDERED FORWARD)
   |--Nested Loops(Inner Join, OUTER REFERENCES:([MyDb].[dbo].[MyTable].[ID]))
      |--Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[IX_C]), SEEK:([MyDb].[dbo].[MyTable].[FieldC]=[@p2] AND [MyDb].[dbo].[MyTable].[FieldA]=[@p1]) ORDERED FORWARD)
      |--Clustered Index Seek(OBJECT:([MyDb].[dbo].[MyTable].[PK_MyTable]), SEEK:([MyDb].[dbo].[MyTable].[ID]=[MyDb].[dbo].[MyTable].[ID]) LOOKUP ORDERED FORWARD)

Again both indexes are used ("Index seek").

Elapsed time for the query: 00:00:00.3710212

Note: For both queries it doesn't matter which length I declare @p2 with: Using varchar(8) or varchar(20) or varchar(30) gives the same results and query plans.

Following these results I will stay with using the OR operator instead of the UNION, since both queries use the indexes but the first one is faster.

Slauma