views:

78

answers:

3

I have a table with some data in it:

 ColA | ColB | ColC
------+------+------
  1   |  A   |  X
  2   |  A   |  Y
  3   |  B   |  Y
  4   |  C   |  Y
  5   |  C   |  Z
  6   |  D   |  Y
  7   |  D   |  Z

I want to query to get all of the rows where ColB and ColC as a pair match a condition:

SELECT * FROM [Table] 
WHERE (ColB = A AND ColC = Y)
OR (ColB = B AND ColC = Y)
OR (ColB = C AND ColC = Y)
OR (ColB = D AND ColC = Z)

this should return rows 2, 3, 4 and 7.


The pairs of values for (ColB, ColC) could potentially be large (in the region of ~100). Is there a more efficient way of querying for this data other than a large query with lots of OR conditions?

I am hoping there is a way of using an equivalent of a Tuple, meaning I can do something like:

SELECT * FROM [Table] 
WHERE (ColB, ColC) IN ({A, Y}, {B, Y}, {C, Y}, {D, Z})

Any ideas?


EDIT: (to answer some questions in the comments)

The fields for ColB and ColC store guids and are declared as uniqueidentifier types.
This needs to work on SQL Server 2005 upwards (all editions).
The table has in the order of millions of rows, and I'm not averse to adding any indices that are required for this to work.

+4  A: 

You could do this. (If you are on SQL Server 2008 you could use the values row constructors rather than union alls)

You'd need to check the query plan to see if it was any more efficient.

SELECT * /*But don't use star*/
FROM [Table] 
JOIN 
(
SELECT 'A' AS BMatch, 'Y' AS CMatch UNION ALL
SELECT 'B' AS BMatch, 'Y' AS CMatch UNION ALL
SELECT 'C' AS BMatch, 'Y' AS CMatch UNION ALL
SELECT 'D' AS BMatch, 'Z' AS CMatch UNION ALL ...
) Matches
ON ColB = BMatch AND ColC = CMatch

You say in the comments that the combination of ColB and ColC is unique so (assuming your table already has a clustered index) I would imagine if you create a unique nonclustered index on either (colb, colc) or (colc, colb) that the above should give you a plan with 100 index seeks followed by 100 bookmark lookups. If it doesn't you could try adding an index hint to get it to use the new index. You'd need to compare the I/O for that with the I/O of a full scan as lots of ors would likely give you.

The cost of the bookmark lookups could potentially be avoided by including the additional required columns in the non clustered index. You've used * though so I don't know how viable this would be. You'd need to balance the benefit to this query against possible disbenefits to data modification operations.

Martin Smith
This looked the most promising, but when compared to the `OR` solution in my original question didn't fare as well in terms of execution plan/time for the data I have. I've ended up using the `OR` solution.
adrianbanks
+1  A: 

Why not create a new table to filter your result-set? With around 100 values to filter by, this solution would be more flexible should your filter conditions change in the future i.e. you simply change the filter table rather than changing the "where" clause that would be embedded in your select statement: -

if exists (select * from sys.objects WHERE object_id = object_id(N'dbo.Filter') AND type in (N'U'))
drop table dbo.Filter

create table dbo.Filter (
ColB char(1) not null,
ColC char(1) not null,
constraint pkFilter primary key clustered (ColB,ColC) with fillfactor = 100)

insert into dbo.Filter (ColB,ColC) values ('A','Y'),('B','Y'),('C','Y'),('D','Z')

select * from [Table] as t
inner join dbo.Filter as f on t.ColB = f.ColB and t.ColC = f.ColC

Andy Jones
+3  A: 

Have you considered adding a calculated column as a concatenation of both columns?

It would simplify your select statement and allows for an index to be added.

CREATE TABLE [dbo].[Table] (
  ColA  INTEGER
  , ColB VARCHAR(1)
  , ColC VARCHAR(1)
  , ColBC AS ColB + ColC
)  

CREATE UNIQUE INDEX IX_TABLE_COLBC ON [dbo].[Table] (ColBC)

INSERT INTO [Table] VALUES(1, 'A', 'X')
INSERT INTO [Table] VALUES(2, 'A', 'Y')
INSERT INTO [Table] VALUES(3, 'B', 'Y')
INSERT INTO [Table] VALUES(4, 'C', 'Y')
INSERT INTO [Table] VALUES(5, 'C', 'Z')
INSERT INTO [Table] VALUES(6, 'D', 'Y')
INSERT INTO [Table] VALUES(7, 'D', 'Z')

Not exactly the equivalent of a Tuple, but it does allow you to change your select to

SELECT * FROM [Table] 
WHERE (ColBC) IN ('AY', 'BY', 'CY', 'DZ')

and it uses the index.

Lieven