Hi folks!

Can somebody give me a hint on this one?

I have a table, let's say tblA, with columns id1 and id2 and an index on (id1, id2). I want to select the id1 values whose id2 values belong to several sets. So I would want to say

select id1 from tblA 
where id2 in (val1,val2,val3 ...)
union
select id1 from tblA 
where id2 in (val4,val2,val3 ...)
union
(...)*

Let's say we have in table A the following:

(1,1)
(1,2)
(1,3)
(1,4)
(1,5)
(2,1)
(2,2)
(2,3)

Now I want all the id1s that have every id2 in (3,4).

So what I want to get is id1 = 1.

2 shouldn't appear because, although we have a relation (2,3), we don't have (2,4).

Any ideas how to perform this query? I guess the approach above has a performance problem if the (...) grows too much. Thanks.

greets

A: 

The union is gonna kill your performance. Use something like this:

select id1 from tblA where id2 in (val1,val2,val3 ...) or id2 in (val4,val2,val3)
Mark Roddy
Did you mean you want all id1 values for which id2 is in each of these subsets? Your wording seems to indicate this, but the example query won't accomplish that. If this is the case, all you have to do is change the 'or' in the where clause to an 'and'.
Mark Roddy
Please check my comment above; that was not what I meant.
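
For readers following the comments, here is a small sketch of how 'or' and 'and' behave on the sample rows from the question. It assumes SQLite (through Python's sqlite3 module) as a stand-in for the poster's database, and the values 3 and 4 are taken from the question's example; it is an illustration, not the query either poster had in mind.

```python
import sqlite3

# Assumption: SQLite stands in for the poster's database; the rows are the
# sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblA (id1 INT NOT NULL, id2 INT NOT NULL, PRIMARY KEY (id1, id2))")
conn.executemany(
    "INSERT INTO tblA VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 1), (2, 2), (2, 3)],
)

# OR matches any row whose id2 is 3 or 4, so both id1 values come back.
with_or = conn.execute(
    "SELECT DISTINCT id1 FROM tblA WHERE id2 IN (3) OR id2 IN (4) ORDER BY id1"
).fetchall()

# AND is evaluated row by row, and no single row has id2 = 3 and id2 = 4.
with_and = conn.execute(
    "SELECT DISTINCT id1 FROM tblA WHERE id2 IN (3) AND id2 IN (4) ORDER BY id1"
).fetchall()

print(with_or)   # [(1,), (2,)]
print(with_and)  # []
```

Neither form returns just id1 = 1, because the condition has to hold across several rows of the same id1 rather than within one row.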
A: 

Can you combine all the sets into one large set?

If the order is not important, this would seem to be the fastest way.

Cameron
A: 

First, remember that

select id1 from tblA where id2 in (val1, val2, val3) union
select id1 from tblA where id2 in (val4, val5, val6)

should give the same set of id1 values as

select distinct id1 from tblA where id2 in (val1, val2, val3, val4, val5, val6)

(the distinct mirrors the duplicate removal that union performs), so you can perhaps improve efficiency by formulating a single query rather than using a union.
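
A quick way to check this equivalence on the question's sample rows is sketched below. It assumes SQLite via Python's sqlite3 module rather than the poster's database, and the value lists (1, 2) and (4, 5) are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for the poster's database; the rows are the
# sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblA (id1 INT NOT NULL, id2 INT NOT NULL, PRIMARY KEY (id1, id2))")
conn.executemany(
    "INSERT INTO tblA VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 1), (2, 2), (2, 3)],
)

# Two IN lists glued together with UNION (UNION removes duplicate id1 values).
union_rows = conn.execute(
    "SELECT id1 FROM tblA WHERE id2 IN (1, 2) "
    "UNION "
    "SELECT id1 FROM tblA WHERE id2 IN (4, 5) "
    "ORDER BY id1"
).fetchall()

# One merged IN list; DISTINCT is needed to match UNION's duplicate removal.
distinct_rows = conn.execute(
    "SELECT DISTINCT id1 FROM tblA WHERE id2 IN (1, 2, 4, 5) ORDER BY id1"
).fetchall()

# Without DISTINCT the merged query returns one row per matching (id1, id2) pair.
plain_rows = conn.execute(
    "SELECT id1 FROM tblA WHERE id2 IN (1, 2, 4, 5)"
).fetchall()

print(union_rows == distinct_rows)  # True
print(len(plain_rows))              # 6
```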

Secondly (and independently of the above), you should add an index on id2 to tblA. Without it, the id2 values are scattered throughout both the existing (id1, id2) index and the table data, so the optimizer has no option but to perform a linear scan (of the index, if you are lucky).

A: 

But all these queries give back both ids from column id1! I think Robert meant that as a result he just wants "1" from column id1:

   id1 id2
    1 | 1
    1 | 2
    1 | 3
    1 | 4  -->  id1s that have id2 with 3 and 4
    1 | 5
    2 | 1
    2 | 2
    2 | 3

Because id1=2 does not have both 3 AND 4, it should not be in the result.

Please correct me if I misunderstood. I tried to write a statement for this but could not get just id1=1 back, so I am also very interested in an efficient solution!
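
To make the problem concrete, this sketch counts how many of the wanted id2 values each id1 actually has. It assumes SQLite through Python's sqlite3 module as a stand-in database; the rows and the values 3 and 4 come from the example above.

```python
import sqlite3

# Assumption: SQLite stands in for the poster's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblA (id1 INT NOT NULL, id2 INT NOT NULL, PRIMARY KEY (id1, id2))")
conn.executemany(
    "INSERT INTO tblA VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 1), (2, 2), (2, 3)],
)

# For each id1, count the rows whose id2 is one of the wanted values.
matches_per_id1 = conn.execute(
    "SELECT id1, COUNT(*) FROM tblA WHERE id2 IN (3, 4) GROUP BY id1 ORDER BY id1"
).fetchall()

print(matches_per_id1)  # [(1, 2), (2, 1)]
```

Both id1 values pass the plain IN filter, but only id1 = 1 matches both wanted values, so the filter alone cannot tell them apart.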

Dimitri Wetzel
A: 

You need to create a separate index on column 'id2', because the combined index on (id1,id2) will not be used when looking up by id2 only.

This query does what you described:

SELECT id1 FROM tblA WHERE id2 IN (?,?,?,?)
GROUP BY id1 HAVING COUNT(id2)=4

NOTE: You need to adjust the COUNT(id2) condition in the HAVING clause to match the number of values in the IN clause. Here I used four '?' placeholders to represent four values, which is why I wrote COUNT(id2)=4. This assumes each (id1,id2) pair appears only once; if pairs can repeat, use COUNT(DISTINCT id2) instead.

For the scenario you mentioned in the comment, the query looks like this:

SELECT id1 FROM tblA WHERE id2 IN (3,4)
GROUP BY id1 HAVING COUNT(id2)=2
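
Keeping the placeholder count and the HAVING count in sync by hand is easy to get wrong, so here is a minimal sketch that builds both from one list. The helper name ids_matching_all is hypothetical, SQLite (via Python's sqlite3) is assumed as a stand-in for the real database, and COUNT(DISTINCT id2) is used so repeated pairs would not inflate the count.

```python
import sqlite3

def ids_matching_all(conn, wanted):
    """Return the id1 values that are paired with every id2 in `wanted`.

    Hypothetical helper for illustration; not part of any library.
    """
    wanted = sorted(set(wanted))  # repeated values would inflate the required count
    if not wanted:
        return []  # nothing to match; this sketch returns no ids
    placeholders = ", ".join("?" for _ in wanted)
    sql = (
        f"SELECT id1 FROM tblA WHERE id2 IN ({placeholders}) "
        "GROUP BY id1 HAVING COUNT(DISTINCT id2) = ? ORDER BY id1"
    )
    # The last parameter is the number of distinct wanted values.
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]

# Sample rows from the question, loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblA (id1 INT NOT NULL, id2 INT NOT NULL, PRIMARY KEY (id1, id2))")
conn.executemany(
    "INSERT INTO tblA VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 1), (2, 2), (2, 3)],
)

print(ids_matching_all(conn, [3, 4]))     # [1]
print(ids_matching_all(conn, [1, 2, 3]))  # [1, 2]
```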
Babar
+1  A: 

You should create a temporary table like this:

CREATE TABLE temp (id INT NOT NULL PRIMARY KEY) ENGINE MEMORY;

, fill it with the values you are searching for (3 and 4 in your example):

INSERT
INTO    temp
VALUES  (3), (4)

and issue this query:

SELECT  ad.id1
FROM    (
        SELECT  DISTINCT id1
        FROM    a
        ) ad
WHERE   NOT EXISTS
        (
        SELECT  NULL
        FROM    temp
        WHERE   NOT EXISTS
                (
                SELECT  NULL
                FROM    a
                WHERE   a.id1 = ad.id1
                        AND a.id2 = temp.id
                )
        )

You should create a composite index on (id1, id2) for this to work efficiently.

For each id1, this probes the index at most once per value in temp, and rejects the id1 as soon as it finds the first value in temp that has no matching (id1, id2) row.

Here's the plan for the query:

id | select_type        | table      | type   | possible_keys | key     | key_len | ref                 | rows | Extra
1  | PRIMARY            | <derived2> | ALL    |               |         |         |                     | 2    | Using where
3  | DEPENDENT SUBQUERY | temp       | ALL    |               |         |         |                     | 2    | Using where
4  | DEPENDENT SUBQUERY | a          | eq_ref | PRIMARY       | PRIMARY | 8       | ad.id1,test.temp.id | 1    | Using index
2  | DERIVED            | a          | range  |               | PRIMARY | 4       |                     | 3    | Using index for group-by

, no temporary, no filesort.
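
The double NOT EXISTS query can be tried out with the sketch below. It is an approximation under stated assumptions: SQLite (via Python's sqlite3) replaces MySQL, so the MEMORY engine and the plan above do not apply; the lookup table is named wanted instead of temp; and the helper ids_covering is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table a with the composite key (id1, id2), holding the question's sample rows.
conn.execute("CREATE TABLE a (id1 INT NOT NULL, id2 INT NOT NULL, PRIMARY KEY (id1, id2))")
conn.executemany(
    "INSERT INTO a VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 1), (2, 2), (2, 3)],
)
# Assumption: `wanted` plays the role of the `temp` table in the answer.
conn.execute("CREATE TABLE wanted (id INT NOT NULL PRIMARY KEY)")

# Keep an id1 only if there is no wanted value that lacks a matching row in a.
DIVISION_SQL = """
SELECT ad.id1
FROM (SELECT DISTINCT id1 FROM a) ad
WHERE NOT EXISTS (
    SELECT NULL
    FROM wanted
    WHERE NOT EXISTS (
        SELECT NULL
        FROM a
        WHERE a.id1 = ad.id1
          AND a.id2 = wanted.id
    )
)
ORDER BY ad.id1
"""

def ids_covering(conn, values):
    """Reload `wanted` with `values` and return the id1s that have all of them."""
    conn.execute("DELETE FROM wanted")
    conn.executemany("INSERT INTO wanted VALUES (?)", [(v,) for v in set(values)])
    return [row[0] for row in conn.execute(DIVISION_SQL)]

print(ids_covering(conn, [3, 4]))     # [1]
print(ids_covering(conn, [1, 2, 3]))  # [1, 2]
```

Unlike the GROUP BY variant, this form needs no count to keep in sync with the list of values. Note that with an empty wanted table every id1 qualifies, since there is no value it could be missing.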

Quassnoi