views:

96

answers:

2

I'm trying to filter a relationship table down to get a subset of the table where two conditions are met (ie: I want all of the id's of the entries who's color_ids are 1 or 2). It's a beefy table, so I'm trying to optimize as much as possible.

I was wondering if anyone could explain my finding in this case:

Why is

SELECT DISTINCT a.id 
  FROM RelationshipTable as a 
  JOIN RelationshipTable as b ON b.id = a.id 
 WHERE a.color_id = 1 
   AND b.color_id = 2;

faster than

SELECT DISTINCT id 
  FROM RelationshipTable 
 WHERE color_id = 1 
    OR color_id = 2;

in MySql 4.1?

+2  A: 

The first query is impossible and will never return a result set. It's basically saying "Give me all the records in the table where color_id is 1 AND color_id is 2" which can never happen.

If you want to ask the difference between

SELECT DISTINCT a.id 
  FROM RelationshipTable as a 
  JOIN RelationshipTable as b ON b.id = a.id 
 WHERE a.color_id = 1 
   OR b.color_id = 2;

versus

SELECT DISTINCT color_id 
  FROM RelationshipTable 
 WHERE color_id = 1 
    OR color_id = 2;

In this case the first will always be slower than the second for large tables. The first results in a full table scan for table A while the second one uses the indexes that should be used in the where clause.

methodin
This is not correct. I neglected to mention that id is NOT a unique key. Therefore I'm going to get all iterations of the combinations of duplicate ids, making a table where color_id's being non-identical possible.
Matt Dunnam
+1  A: 

The two are not the same query and should not be giving the same result set. In the first query you want all the records which meet both conditions, you have a record with a color_id = of 1 and a record with a color_id of 2 for the same ID. In the second query you will get all records that have both color ids and all records that have only one or the other. Of course since you are asking for a differnt field to be returned you might not see this. And the second query is somewhat silly anyway as it can be expressed as:

select 1 as color id 
union all
select 2

And never hit a table at all. That would make it super fast.

HLGEM