views:

125

answers:

7

I have a particularly slow query due to the vast amount of information being joined together. However I needed to add a where clause in the shape of id in (select id from table).

I want to know if there is any gain from the following, and more pressing, will it even give the desired results.

select a.* from a where a.id in (select id from b where b.id = a.id)

as an alternative to:

select a.* from a where a.id in (select id from b)

Update: MySQL Can't be more specific sorry table a is effectively a join between 7 different tables. use of * is for examples

Edit, b doesn't get selected

A: 

YMMV, but I've often found using EXISTS instead of IN makes queries run faster.

SELECT a.* FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id)

Of course, without seeing the rest of the query and the context, this may not make the query any faster.

JOINing may be a more preferable option, but if a.id appears more than once in the id column of b, you would have to throw a DISTINCT in there, and you more than likely go backwards in terms of optimization.

dpmattingly
What makes you think this will run faster?
Andomar
Distinct would be required
jim
A: 

I would never use a subquery like this. A join would be much faster.

select a.*
from a 
join b on a.id = b.id

Of course don't use select * either (especially never use it when doing a join as at least one field is repeated) and it wastes network resources to send unnneeded data.

HLGEM
What makes you think this will run faster?
Andomar
Join would be faster, @ * use, example was simplified.
jim
Execution plan is the same and this example presents a potential duplicate record depending on structure of b.
Jeff O
yes, it is possible b can contain dup a refs
jim
A: 

Have you looked at the execution plan?

How about

select a.* 
from a 
inner join b
on a.id = b.id

presumably the id fields are primary keys?

Russ Cam
id's are keys/indexed
jim
+2  A: 

Both queries you list are the equivalent of:

select a.* 
from a 
inner join b on b.id = a.id

Almost all optimizers will execute them in the same way.

You could post a real execution plan, and someone here might give you a way to speed it up. It helps if you specify what database server you are using.

Andomar
mysql innodb, can't post execution plan sorry.
jim
the title says sql (any) because its a theoretical question,
jim
Lou indicated that b could have duplicate ID's, so a join would not produce the same results.
Jeff O
@Guiness: You're right
Andomar
A: 
Select a.* from a
inner join (Select distinct id from b) c
on a.ID = c.AssetID

I tried all 3 versions and they ran about the same. The execution plan was the same (inner join, IN (with and without where clause in subquery), Exists)

Since you are not selecting any other fields from B, I prefer to use the Where IN(Select...) Anyone would look at the query and know what you are trying to do (Only show in a if in b.).

Jeff O
My explain gives simple select types on everything for the join, and for the 'in select' it gives primarys on all of a tables then dependent sub query. Join seems to be the faster option even using distinct.
jim
A: 

your problem is most likely in the seven tables within "a"

make the FROM table contain the "a.id" make the next join: inner join b on a.id = b.id

then join in the other six tables.

you really need to show the entire query, list all indexes, and approximate row counts of each table if you want real help

KM
No see, I am only asking about the difference between the two shown in my question. the 'a' part isn't broken indexes are correctly used. It isn't slow in the sense the query could be optimized more its slow because its a huge result set. Listing out db schema isn't important here, it is reduced down to the parts that I am interested in.
jim
+2  A: 

Your question was about the difference between these two:

select a.* from a where a.id in (select id from b where b.id = a.id)

select a.* from a where a.id in (select id from b)

The former is a correlated subquery. It may cause MySQL to execute the subquery for each row of a.

The latter is a non-correlated subquery. MySQL should be able to execute it once and cache the results for comparison against each row of a.

I would use the latter.

Bill Karwin
Thanks. That explains some things.
jim