ansaurus

Question

Optimizing a MySQL query with a large IN() clause or join on derived table

Answer 1

A:

Your data makes no sense to me. I think you are using corporationID where you mean customerID at some point, because your query joins the transactions table to itself for corporationID = 1 on orderID to get the corporationIDs, which would then be 1, right?

Can you please specify what customerID, employeeID, and corporationID mean? How do I know employees A and B are from corporation 1? In that case, is corporation 1 the corporationID, and corporation 2 the customer, and so stored in customerID?

If that is the case, you just need to do a group by:

SELECT customerID
FROM transactions
WHERE corporationID = 1
GROUP BY customerID

(Or select and group by orderID if you want one row per order instead of one row per customer.)

Grouping collapses the multiple records that are duplicates except for employeeID into a single row.

Conversely, to return all corporations that have sold to corporation 2:

SELECT corporationID
FROM transactions
WHERE customerID = 2
GROUP BY corporationID
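
To see the deduplication from the first GROUP BY query above in action, here is a minimal sketch. The sample rows are hypothetical, and SQLite (via Python's sqlite3 module) stands in for MySQL, so it only illustrates the result set and says nothing about how MySQL would plan the query. On this data, SELECT DISTINCT gives the same rows as the GROUP BY.

```python
import sqlite3

# SQLite is a stand-in for MySQL here (an assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    " transactionID INTEGER PRIMARY KEY,"
    " orderID INTEGER, customerID INTEGER,"
    " employeeID INTEGER, corporationID INTEGER)"
)

# Hypothetical sample rows:
# (transactionID, orderID, customerID, employeeID, corporationID)
rows = [
    (1, 100, 2, 10, 1),
    (2, 100, 2, 11, 1),  # same order as row 1, different employee
    (3, 101, 3, 10, 1),
    (4, 102, 2, 20, 4),  # belongs to another corporation
]
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?)", rows)

# The GROUP BY query from the answer.
grouped = conn.execute(
    "SELECT customerID FROM transactions "
    "WHERE corporationID = 1 GROUP BY customerID"
).fetchall()

# The same question asked with DISTINCT.
distinct = conn.execute(
    "SELECT DISTINCT customerID FROM transactions WHERE corporationID = 1"
).fetchall()

grouped_ids = sorted(r[0] for r in grouped)
distinct_ids = sorted(r[0] for r in distinct)
print(grouped_ids, distinct_ids)  # [2, 3] [2, 3]
```

Rows 1 and 2 differ only in transactionID and employeeID, and both queries fold them into a single customerID.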

Andrew Kuklewicz 2010-01-19 07:35:00

Thanks for your reply. While you are correct that the query will return corporation 1, it will also return other corporations who have been involved in the same transactions (that is, associates of corporation 1). That's the data I'm looking for.

Johannes Gorset 2010-01-19 08:25:38

Here's the field description you requested. I apologize for the wall of text - it seems there's no way to create line breaks in comments on Stack Overflow. 'transactionID' is just a unique ID for a transaction. It's unimportant for this query. 'orderID' is the ID of the order associated with the transaction. 'customerID' is the ID of the person the order was delivered to. 'employeeID' is the ID of an employee involved in the transaction. 'corporationID' is the ID of the corporation the employee was working for at the time.

Johannes Gorset 2010-01-19 08:26:20

Answer 2

+1 A:

If I understand your requirement, you could try this.

select distinct t1.corporationID
from transactions t1
where exists (
    select 1
    from transactions t2
    where t2.corporationID =  1
    and t2.orderID = t1.orderID)
and t1.corporationID != 1;

or this:

select distinct t1.corporationID
from transactions t1
join transactions t2
on t2.orderID = t1.orderID
and t1.transactionID != t2.transactionID
where t2.corporationID = 1
and t1.corporationID != 1;
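
If you want to play with the two queries above, here is a small sketch that runs both against made-up data. The rows are hypothetical, and SQLite (through Python's sqlite3 module) is only a stand-in for MySQL, so it shows which corporations come back on this sample and nothing about index use or speed in MySQL.

```python
import sqlite3

# SQLite is a stand-in for MySQL here (an assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    " transactionID INTEGER PRIMARY KEY,"
    " orderID INTEGER, customerID INTEGER,"
    " employeeID INTEGER, corporationID INTEGER)"
)

# Hypothetical sample rows:
# (transactionID, orderID, customerID, employeeID, corporationID)
rows = [
    (1, 100, 9, 10, 1),
    (2, 100, 9, 20, 2),  # corporation 2 shares order 100 with corporation 1
    (3, 101, 9, 11, 1),
    (4, 101, 9, 30, 3),  # corporation 3 shares order 101 with corporation 1
    (5, 102, 9, 40, 4),  # corporation 4 never shares an order with corporation 1
    (6, 103, 9, 10, 1),  # corporation 1 alone on order 103
]
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?)", rows)

# First query from the answer: correlated EXISTS.
exists_sql = """
select distinct t1.corporationID
from transactions t1
where exists (
    select 1
    from transactions t2
    where t2.corporationID = 1
    and t2.orderID = t1.orderID)
and t1.corporationID != 1
"""

# Second query from the answer: self-join on orderID.
join_sql = """
select distinct t1.corporationID
from transactions t1
join transactions t2
on t2.orderID = t1.orderID
and t1.transactionID != t2.transactionID
where t2.corporationID = 1
and t1.corporationID != 1
"""

exists_ids = sorted(r[0] for r in conn.execute(exists_sql))
join_ids = sorted(r[0] for r in conn.execute(join_sql))
print(exists_ids, join_ids)  # [2, 3] [2, 3]
```

On this sample both forms return corporations 2 and 3. That may not hold for every data set, so it is worth checking against your real rows.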

Phil Wallach 2010-01-19 07:56:22

Thanks for your time, Phil. The first query can't use an index for the same reason that my derived table doesn't. The second uses the right indices, but it's not returning the right data. I've adjusted it slightly so it does, and while it's using an index it's flagged as "using temporary" and "using filesort", and presumably for that reason it's taking about as long as the queries that cannot use an index. I think you're onto something, though.

Johannes Gorset 2010-01-19 08:53:14

Sorry it didn't work. That was just what I would try. I find that for some queries MySQL just can't get it done quickly, so you have to find a workaround. Posting some data would let others play with it.

Phil Wallach 2010-01-19 09:53:40
