Let's say I need to query the associates of a corporation. I have a table, "transactions", which contains data on every transaction made.

CREATE TABLE `transactions` (
  `transactionID` int(11) unsigned NOT NULL,
  `orderID` int(11) unsigned NOT NULL,
  `customerID` int(11) unsigned NOT NULL,
  `employeeID` int(11) unsigned NOT NULL, 
  `corporationID` int(11) unsigned NOT NULL,
  PRIMARY KEY (`transactionID`),
  KEY `orderID` (`orderID`),
  KEY `customerID` (`customerID`),
  KEY `employeeID` (`employeeID`),
  KEY `corporationID` (`corporationID`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

It's fairly straightforward to query this table for associates, but there's a twist: A transaction record is registered once per employee, and so there may be multiple records for one corporation per order.

For example, if employees A and B from corporation 1 were both involved in selling a vacuum cleaner to corporation 2, there would be two records in the "transactions" table; one for each employee, and both for corporation 1. This must not affect the results, though. A trade from corporation 1, regardless of how many of its employees were involved, must be treated as one.

Easy, I thought. I'll just make a join on a derived table, like so:

SELECT corporationID
FROM transactions
JOIN (
    SELECT DISTINCT orderID
    FROM transactions
    WHERE corporationID = 1
) AS foo USING (orderID)

The query returns a list of corporations who have been involved in trades with corporation 1. That's exactly what I need, but it's very slow because MySQL can't use the corporationID index to determine the derived table. I understand that this is the case for all subqueries/derived tables in MySQL.
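To make the shape of that result concrete, here is a minimal sketch that runs the same derived-table join against a tiny, made-up data set. It uses Python's sqlite3 module as a stand-in for MySQL, so it only illustrates which rows come back, not how MySQL plans or indexes the query. The sample rows and the names `conn`, `derived_join` and `associates` are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (result semantics only).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE transactions (
        transactionID INTEGER PRIMARY KEY,
        orderID       INTEGER NOT NULL,
        customerID    INTEGER NOT NULL,
        employeeID    INTEGER NOT NULL,
        corporationID INTEGER NOT NULL
    )"""
)

# Hypothetical sample data: corporation 1 has two employees on order 10.
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 100, 1, 1),  # order 10, employee A of corporation 1
        (2, 10, 100, 2, 1),  # order 10, employee B of corporation 1
        (3, 10, 100, 3, 2),  # order 10, corporation 2
        (4, 11, 101, 1, 1),  # order 11, corporation 1
        (5, 11, 101, 4, 3),  # order 11, corporation 3
        (6, 12, 102, 5, 4),  # order 12, unrelated to corporation 1
    ],
)

# The derived-table join from the question, unchanged.
derived_join = """
    SELECT corporationID
    FROM transactions
    JOIN (
        SELECT DISTINCT orderID
        FROM transactions
        WHERE corporationID = 1
    ) AS foo USING (orderID)
"""

# One row per transaction on any order that corporation 1 took part in.
associates = sorted(row[0] for row in conn.execute(derived_join))
print(associates)
```

On this sample the query returns one row per matching transaction, so corporation 1 itself shows up alongside corporations 2 and 3, while corporation 4 (which shares no order with corporation 1) does not.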

I've also tried to query a collection of orderIDs separately and use a ridiculously large IN() clause (typically 100 000+ IDs), but as it turns out MySQL has issues using indices on ridiculously large IN() clauses as well, and as a result the query time does not improve.

Are there any other options available, or have I exhausted them both?

A: 

Your data makes no sense to me. I think you are using corporationID where you mean customerID at some point in there, because your query joins the transactions table to itself on orderID, restricted to the orders where corporationID = 1, and then selects the corporationIDs... which would then be 1, right?

Can you please specify what the customerID, employeeID, and corporationIDs mean? How do I know employees A and B are from corporation 1 - in that case, is corporation 1 the corporationID, and corporation 2 is the customer, and so stored in the customerID?

If that is the case, you just need to do a group by:

SELECT customerID
FROM transactions
WHERE corporationID = 1
GROUP BY customerID

(Or select and group by orderID if you want one row per order instead of one row per customer.)

By using the group by, you ignore the fact that there are multiple records that are duplicate except for the employeeID.
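As a rough illustration of that point, the sketch below runs the query with and without GROUP BY on a few made-up rows. It assumes the interpretation guessed at above (customerID holds the buying corporation) and uses Python's sqlite3 module in place of MySQL, so it shows only the result rows, not MySQL's performance. The sample data and variable names are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (result semantics only).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE transactions (
        transactionID INTEGER PRIMARY KEY,
        orderID       INTEGER NOT NULL,
        customerID    INTEGER NOT NULL,
        employeeID    INTEGER NOT NULL,
        corporationID INTEGER NOT NULL
    )"""
)

# Hypothetical rows: employees 1 and 2 of corporation 1 both worked on
# order 10 for customer 2, so that sale is recorded twice.
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 2, 1, 1),
        (2, 10, 2, 2, 1),
        (3, 11, 3, 1, 1),
    ],
)

# Without GROUP BY: one row per employee record, so customer 2 repeats.
ungrouped = sorted(
    row[0]
    for row in conn.execute(
        "SELECT customerID FROM transactions WHERE corporationID = 1"
    )
)

# With GROUP BY: the per-employee duplicates collapse to one row per customer.
grouped = sorted(
    row[0]
    for row in conn.execute(
        "SELECT customerID FROM transactions "
        "WHERE corporationID = 1 GROUP BY customerID"
    )
)

print(ungrouped)
print(grouped)
```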

Conversely, to return all corporations that have sold to corporation 2:

SELECT corporationID
FROM transactions
WHERE customerID = 2
GROUP BY corporationID
Andrew Kuklewicz
Thanks for your reply. While you are correct that the query will return corporation 1, it will also return other corporations who have been involved in the same transactions (that is, associates of corporation 1). That's the data I'm looking for.
Johannes Gorset
Here's the field description you requested. I apologize for the wall of text - it seems there's no way to create line breaks in comments on Stack Overflow. 'transactionID' is just a unique ID for a transaction. It's unimportant for this query. 'orderID' is the ID of the order associated with the transaction. 'customerID' is the ID of the person the order was delivered to. 'employeeID' is the ID of an employee involved in the transaction. 'corporationID' is the ID of the corporation the employee was working for at the time.
Johannes Gorset
A: 

If I understand your requirement, you could try this.

select distinct t1.corporationID
from transactions t1
where exists (
    select 1
    from transactions t2
    where t2.corporationID =  1
    and t2.orderID = t1.orderID)
and t1.corporationID != 1;

or this:

select distinct t1.corporationID
from transactions t1
join transactions t2
on t2.orderID = t1.orderID
and t1.transactionID != t2.transactionID
where t2.corporationID = 1
and t1.corporationID != 1;
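For anyone who wants to compare the two variants, here is a minimal sketch that runs both against a small, made-up data set. It uses Python's sqlite3 module rather than MySQL, so it says nothing about index usage or speed; it only checks what each query returns on this sample. The rows and the names `exists_query` and `self_join_query` are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (result semantics only).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE transactions (
        transactionID INTEGER PRIMARY KEY,
        orderID       INTEGER NOT NULL,
        customerID    INTEGER NOT NULL,
        employeeID    INTEGER NOT NULL,
        corporationID INTEGER NOT NULL
    )"""
)

# Hypothetical sample data: corporation 1 shares order 10 with corporation 2
# and order 11 with corporation 3; corporation 4 is unrelated.
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 100, 1, 1),
        (2, 10, 100, 2, 1),
        (3, 10, 100, 3, 2),
        (4, 11, 101, 1, 1),
        (5, 11, 101, 4, 3),
        (6, 12, 102, 5, 4),
    ],
)

# Variant 1: correlated EXISTS subquery.
exists_query = """
    SELECT DISTINCT t1.corporationID
    FROM transactions t1
    WHERE EXISTS (
        SELECT 1
        FROM transactions t2
        WHERE t2.corporationID = 1
          AND t2.orderID = t1.orderID)
      AND t1.corporationID != 1
"""

# Variant 2: self-join on orderID.
self_join_query = """
    SELECT DISTINCT t1.corporationID
    FROM transactions t1
    JOIN transactions t2
      ON t2.orderID = t1.orderID
     AND t1.transactionID != t2.transactionID
    WHERE t2.corporationID = 1
      AND t1.corporationID != 1
"""

via_exists = sorted(row[0] for row in conn.execute(exists_query))
via_self_join = sorted(row[0] for row in conn.execute(self_join_query))
print(via_exists)
print(via_self_join)
```

On this sample both variants return the same set of corporations. Whether that holds for real data, and how MySQL executes each one, would need checking with EXPLAIN on the actual table.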
Phil Wallach
Thanks for your time, Phil. The first query can't use an index for the same reason that my derived table doesn't. The second uses the right indices, but it's not returning the right data. I've adjusted it slightly so it does, and while it's using an index it's flagged as "using temporary" and "using filesort", and presumably for that reason it's taking about as long as the queries that cannot use an index. I think you're onto something, though.
Johannes Gorset
Sorry it didn't work. That was just what I would try. I find that for some queries MySQL just can't get it done quickly, so you have to find a workaround. Posting some data would let others play with it.
Phil Wallach