tags:

views:

101

answers:

3

Hello,

First of all sorry that I could not think of a more descriptive title.

What I want to do is the following using only SQL:

I have some lists of strings, list1, list2 and list3.

I have a dataset that contains two interesting columns, A and B. Column A contains a TransactionID and column B contains an ItemID.

Naturally, there can be multiple rows that share the same TransactionIDs.

I need to catch those transactions that have at least one ItemID in each and every list (list1 AND list2 AND list3).

I also need to count how many times does that happen for each transaction. [EDIT] That is, count how many full sets of ItemIDs there are for each TransactionID", "Full Set" being any element of the list1 with any element of the list2 with any element of the list3

I hope that makes enough sense, perhaps I will be able to explain it better with a clear head.

Thanks in advance

A: 

Depending on your dialect, and assuming your lists are other tables...

SELECT
    TransactionID, Count1, Count2, Count3
FROM
    MyDataSet M
    JOIN
    (SELECT COUNT(*), ItemID AS Count1 FROM List1 GROUP BY ItemID) T1 ON T1.ItemID = M.ItemID
    JOIN
    (SELECT COUNT(*), ItemID AS Count2 FROM List2 GROUP BY ItemID) T2 ON T2.ItemID = M.ItemID
    JOIN
    (SELECT COUNT(*), ItemID AS Count3 FROM List3 GROUP BY ItemID) T3 ON T3.ItemID = M.ItemID
gbn
A: 

In MySQL if you have the following lists:

list1 = ('1', '3')
list2 = ('2', '3')
list3 = ('3', '5')

then you can do this:

SELECT
    TransactionID,
    SUM(ItemID IN ('1', '3')) AS list1_count,
    SUM(ItemID IN ('2', '3')) AS list2_count,
    SUM(ItemID IN ('3', '5')) AS list3_count
FROM table1
GROUP BY TransactionID
HAVING list1_count > 0 AND list2_count > 0 AND list3_count > 0

Result:

TransactionId  list1_count  list2_count  list3_count
1              3            2            1
3              2            2            1

Test data:

CREATE TABLE table1 (ID INT NOT NULL, TransactionID INT NOT NULL, ItemID INT NOT NULL);
INSERT INTO table1 (ID, TransactionID, ItemID) VALUES
(1, 1, 1),
(2, 1, 2),
(3, 1, 3),
(4, 1, 4),
(5, 1, 1),
(6, 2, 1),
(7, 2, 2),
(8, 2, 1),
(9, 2, 4),
(10, 3, 3),
(11, 3, 2),
(12, 3, 1);
Mark Byers
A: 

If list1, list2, and list3 are actually known enumerations, you could go with:

SELECT TransactionID, COUNT(*)
FROM MyTable
WHERE ItemID IN (list1) AND ItemID IN (list2) AND ItemID IN (list3)
GROUP BY TransactionID

If you have a lot of lists, you may want to generate the SQL in a program. However, it should still perform pretty well, even for a lot of lists. Put the lists you expect to have the fewest matches in first, so that you stop evaluating the predicate as soon as possible.

If your lists are in another table, perhaps a bunch of tuples of the form (list_id, item_id), that's a trickier problem. I'd like to know more before trying to come up with a query for that.

David M