views:

86

answers:

5

I have a table called transactions with a many-to-many relationship to items through the items_transactions table.

I want to do something like this:

SELECT "transactions".* 
  FROM "transactions" 
INNER JOIN "items_transactions" 
        ON "items_transactions".transaction_id = "transactions".id 
INNER JOIN "items" 
        ON "items".id = "items_transactions".item_id 
WHERE (items.id IN (<list of items>))

But this gives me all transactions that have one or more of the items in the list associated with it and I only want it to give me the transactions that are associated with all of those items.

Any help would be appreciated.

A: 

The final bit of the query looks wrong:

WHERE (items.id IN (<list of items>))

the 'in' statement is like a big OR statement rather than an AND statement, so it is expanded by the optimizer as:

WHERE (items.id = 123 OR items.id = 456 OR items.id = 789)

EDIT

I reckon you need to perform a correlated subquery on the items table.

amelvin
(items.id = 123 and items.id = 456 and items.id = 789) will never be true.
dcp
@dcp now you mention it, it does look a bit unlikely.
amelvin
A: 

I haven't executed this, but that should get you the result you want:

SELECT t.* FROM items i
    INNER JOIN items_transactions it ON i.id = it.item_id
        INNER JOIN transactions t ON it.transaction_id = t.id
WHERE i.id IN (1,2,3)
SchlaWiener
He wants transactions that are associated with *all* of those items, but your solution will give him transactions that are associated with any of them.
dcp
+5  A: 

You have to expand out your query for all of the items in the list:

SELECT "transactions".* 
FROM "transactions" 
WHERE EXISTS (SELECT 1 FROM "items_transactions"
              INNER JOIN "items" ON "items".id = "items_transactions".item_id 
              WHERE "items_transactions".transaction_id = "transactions".id
              AND "items".id = <first item in list>)
AND   EXISTS (SELECT 1 FROM "items_transactions"
              INNER JOIN "items" ON "items".id = "items_transactions".item_id 
              WHERE "items_transactions".transaction_id = "transactions".id
              AND "items".id = <second item in list>)
...

You might also be able to massage it out using IN and COUNT DISTINCT, I'm not sure which would be faster. Something like (completely untested):

SELECT "transactions".* 
FROM "transactions" 
INNER JOIN (SELECT "items_transactions".transaction_id 
            FROM "items_transactions"
            INNER JOIN "items" ON "items".id = "items_transactions".item_id 
            WHERE "items".id IN (<list of items>)
            GROUP BY "items_transactions".transaction_id
            HAVING COUNT(DISTINCT "items".id) = <count of items in list>) matches ON transactions.transaction_id = matches.transaction_id
lc
Is there really no nicer way to do this? There are like 100 items in the list, I don't think postgres can handle queries that long.
Maarten
I like the use of 'count=number of items in list'
eugeneK
The count idea works perfectly, I only had to change transactions.transactions_id to "transactions".id for my case.Thanks a lot!
Maarten
A: 

I think this does what you want.

I would put the list of items you need in to a table (temp one will be fine) and join on to that. Then count the number of distinct items and match the count to the item transactions count.

I've provided the sample DDL & Data that I used.

Create table #trans
(
transId int identity(1,1),
trans varchar(10)
)

Create Table #itemTrans
(
transId int,
itemId int
)

Create table #items
(
itemId int identity(1,1),
item varchar(10)
)

Create table #itemsToSelect
(
itemId int
)


Insert Into #trans
Values ('Trans 1')

Insert Into #trans
Values ('Trans 2')

Insert Into #trans
Values ('Trans 3')


Insert Into #Items
Values ('Item 1')

Insert Into #Items
Values ('Item 2')

Insert Into #Items
Values ('Item 3')

Insert Into #Items
Values ('Item 4')

Insert Into #itemTrans
Values (1, 1)

Insert Into #itemTrans
Values (1, 2)

Insert Into #itemTrans
Values (1, 3)

Insert Into #itemTrans
Values (2, 1)

Insert Into #itemTrans
Values (2, 3)

Insert Into #itemTrans
Values (3, 4)



Insert Into #itemsToSelect
Values (1)
Insert Into #itemsToSelect
Values (2)
Insert Into #itemsToSelect
Values (3)


Select t.transId

From #items i 
Join #itemTrans it on i.itemId = it.itemId
Join #trans t on it.transId = t.transId

Join #itemsToSelect its on it.ItemId = its.ItemId

Where it.TransId is not null
Group by t.transId
Having count(distinct(it.itemId)) = (Select count(distinct(itemId)) from #itemsToSelect)
Barry
A: 
SELECT transactions.*
WHERE (SELECT count(*)
       FROM items_transactions
       WHERE items_transactions.transaction_id = transactions.transaction_id
             AND items_transactions.item_id IN (<list of items>)
      ) = <number of items>

Although this will probably do a scan of transactions, nesting the correlated subquery for each one... not particularly efficient, so maybe:

SELECT transactions.*
WHERE EXISTS (SELECT 1 FROM items_transactions
              WHERE items_transactions.transaction_id = transactions.transaction_id
              AND items_transactions.item_id IN (<list of items>)
      )
      AND
      (SELECT count(*)
       FROM items_transactions
       WHERE items_transactions.transaction_id = transactions.transaction_id
             AND items_transactions.item_id IN (<list of items>)
      ) = <number of items>

or something similar to persuade the DB to find transactions related to at least one of the items first, and then check each transaction is linked against all the items later.

As someone's noted, you can also simply generate join clauses for each item instead, which might well be better if the number of items isn't large.

araqnid