views:

580

answers:

3

Hi everyone,

I have a MySQL query to get items that have had recent activity. Basically users can post a review or add it to their wishlist, and I want to get all items that have either had a new review in the last x days, or was placed on someone's wishlist.

The query goes a bit like this (slightly simplified):

SELECT items.*, reaction.timestamp AS date FROM items
LEFT JOIN reactions ON reactions.item_id = items.id
WHERE reactions.timestamp > 1251806994
GROUP BY items.id

UNION

SELECT items.*, wishlists.timestamp AS date FROM items
LEFT JOIN wishlist ON wishlists.item_id = items.id
WHERE wishlists.timestamp > 1251806994
GROUP BY items.id

ORDER BY date DESC LIMIT 5

This works, but when an item has been placed both on someone's wishlist and a review was posted, the item is returned twice. UNION removes duplicates normally, but because the date differs between the two rows, both rows are returned. Can I somehow tell MySQL to ignore the date when removing duplicate rows?

I also tried doing something like this:

SELECT items.*, IF(wishlists.id IS NOT NULL, wishlists.timestamp, reactions.timestamp) AS date FROM items
LEFT JOIN reactions ON reactions.item_id = items.id
LEFT JOIN wishlist ON wishlists.item_id = items.id

WHERE (wishlists.id IS NOT NULL AND wishlists.timestamp > 1251806994) OR
(reactions.id IS NOT NULL AND reactions.timestamp > 1251806994)
GROUP BY items.id

ORDER BY date DESC LIMIT 5

But that turned out to be insanely slow for some reason (took about half a minute).

+1  A: 

Not sure if this would be a huge performance hit but you could try

SELECT item_field_1, item_field_2, ..., max(date) as date
FROM
  (the query you posted) 
GROUP BY item_field_1, item_field_2, ...
Lawrence Barsanti
That gives me only 1 result (the newest).
Aistina
If I understand this correctly, you get duplicate items when they appear in both the reactions and wishlist tables. This will give you the newest date when an item appears in both tables and have no affect when it appears in only one of the tables. Make sure not to include date in the 'group by'.
Lawrence Barsanti
The MAX function causes it to only return one result... but I solved it myself, based on your suggestion. I'll post what I did in a sec.
Aistina
A: 

I solved it myself, based on larryb82's idea. I basically did the following:

SELECT * FROM (
    SELECT items.*, reaction.timestamp AS date FROM items
    LEFT JOIN reactions ON reactions.item_id = items.id
    WHERE reactions.timestamp > 1251806994
    GROUP BY items.id

    UNION

    SELECT items.*, wishlists.timestamp AS date FROM items
    LEFT JOIN wishlist ON wishlists.item_id = items.id
    WHERE wishlists.timestamp > 1251806994
    GROUP BY items.id

    ORDER BY date DESC LIMIT 5
) AS items

GROUP BY items.id
ORDER BY date DESC LIMIT 5

Though I realize this probably doesn't take into account which date is the highest for each item... Not sure yet if that matters and if so, what to do about it.

Aistina
+1  A: 

I don't think you need a UNION here at all.


SELECT item.*, GREATEST(COALESCE(wishlists.timestamp, 0), COALESCE(reaction.timestamp, 0)) as date
FROM items
LEFT JOIN reactions ON reactions.item_id = items.id AND reactions.timestamp > 1251806994
LEFT JOIN wishlists ON wishlists.item_id = items.id AND wishlists.timestamp > 1251806994
ORDER BY date DESC limit 5

Your use of LEFT JOIN above was probably very slow because of the predicate with the OR in it. You asked the database to join the three tables together then examined that result for timestamp information. My statement should form a smaller intermediate table. Items that don't have either a reaction or a wishlist will get a date of 0, which presumably will cause them not to be reported.

David M
Thanks, but your suggestion was still over 10 times as slow as my solution. +1 for teaching me about the GREATEST and COALESCE functions though!
Aistina