ansaurus

Question

Optimizing join query to fetch data from A with condition on B sorted by B

Answer 1

+1 A:

I didn't actually try this but maybe it points you in the right direction. The idea is to make a temp result of the UNION of (unique) id, similarity pairs from similarities, then join items with that.

SELECT Item.*, s.other_item_id, s.similarity
FROM items AS Item
JOIN
    (
    SELECT item1_id AS id, item2_id AS other_item_id, similarity FROM similarities
    UNION
    SELECT item2_id AS id, item1_id AS other_item_id, similarity FROM similarities
    ) AS s ON s.id = items.id
WHERE items.id IN (1, 2, 3, ...)
ORDER BY s.similarity DESC;

In your original query you don't need to restrict the ids from similarities in both the JOIN condition and the WHERE clause.

gregjor 2010-10-22 02:37:18

So basically my two-seperate-queries strategy rolled into one query? :) I'll give this a try. I have to see if how to push this through my DAL though.

deceze 2010-10-22 02:41:25

It would help if you can give an example of the desired result. Do you want to see the other item ID a given item is similar to?

gregjor 2010-10-22 02:43:26

I only want the records of the items, like a normal `SELECT * FROM items`. The similarity is only the condition, I don't need to fetch it. In fact, I'll probably be using a `LIMIT` to only fetch a number of "top similar" items. You could describe the query as "Find 10 items that are most similar to these items (1, 2, 3, ...)."

deceze 2010-10-22 02:47:07

OK... so you will have a WHERE condition that limits the result to similarity in a specific range or something?

gregjor 2010-10-22 02:48:17

Comment timing overlap... :) It'll be a `LIMIT`ed query, hence the sorting. Also, similarities of 0 are omitted in the matrix, so as long as there *is* a similarity and it's sorted in descending order, that's fine.

deceze 2010-10-22 02:51:12

I see. I'm not sure my solution gets exactly what you want but maybe it points you to another way to do the join without so many conditions. You could make a VIEW from the UNION'd SELECTs, or a temp table if similarities doesn't change often and you need to run a lot of queries against it.

gregjor 2010-10-22 02:53:04

Hmm, overall the query is about as slow as my original one, since it still requires a ginormous JOIN...

deceze 2010-10-22 03:31:14

This brought me on the right track though. Thanks!

deceze 2010-10-22 07:46:33

Answer 2

+1 A:

I am wondering whether joining to the items table twice will perform better than the two queries. Pardon the psuedo-code-ish SELECT portion of this statement - I think you'll actually need a CASE for every field value...

SELECT    
CASE WHEN `Item2`.`id` IS NULL THEN 
  `Item1`.`id`
ELSE `Item2`.`id`
END,

SELECT    
CASE WHEN `Item2`.`id` IS NULL THEN 
  `Item1`.`name`
ELSE `Item2`.`name`
END,

SELECT    
CASE WHEN `Item2`.`id` IS NULL THEN 
  `Item1`.`description`
ELSE `Item2`.`description`
END,

[and so on]

FROM      `items` AS `Item1`
LEFT OUTER JOIN `similarities` AS `Similarity`
       ON (`Item1`.`id` = `Similarity`.`item1_id`
RIGHT OUTER JOIN `items` AS `Item2`
       ON (`Item2`.`id` = `Similarity`.`item2_id`       
WHERE     `Similarity`.`item1_id` IN (1, 2, 3, ...)
          OR `Similarity`.`item2_id` IN (1, 2, 3, ...)
ORDER BY  `Similarity`.`similarity` desc

LesterDove 2010-10-22 02:49:11

Uuh, funky, thanks! :) I'm getting a syntax error on the `CASE` statement within the `SELECT` statement though. Just testing with `SELECT Item1.*, Item2.*` gives blazingly fast results though. I'll have to play around with this more. Would be happy if you could improve this as well.

deceze 2010-10-22 03:03:34

Testing with `CASE WHEN Item2.id IS NULL THEN Item1.id ELSE Item2.id END CASE` gives me a syntax error near `CASE FROM ...` (the `END CASE` statement) BTW.

deceze 2010-10-22 03:06:39

Thanks, I just edited it a bit. Separate CASE for each field name (this is only so that your new query contains the same # of fields as your old one, since you mentioned an existing DAL already.)

LesterDove 2010-10-22 03:11:12

I ended up with a different solution, but your answer gave me some inspiration. Thanks!

deceze 2010-10-22 07:47:04

Answer 3

A:

Thanks to the inspirations, I ended up with this query:

SELECT `Item`.*
FROM `items` AS `Item`
JOIN (
    SELECT `item1_id` AS `id`, `similarity`
    FROM   `similarities`
    WHERE  `similarities`.`item2_id` IN (1, 2, 3, ...)
    UNION
    SELECT `item2_id` AS `id`, `similarity`
    FROM   `similarities`
    WHERE  `similarities`.`item1_id` IN (1, 2, 3, ...)
) AS `SimilarityUnion` ON `SimilarityUnion`.`id` = `Item`.`id`
GROUP BY `SimilarityUnion`.`id`
ORDER BY `SimilarityUnion`.`similarity` DESC

deceze 2010-10-22 07:45:47

ansaurus

tags:

views:

answers:

Optimizing join query to fetch data from A with condition on B sorted by B

related questions