views:

348

answers:

5

I've inherited a database that has a structure with a table of products, a table consisting of some product attributes and another table to build the relationship between these attributes and a given product.

A user can filter the products by a combination of these attributes, meaning that if more than one attribute is selected only products with all those attributes are returned. Unfortunately, there is now an exception to this rule, whereby a user selecting one of two specific attributes needs results containing either (or both).

The query currently looks like this (not my code):

SELECT DISTINCT p.* FROM products AS p 
INNER JOIN attributes a ON p.product_id=a.property_id 
WHERE a.attribute_id IN (1,3,7) 
GROUP BY p.property_id 
HAVING COUNT(DISTINCT a.attribute_id) = 3

I doubt the above is a particularly efficient way of retrieving the required products, but I'm unsure how to proceed in light of the new requirement.

I've now created some php code to construct a special query when the two "special" attributes (3 and 7) are selected:

SELECT DISTINCT p.* FROM products AS p 
INNER JOIN attributes a ON p.product_id=a.property_id 
WHERE a.attribute_id IN (1,3) OR a.attribute_id IN (1,7) 
GROUP BY p.property_id 
HAVING COUNT(DISTINCT a.attribute_id) = 2

However, this still does not work as required - any products that share both these attributes are not returned in the result (this is obviously due to the HAVING COUNT clause, but I don't know how I go about fixing it. For clarity, the issue is if 10 products have only attribute 3 but a further five have attributes 3 and 7, the above query will only return the 10 records.

Might it be possible to use some kind of subquery or what alternatives are there?

+1  A: 

I think that looks pretty good already. Aside from the obligatory mention of "don't do select *", it looks ok to me.

My advice: if it works and its not causing a performance issue, leave it and spend your time on something else. Revisit it in future if an issue arises.

benlumley
+2  A: 

The query seems fine, except that you possibly could remove the DISTINCT modifier, since you're already grouping by id. Regarding the new requirement, can't you solve it in your code, before it reaches the SQL query?

Edit: An alternative would be to construct the query with one inner join for each required attribute, but that would probably be much slower

eliego
Thank you for your assistance. I have modified my code to alter the query when the two specific attributes (that should return results for both) appear, but I still cannot get the results I require. I've updated the question.
BrynJ
+1  A: 

This the better way to do the original query:

SELECT ... FROM products AS p 
INNER JOIN attributes a1 ON p.product_id=a1.property_id AND a1.attribute_id=1
INNER JOIN attributes a2 ON p.product_id=a2.property_id AND a2.attribute_id=3
INNER JOIN attributes a3 ON p.product_id=a3.property_id AND a3.attribute_id=7

And, given that if you need attribute_id 3, you want to OR that with attribute_id 7, and assuming you also want attribute_id 1, which is not one of those special attributes:

SELECT ... FROM products AS p 
INNER JOIN attributes a1 ON p.product_id=a1.property_id AND a1.attribute_id=1
LEFT OUTER JOIN attributes a2 ON p.product_id=a2.property_id AND a2.attribute_id=3
LEFT OUTER JOIN attributes a3 ON p.product_id=a3.property_id AND a3.attribute_id=7
WHERE a2.attribute_id IS NOT NULL OR a3.attribute_id IS NOT NULL

I suspect either of these will be much faster than the originals with the distinct/having/group by aggregate operations. The attributes table should have a multicolumn unique index on either (property_id, attribute_id) or (attribute_id, property_id), though I assume property_id is more selective and therefore should be the leftmost column in the index.

ʞɔıu
A: 

With proper (and obvious) indexes, this will be very efficient in MySQL.

SELECT ...

FROM products AS p

INNER JOIN attributes a1 ON p.product_id=a1.property_id AND a1.attribute_id=1
LEFT JOIN attributes a2 ON p.product_id=a2.property_id AND a2.attribute_id=3
LEFT JOIN attributes a3 ON p.product_id=a3.property_id AND a3.attribute_id=7

WHERE (
CASE WHEN a1.product_attribute_id IS NULL THEN 0 ELSE 1 END
+ CASE WHEN a1.product_attribute_id IS NULL THEN 0 ELSE 1 END
) > 0

le dorfier
A: 

how can you select p.* and group by only 1 column? Or does that work with a primary key?

WHERE a1.attribute_id IN (1,3) OR a1.attribute_id IN (1,7)

is the same as

WHERE a1.attribute_id IN (1,3,7)

SELECT p.* FROM products  
INNER JOIN (
    SELECT a1.property_id  
    FROM attributes a1 
    WHERE a1.attribute_id IN (1,3,7)
    GROUP BY a1.property_id 
    HAVING COUNT(DISTINCT a1.attribute_id) = 2
) as a ON p.product_id=a.property_id
dotjoe