I need to filter products whose attributes are stored in a joined table, returning only the products that match all of the required properties. In other words, users need to be able to gradually narrow down their search by adding requirements.

I think the problem really concerns just the properties table rather than the join. Given the following (simplified) table of product properties:

id  product_id  property  value
---------------------------------
1   1           color     red
2   1           size      small
3   2           color     red
4   2           size      large

How would I get all the product_ids that have both the value 'red' and the value 'small'?

A similar question was asked before but was not answered very fully. One solution uses COUNT and HAVING to keep only the groups that contain as many rows as there are required values, e.g.:

SELECT product_id, COUNT(*) AS group_count
FROM properties
WHERE value = 'red' OR value = 'small'
GROUP BY product_id
HAVING group_count = 2

This works, but I'm concerned about performance; it seems like there should be a better way.
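
For reference, a minimal runnable sketch of the COUNT/HAVING idea using Python's sqlite3 module. It assumes the simplified properties table above, plus one hypothetical duplicate row (id 5) that is not in the original data. Matching on (property, value) pairs and counting DISTINCT properties is an assumption added here so that a duplicated row cannot satisfy the count by itself.

```python
import sqlite3

# In-memory copy of the simplified properties table from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE properties (
        id INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL,
        property TEXT NOT NULL,
        value TEXT NOT NULL
    );
    INSERT INTO properties (id, product_id, property, value) VALUES
        (1, 1, 'color', 'red'),
        (2, 1, 'size',  'small'),
        (3, 2, 'color', 'red'),
        (4, 2, 'size',  'large'),
        (5, 2, 'color', 'red');  -- hypothetical duplicate, not in the question
""")

# Keep only the products that matched both required (property, value) pairs.
# COUNT(DISTINCT property) ignores the duplicate 'red' row on product 2,
# which a plain COUNT(*) = 2 would wrongly accept.
matching_ids = [row[0] for row in conn.execute("""
    SELECT product_id
    FROM properties
    WHERE (property = 'color' AND value = 'red')
       OR (property = 'size'  AND value = 'small')
    GROUP BY product_id
    HAVING COUNT(DISTINCT property) = 2
    ORDER BY product_id
""")]

print(matching_ids)  # [1]
```

With an index that covers (property, value, product_id) the grouped scan only has to touch the matching rows, but that would need measuring on the real data.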

Eventually this would need to be joined with, or at least used to filter, the products table:

id  name     
-------------
1   Product 1
2   Product 2

I forgot to mention that I have two of these properties tables joined to products that I need to filter on: one with the regular attributes of a product, another with the available configurable options (a bit like variants). The scenario is to let users filter products like "show products where gender = 'male', brand = 'nike' and size = 'small'", where gender and brand are 'properties' and size is in options (configurable when adding to cart).

The group-and-count solution still works with the two joined tables, but it gets messy: the required group count is the number of required options on the first table multiplied by the number on the second.

I could just fetch the ids from properties (and from the other table) and then do a SELECT ... WHERE id IN (ids), matching a set of ids for both property tables. I don't like the idea of doing this with a really long list of ids, though.
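
One way to avoid both the multiplied group count and the long literal id list is to let each table be filtered by its own IN (subquery). The following is only a sketch under assumptions: the second table is called options here (a hypothetical name), both tables are assumed to share the same product_id/property/value layout, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE properties (product_id INTEGER, property TEXT, value TEXT);
    -- 'options' is a hypothetical name for the second properties-type table.
    CREATE TABLE options (product_id INTEGER, property TEXT, value TEXT);

    INSERT INTO products VALUES
        (1, 'Product 1'), (2, 'Product 2'), (3, 'Product 3');
    INSERT INTO properties VALUES
        (1, 'gender', 'male'), (1, 'brand', 'nike'),
        (2, 'gender', 'male'), (2, 'brand', 'nike'),
        (3, 'gender', 'male'), (3, 'brand', 'adidas');
    INSERT INTO options VALUES
        (1, 'size', 'small'), (1, 'size', 'large'),
        (2, 'size', 'large'),
        (3, 'size', 'small');
""")

# Each table is grouped and counted on its own, so the required counts
# stay independent (2 properties, 1 option) instead of being multiplied.
rows = conn.execute("""
    SELECT p.id, p.name
    FROM products p
    WHERE p.id IN (
        SELECT product_id FROM properties
        WHERE (property = 'gender' AND value = 'male')
           OR (property = 'brand'  AND value = 'nike')
        GROUP BY product_id
        HAVING COUNT(DISTINCT property) = 2
    )
    AND p.id IN (
        SELECT product_id FROM options
        WHERE property = 'size' AND value = 'small'
        GROUP BY product_id
        HAVING COUNT(DISTINCT property) = 1
    )
    ORDER BY p.id
""").fetchall()

print(rows)  # [(1, 'Product 1')]
```

The ids never leave the database, so the size of the intermediate id sets is the optimizer's problem rather than the application's.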

+1  A: 

Not sure this is faster, but joins from subqueries generated from your filter criteria would work:

SELECT p.name, p.id
FROM products p,
     (SELECT product_id FROM properties WHERE value = 'red') colors,
     (SELECT product_id FROM properties WHERE value = 'small') sizes
WHERE p.id = colors.product_id
  AND p.id = sizes.product_id
Steve B.
Thanks, this does work. I will compare performance. To make things more complicated, I have another properties-type table to join with and filter in the same way, but with this approach it should be easy to involve another table.
DavidNorth
+1  A: 
SELECT DISTINCT p1.product_id, pn.name
FROM properties p1, properties p2,
     products pn
WHERE p1.product_id = p2.product_id
AND p1.property = 'size' AND p1.value = 'small'
AND p2.property = 'color' AND p2.value = 'red'
AND pn.id = p1.product_id
vartec
Thanks, this works, but I'm concerned about how it will perform if I have to join the table to itself more than twice.
DavidNorth
It will perform as well as any other solution, especially the subquery approaches. Joining a table to itself is not by itself a performance killer.
JohnFx
SELECT DISTINCT p1.product_id, pn.product
FROM properties p1, properties p2, products pn
WHERE p1.product_id = p2.product_id
AND p1.property = 'size' AND p1.value = 'small'
AND p2.property = 'color' AND p2.value = 'red'
AND pn.id = p1.product_id
I tested this query. It will be fast enough.
Tom Schaefer
joining on PK is not a problem for performance.
vartec
+1  A: 

You could join the table to itself:

SELECT
prop1.product_id
FROM
properties prop1
JOIN properties prop2
 ON prop1.product_id = prop2.product_id
WHERE
prop1.property = 'color' and prop1.value = 'red'
and prop2.property = 'size' and prop2.value = 'small'
Adam
+1  A: 

Yet another encounter with one of the pitfalls of the attribute-value data model.

Assuming that you want products where the "color" matches "red" and the "size" matches "small" (you don't say in your question that the property actually matters, just the value), a big part of the question is, how are you representing the list of required matches? Will they be passed as a delimited string, stored in a temporary table, the SQL built dynamically, something else?

If you can get them into a table (temporary or otherwise) then the following queries should work. Because of the subqueries, performance is going to be very dependent on the amount of data that you are working with and how it is indexed. Also, if you end up with duplicate properties in your table for the same product then it could throw things off, so you may need to account for that.

SELECT
    P.*
FROM
    Products P
WHERE
    NOT EXISTS
    (
     SELECT
      *
     FROM
      Product_Search_Template PST
     LEFT OUTER JOIN Properties P2 ON
      P2.property = PST.property AND
      P2.value = PST.value AND
      P2.product_id = P.id
     WHERE
      P2.id IS NULL
    )

Or, counting the matches for each product:

SELECT
    P.*
FROM
(
    SELECT
     PROP1.product_id,
     COUNT(*) AS match_count
    FROM
     Properties PROP1
    INNER JOIN Product_Search_Template PST ON
     PST.property = PROP1.property AND
     PST.value = PROP1.value
    GROUP BY
     PROP1.product_id
) SQ
INNER JOIN Products P ON
    P.id = SQ.product_id
WHERE
    SQ.match_count = (SELECT COUNT(*) FROM Product_Search_Template)
Tom H.
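
To make the template-table idea concrete, here is a small sketch that can be run with Python's sqlite3 module. It is an assumption-laden illustration rather than the exact query above: the helper name find_products is hypothetical, the template lives in a temporary table, and the LEFT JOIN ... IS NULL test is written as the equivalent nested NOT EXISTS ("no required pair is missing for this product").

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE properties (
        id INTEGER PRIMARY KEY, product_id INTEGER, property TEXT, value TEXT
    );
    INSERT INTO products VALUES (1, 'Product 1'), (2, 'Product 2');
    INSERT INTO properties VALUES
        (1, 1, 'color', 'red'), (2, 1, 'size', 'small'),
        (3, 2, 'color', 'red'), (4, 2, 'size', 'large');
""")


def find_products(conn, criteria):
    """Return (id, name) for products matching every (property, value) pair.

    Hypothetical helper: it reloads a temporary search-template table and
    then keeps the products for which no template row is left unmatched.
    """
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS product_search_template "
        "(property TEXT, value TEXT)"
    )
    conn.execute("DELETE FROM product_search_template")
    conn.executemany(
        "INSERT INTO product_search_template VALUES (?, ?)", criteria
    )
    return conn.execute("""
        SELECT p.id, p.name
        FROM products p
        WHERE NOT EXISTS (
            SELECT 1
            FROM product_search_template pst
            WHERE NOT EXISTS (
                SELECT 1
                FROM properties pr
                WHERE pr.product_id = p.id
                  AND pr.property = pst.property
                  AND pr.value = pst.value
            )
        )
        ORDER BY p.id
    """).fetchall()


print(find_products(conn, [("color", "red"), ("size", "small")]))
print(find_products(conn, [("color", "red")]))
```

Unlike the counting version, this form is not thrown off by duplicate property rows, and an empty template matches every product.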
You're right that the property name doesn't matter, just the value. In the actual table both names and values are foreign keys. I'll be building the SQL dynamically, with the required values coming from an array. I will ultimately need to filter on another properties-type table in the same query.
DavidNorth
So, by the name not mattering you're saying that if they have a "size" of "small" or a "distribution range" (just making this up) of "small" that searching on "small" should treat them the same? That's not what I coded for. When I get on a real keyboard I'll clean it up.
Tom H.
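
Since the SQL is going to be built dynamically from an array of values, a rough sketch of how that could look with bound parameters follows. The function name product_ids_with_all_values is hypothetical, and the query matches on value alone (as described in the comment above), so the size/"distribution range" ambiguity raised in the last comment still applies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE properties (
        id INTEGER PRIMARY KEY, product_id INTEGER, property TEXT, value TEXT
    );
    INSERT INTO properties VALUES
        (1, 1, 'color', 'red'), (2, 1, 'size', 'small'),
        (3, 2, 'color', 'red'), (4, 2, 'size', 'large');
""")


def product_ids_with_all_values(conn, required_values):
    """Return ids of products that have every value in required_values."""
    # De-duplicate while keeping order so the expected count is correct.
    values = list(dict.fromkeys(required_values))
    if not values:
        # Nothing to filter on; this sketch returns no ids in that case.
        return []
    # One "?" per value: only placeholders are interpolated, never the data.
    placeholders = ", ".join("?" for _ in values)
    sql = (
        "SELECT product_id FROM properties "
        f"WHERE value IN ({placeholders}) "
        "GROUP BY product_id "
        "HAVING COUNT(DISTINCT value) = ? "
        "ORDER BY product_id"
    )
    return [row[0] for row in conn.execute(sql, values + [len(values)])]


print(product_ids_with_all_values(conn, ["red", "small"]))  # [1]
print(product_ids_with_all_values(conn, ["red"]))           # [1, 2]
```

The same function could be called once per properties-type table and the resulting conditions combined, but how the two tables are joined in the real schema is not shown in the thread.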