tags:

views:

38

answers:

2

I have a product_table with a product_id among several other fields.

I am creating a tag_table with 2 fields, tag_id and tag_name.

The tags could be anything like "silver" or "penn state" etc.

I am also creating a product_tag_map table with product_id and tag_id in order to map a product to any number of tags.

If I wanted to create a category that contained all products tagged as "silver", "necklace", "diamond". But also exclude any product tagged as "initial", "journey"

(Obviously I would use the tag_id not tag_name, so products with tags [2,4,5] that do not have tags [3,6])

How could a create a temporary product table and populate it with the matching products?

update

here is my product_tag_map table:

CREATE TABLE `product_tag_map` (
 `product_tag_map_id` int(11) NOT NULL auto_increment,
 `tag_id` int(11) NOT NULL,
 `product_id` int(11) NOT NULL,
 PRIMARY KEY  (`product_tag_map_id`),
 UNIQUE KEY `tag_id` (`tag_id`,`product_id`),
 KEY `tag_id_2` (`tag_id`)
) ENGINE=MyISAM AUTO_INCREMENT=7897 DEFAULT CHARSET=utf8

Note: I am not using product_tag_map_id for anything. I am just in the habit of giving each table a primary key like so. So If I should remove it thats fine.

+1  A: 

The question is, what would you need a temp table for? Why not query it directly?

SELECT
  p.product_id,
  p.product_name
FROM
  product_table p
WHERE
  EXISTS (
    SELECT 1 
      FROM product_tag_map 
     WHERE product_id = p.product_id AND tag_id IN (2,4,5)
  )
  AND NOT EXISTS (
    SELECT 1 
      FROM product_tag_map 
     WHERE product_id = p.product_id AND tag_id IN (3,6)
  )

Create appropriate indexes (one multi-column index over (product_tag_map.product_id, product_tag_map.tag_id) and one separate over (product_tag_map.tag_id), in addition to the "normal" PK/FK indexes) and this should be plenty fast.


EDIT: A cacheable (as far as query plans go) and more dynamic variant of the above would be:

Create a user_searches table (search_session_id, tag_id, include) with a multi_column index over (search_session_id, include) and a separate index over tag_id. Then fill it as the user selects criteria:

search_session_id   tag_id   include
              ...
             4711        2         1
             4711        4         1
             4711        5         1
             4711        3         0
             4711        6         0
              ...

And query like this:

SELECT
  p.product_id,
  p.product_name
FROM
  product_table p
WHERE
  EXISTS (
    SELECT 1 
      FROM product_tag_map m INNER JOIN user_searches s ON s.tag_id = m.tag_id
     WHERE m.product_id = p.product_id 
           AND s.search_session_id = 4711 /* this should be a parameter */
           AND s.include = 1
  )
  AND NOT EXISTS (
    SELECT 1 
      FROM product_tag_map m INNER JOIN user_searches s ON s.tag_id = m.tag_id
     WHERE m.product_id = p.product_id 
           AND s.search_session_id = 4711 /* this should be a parameter */
           AND s.include = 0

  )
Tomalak
+1 This seems to be working great with my quick prototype. I end up doing a lot of manipulation on categories. Mainly because I give the user a lot of options to narrow the category down to filter their results. I find it easier to first put the category as a temp table, then perform the users filters on the temp table. Is that a bad idea?
John Isaacks
@John: Since creating temp tables invokes write operations (and consumes space), I'd say it's a bad idea. If all you do is manipulate the filter, the above statement is probably faster and conserves resources. Narrowing down a search could even be done in the app itself without round-trips to the DB server.
Tomalak
I tried to index like you said, I have a unique key for `(product_tag_map.product_id, product_tag_map.tag_id)` Then when I added an index to `(product_tag_map.tag_id)`. It let me, but I got this caution from phpmyadmin: `UNIQUE and INDEX keys should not both be set for column tag_id`
John Isaacks
@John Isaacks: You want a clustered index (primary key) over `(product_id, tag_id)`, not a unique key. This serves for look-ups with `product_id`-`tag_id` combinations and for `product_id`-lookups alone. For `tag_id`-lookups, you need another (standard) index on that column.
Tomalak
Thanks, I am not exactly sure how to do that, I updated my question to include my create table. Would you be so kind as to show me how to alter it to have the right indexes?
John Isaacks
@John: Try to use `PRIMARY KEY (\`tag_id\`, \`product_id\`), KEY \`tag_id\` (\`tag_id\`)` and remove the other ID column, it's superfluous.
Tomalak
That made it much faster with those indexes! Thanks for all your time.
John Isaacks
@John: You're welcome. :-) Did you try implementing the `user_searches` idea?
Tomalak
Well, the user filtering isn't currently going to be by tags, but other info on the product table like price, full-text-search, etc. However, I did take what you said about user_searches and apply it to how I tie this all into a category. I named the table category_tag_map and and instead of `search_session_id` I have `category_id`. Now I can easily create a category and set it to include/exclude products with certain tags. However the only problem is that if the category has no tags set (using other criteria like popularity) Then the `EXISTS` fails, returning nothing.
John Isaacks
+1  A: 

You can create a view on a select statement.

(A view is a basically a virtual table that you can query easily but that is popuplated with data from a complex statement. Read the manual, its not so trivial when it comes to performance and read/write behaviour.)

However your query could look like this:

SELECT 
product_id, count(*) as total 
FROM tag_map WHERE tag_id IN (2,4,5) 
AND total = 3 
AND product_id NOT IN (
SELECT product_id, count(*) as total FROM tag_map WHERE tag_id IN (3,6) 
WHERE total = 2 GROUP BY ( product_id )
)
GROUP BY ( product_id )

you can also do joins but i think it will be slower.

Joe Hopfgartner