Given these SQL tables for a tagging system:

CREATE TABLE tags (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100)
);
CREATE INDEX tags_name_idx ON tags(name);

CREATE TABLE tagged_items (
    tag_id INT,
    item_id INT
);
CREATE INDEX tagged_items_tag_id_idx ON tagged_items(tag_id);
CREATE INDEX tagged_items_item_id_idx ON tagged_items(item_id);

CREATE TABLE items (
    id SERIAL PRIMARY KEY,
    content VARCHAR(255)
);

The user's boolean query "tag1 AND tag2" translates to this SQL:

SELECT items.* FROM items
    INNER JOIN tagged_items AS i1 ON (items.id = i1.item_id) INNER JOIN tags AS t1 ON (i1.tag_id = t1.id)
    INNER JOIN tagged_items AS i2 ON (items.id = i2.item_id) INNER JOIN tags AS t2 ON (i2.tag_id = t2.id)
WHERE t1.name = 'tag1' AND t2.name = 'tag2';

How do you translate other boolean expressions, such as "tag1 OR tag2 AND tag3", or even more complex ones such as "tag1 AND (tag2 OR tag3) AND NOT tag4 OR tag5", to SQL?

+2  A: 

I'm assuming that data -> items, word -> name and tagged_item -> tagged_items.

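This is for "tag1 AND (tag2 OR tag3) AND NOT tag4 OR tag5". Since AND binds tighter than OR, SQL reads the WHERE clause as (tag1 AND (tag2 OR tag3) AND NOT tag4) OR tag5. I'm sure you can figure out the rest.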

SELECT items.* FROM items
    LEFT JOIN (SELECT i1.item_id FROM tagged_items AS i1 INNER JOIN tags AS t1 ON i1.tag_id = t1.id AND t1.name = 'tag1') AS ti1 ON items.id = ti1.item_id
    LEFT JOIN (SELECT i2.item_id FROM tagged_items AS i2 INNER JOIN tags AS t2 ON i2.tag_id = t2.id AND t2.name = 'tag2') AS ti2 ON items.id = ti2.item_id
    LEFT JOIN (SELECT i3.item_id FROM tagged_items AS i3 INNER JOIN tags AS t3 ON i3.tag_id = t3.id AND t3.name = 'tag3') AS ti3 ON items.id = ti3.item_id
    LEFT JOIN (SELECT i4.item_id FROM tagged_items AS i4 INNER JOIN tags AS t4 ON i4.tag_id = t4.id AND t4.name = 'tag4') AS ti4 ON items.id = ti4.item_id
    LEFT JOIN (SELECT i5.item_id FROM tagged_items AS i5 INNER JOIN tags AS t5 ON i5.tag_id = t5.id AND t5.name = 'tag5') AS ti5 ON items.id = ti5.item_id
WHERE ti1.item_id IS NOT NULL AND (ti2.item_id IS NOT NULL OR ti3.item_id IS NOT NULL) AND ti4.item_id IS NULL OR ti5.item_id IS NOT NULL;
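
The same pattern can be generated mechanically. Below is a minimal Python sketch, not part of the original answer, that parses a tag expression and emits the LEFT JOIN form used above. The names `tokenize`, `translate` and `find_items` are hypothetical, keywords are assumed to be upper-case and tag names alphanumeric, and it targets SQLite's `?` placeholders through the standard `sqlite3` module. The subqueries also add DISTINCT, an assumption to guard against duplicate (tag_id, item_id) rows.

```python
import re
import sqlite3

# Tokens: parentheses, the upper-case keywords AND/OR/NOT, or a tag name.
TOKEN = re.compile(r"\s*(\(|\)|AND\b|OR\b|NOT\b|[A-Za-z0-9_]+)")


def tokenize(expr):
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def translate(expr):
    """Return (sql, params) for a tag expression such as 'a AND (b OR c)'."""
    tokens = tokenize(expr)
    tags = []  # distinct tag names, in order of first appearance
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():  # lowest precedence
        left = parse_and()
        while peek() == "OR":
            take()
            left = f"({left} OR {parse_and()})"
        return left

    def parse_and():
        left = parse_not()
        while peek() == "AND":
            take()
            left = f"({left} AND {parse_not()})"
        return left

    def parse_not():  # highest precedence
        if peek() == "NOT":
            take()
            return f"(NOT {parse_not()})"
        return parse_atom()

    def parse_atom():
        tok = take() if peek() is not None else None
        if tok == "(":
            inner = parse_or()
            if peek() != ")":
                raise ValueError("missing ')'")
            take()
            return inner
        if tok is None or tok in (")", "AND", "OR", "NOT"):
            raise ValueError(f"unexpected token: {tok!r}")
        if tok not in tags:
            tags.append(tok)
        # A tag is present when its LEFT JOIN found a matching row.
        return f"(ti{tags.index(tok) + 1}.item_id IS NOT NULL)"

    where = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]!r}")
    # One LEFT JOIN per distinct tag; the tag name is a bound parameter.
    joins = "\n".join(
        "    LEFT JOIN (SELECT DISTINCT i.item_id FROM tagged_items AS i"
        " INNER JOIN tags AS t ON i.tag_id = t.id AND t.name = ?)"
        f" AS ti{n} ON items.id = ti{n}.item_id"
        for n in range(1, len(tags) + 1)
    )
    return f"SELECT items.* FROM items\n{joins}\nWHERE {where}", tags


def find_items(conn, expr):
    """Run the translated query on a sqlite3 connection."""
    sql, params = translate(expr)
    return conn.execute(sql, params).fetchall()
```

For example, `translate("tag1 AND tag2")` yields two LEFT JOINs and the parameter list `['tag1', 'tag2']`; tag names travel as bound parameters rather than being spliced into the SQL text.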

Edit: If you want to avoid subqueries, you could do this:

SELECT items.* FROM items 
    LEFT JOIN tagged_items AS i1 ON items.id = i1.item_id LEFT JOIN tags AS t1 ON i1.tag_id = t1.id AND t1.name = 'tag1'
    ...
WHERE t1.id IS NOT NULL ...

I'm not sure why you'd want to do it though: every extra left join against tagged_items multiplies the intermediate rows, so it will likely run slower and may need DISTINCT to drop duplicate items.
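
One way to avoid nested SELECTs altogether is conditional aggregation: join each item to all of its tags once, group by item, and test each tag in HAVING. This is a different technique from the joins above, shown as a hedged sketch for SQLite through Python's `sqlite3` module. `QUERY`, `demo_db` and `matching_item_ids` are hypothetical names, and the sample rows are made up for illustration.

```python
import sqlite3

# "tag1 AND (tag2 OR tag3) AND NOT tag4 OR tag5" without subqueries.
# Each MAX(CASE ...) is 1 if the item carries that tag and 0 otherwise;
# the LEFT JOINs keep untagged items in play for NOT-only expressions.
QUERY = """
SELECT items.id, items.content FROM items
    LEFT JOIN tagged_items AS ti ON items.id = ti.item_id
    LEFT JOIN tags AS t ON ti.tag_id = t.id
GROUP BY items.id, items.content
HAVING (MAX(CASE WHEN t.name = 'tag1' THEN 1 ELSE 0 END) = 1
        AND (MAX(CASE WHEN t.name = 'tag2' THEN 1 ELSE 0 END) = 1
             OR MAX(CASE WHEN t.name = 'tag3' THEN 1 ELSE 0 END) = 1)
        AND MAX(CASE WHEN t.name = 'tag4' THEN 1 ELSE 0 END) = 0)
    OR MAX(CASE WHEN t.name = 'tag5' THEN 1 ELSE 0 END) = 1
ORDER BY items.id
"""


def demo_db():
    """Build an in-memory database with illustrative rows (assumed data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tags (id INTEGER PRIMARY KEY, name VARCHAR(100));
        CREATE TABLE tagged_items (tag_id INT, item_id INT);
        CREATE TABLE items (id INTEGER PRIMARY KEY, content VARCHAR(255));
        INSERT INTO tags VALUES
            (1,'tag1'),(2,'tag2'),(3,'tag3'),(4,'tag4'),(5,'tag5');
        INSERT INTO items VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d');
        -- item 1: tag1, tag2; item 2: tag1, tag3, tag4; item 3: tag5;
        -- item 4 has no tags at all.
        INSERT INTO tagged_items VALUES (1,1),(2,1),(1,2),(3,2),(4,2),(5,3);
    """)
    return conn


def matching_item_ids(conn):
    return [row[0] for row in conn.execute(QUERY)]
```

With the sample rows, `matching_item_ids(demo_db())` returns items 1 and 3: item 2 is excluded by tag4, and the untagged item 4 matches nothing. The trade-off is that every item's tags are scanned and grouped, so whether this beats the subquery form depends on the data and the planner.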

lins314159
Is there any way to avoid nested SELECTs where possible, as in the "tag1 AND tag2" example I gave?
Kronuz