views:

67

answers:

3

Hi, i have a question about basic mysql database optimisation. I have 3 tables, Articles, Tags and Taggings (which is a join table).

Articles         Taggings             Tags
id               id                   id
name             article_id           name
                 tag_id

I am retrieving the articles that exactly match the tags specified, with the following query

SELECT *, COUNT(*) AS c
FROM articles AS a
JOIN taggings AS tng ON a.id = tng.article_id
JOIN tags AS t ON t.id = tng.tag_id
WHERE t.name IN ("Red","Green")
GROUP BY a.id
HAVING c = 2

This query is slow, so I did an EXPLAIN, and got the following results:

alt text

Now, I don't really understand what I am doing here, but i believe that "type: ALL" is not good, so thought i would add indexes(BTREE) to both article_id and tag_id in the taggings table, and run the query again. alt text Well that didn't look any better to my uneducated eye, same number of rows as previous one, and the type is still ALL in two of the cases.

So could someone tell me where I am going wrong please? will indexes not help me with this problem?

My Tag table will remain relatively small, so I thought the query should scan the Tag table for the tags I have specified, and then (through the indexes) be able to instantly retrieve the associated properties, and it should all be very quick, obviously something wrong in my thinking.

Thanks

[EDIT] - for Jay's comments

I added 10k articles, 30k taggings, and 6 tags, also added 2 indexs on tag.name and taggings.tag_id, the query still took a long time to run, 0.5-1 second, the EXPLAIN is below. alt text

+1  A: 

You could also try using joining to the tables twice instead of a GROUP BY. This sometimes produces a faster query:

SELECT a.*
FROM articles AS a
JOIN taggings AS tng1 ON a.id = tng1.article_id
JOIN tags AS t1 ON t1.id = tng1.tag_id AND t1.name = "Red"
JOIN taggings AS tng2 ON a.id = tng2.article_id
JOIN tags AS t2 ON t2.id = tng2.tag_id AND t2.name = "Green"
Mark Byers
Don't need the WHERE - move those up to the JOINs... But really, this isn't that friendly if you need to support different numbers of tags that all match.
OMG Ponies
Thanks for your reply Mark, but unfortunately i will need to support different numbers of tags.
Jon
I just discovered this artilce http://joinfu.com/presentations/tagging.pdf which suggests the same as you, problem i would face is how to dynamically create such a query for differnet numbers of tags.
Jon
@Jon: You can use string concatenation and add a new join for each tag you want to include in your query. Obviously if there are many tags the query will get long so there may be a limit on the number of tags that it is sensible to search for using this method, but for a relatively small number of tags in the query it should be fine. Usually the optimizer finds an efficient way to execute the joins.
Mark Byers
@Jon: You might also want to look at this similar question where I gave a similar answer but with more details: http://stackoverflow.com/questions/3260543/mysql-tagging-question-how-to-select-an-item-that-has-been-tagged-as-x-y-and
Mark Byers
+2  A: 

Because tags.name is the only column that really reduces a number of rows in result set, it must be indexed to make any tag-based search query faster.

Update: try to run this query

SELECT a.*
FROM articles AS a
JOIN taggings AS tng ON a.id = tng.article_id
JOIN tags AS t ON t.id = tng.tag_id
WHERE t.name IN ("Red","Green")
GROUP BY a.id
HAVING COUNT(DISTINCT t.id) = 2
Naktibalda
The tags table is relatively small, i added the index as you suggested, and its not made any difference to the query times.
Jon
Does it affect output of EXPLAIN?
Naktibalda
The effect of adding a index to tags.name, plus some other changes, can be seen using EXPLAIN on my edit to my original post.
Jon
Do you have an index covering both columns in taggings table? It should be UNIQUE to prevent duplicated relation. UNIQUE KEY(tng.tag_id, tng.article_id)
Naktibalda
I didn't, but I do now, and that seems to have improved performance. Also the DISTINCT has helped speed things up. I also changed to InnoDB, which resulted in 4x speed improvement. With all these little things, the time it takes to run query now looks more reasonable. Goodness knows how delicious/flickr manage with the amount of data they need to handle.
Jon
+1  A: 

There are a couple of things going on here.

First, your tables presently are apparently very small. When the table is small, the DBMS often finds it faster to read the whole thing rather than to use any index. To get meaningful EXPLAIN results, you need to get realistic numbers of records in the tables.

It also looks like you have the "id" fields declared as primary keys. Primary keys are a subclass of indexes, so those should be available. Note the explain plan indicates it used the primary key to find the Tag record.

The obvious starting point of this query is Tags. So if this is an important query, I'd create an indexon Tags(name). Then it wouldn't need to sequentially search the Tags table.

From there it should look up Taggings by tag_id. So you should have an index on that.

Then it could look up Article by article_id. That's the primary key so it should already be there.

So I think you'd get the most effective plan with two indexes: Tags(name) and Taggings(tag_id).

Jay
Thanks for your reply Jay, i did as you suggested and added 10k articles, 30k taggings, and 6 tags. I added the indexes you suggested, but the query still takes 0.5-1 second to run. I have edited my original post to show the new explain.
Jon
Hmm. It's doing eq_ref on two of the tables, which is as good as those are going to get. So the remaining issue is the order in which it decided to read the tables. Despite the fact that you have an index on Tag.name, it decided to do a full file search on Taggings. This is a little curious. If you have 6 tags and you're searching on 2 of them, well, maybe the DBMS doesn't think that narrows it down enough to be useful. Oh, after adding all the records you should run Analyze to get the statistics updated. That may help. If not, this may be as good as it's going to get.
Jay