tags:

views:

241

answers:

4

Well I have a videos website and a few of its tables are:

tags

id ~ int(11), auto-increment [PRIMARY KEY]
tag_name ~ varchar(255)

videotags

tag_id ~ int(11) [PRIMARY KEY]
video_id ~ int(11) [PRIMARY KEY]

videos

id ~ int(11), auto-increment [PRIMARY KEY]
video_name ~ varchar(255)

Now at this point the tags table has >1000 rows and the videotags table has >32000 rows. So when I run a query to display all tags from most common to least common it takes >15 seconds to execute.

I am using PHP and my code (watered down for simplicity) is as follows:

foreach ($database->query("SELECT tag_name,COUNT(tag_id) AS 'tag_count' FROM tags LEFT OUTER JOIN videotags ON tags.id=videotags.tag_id GROUP BY tags.id ORDER BY tag_count DESC") as $tags)
{
    echo $tags["tag_name"] . ', ';
}

Now keeping in mind that this being 100% accurate isn't as important to me as it being fast. So even if the query was executed once a day and its results were used for the remainder of the day, I wouldn't care.

I know absolutely nothing about MySQL/PHP caching so please help!

A: 

I think your best bet is to create some kind of summary table which you maintain when things change.

The query above needs to scan all the rows in the table in order to find the aggregates in the group by - there is NO WHERE CLAUSE. A query with no where clause has no hope of optimisation, as it necessarily has to check every row.

The fix is to create a summary table with the same data as the result of that query (or similar), which you will have to maintain from time to time when the data change or change significantly.

Only you can decide, based on the nature of your application and your data, whether it's appropriate to update the summary table on a scheduled basis, on each update, or some combination.

As you're doing a join, the right indexes are still beneficial, but you knew that, right, and had already done it?

MarkR
A: 

Are you using InnoDB or MyISAM? In MyISAM COUNT is basically free, but in InnoDB it has to physically count the rows.

Ólafur Waage
I'm using MyISAM
Andrew G. Johnson
COUNT is only free on MyISAM if you're counting all the rows, not if you're counting groups. Then it is just as expensive as you'd expect (in this case needs to scan the entire table, either way)
MarkR
Nice, thanks for the clarification MarkR
Ólafur Waage
+2  A: 

32,000 rows is still a small table - there's no way your performance should be that bad.

Can you run EXPLAIN on your query - I'd guess you're indexes are wrong somewhere.

You say in the question:

tag_id ~ int(11) [PRIMARY KEY]
video_id ~ int(11) [PRIMARY KEY]

Are they definitely in that order? If not, then it won't use the index.

Greg
Good call, it was actually in the other order. Added an index and it worked.
Andrew G. Johnson
+3  A: 

MarkR mentioned the index. Make sure you:

create index videotags_tag_id on videotags(tag_id);
Frank Krueger
Great! Down to ~0.02 seconds
Andrew G. Johnson
That's probably as fast as you'll want to tune it down to - any other speed improvements should be attained via in-memory caching.
Frank Krueger
You can probably tune it even further if you precache the tag count into an indexed column.
Eran Galperin