views:

52

answers:

2

What is the best way to build a 'running total' system like the tag search on Stack Overflow? If I click on 'php', for example, it shows the total number of items for each of the other tags, and it is very fast. How can I do this in PHP with MySQL?

A: 

I'd imagine they are just running a simple `select count(*) from questions where tag = $tagname` and caching the result in memcached. The caching is the important part.
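A minimal sketch of that cached count, in Python with the standard-library `sqlite3` module standing in for PHP and MySQL, and a plain dictionary standing in for memcached. The `questions` table layout, the `tag_count` helper, and the 300-second lifetime are all assumptions made for illustration.

```python
import sqlite3
import time

# Hypothetical schema: one row per (question, tag) pair.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE questions (id INTEGER, tag TEXT)")
conn.executemany(
    "INSERT INTO questions VALUES (?, ?)",
    [(1, "php"), (2, "php"), (3, "mysql")],
)

_cache = {}      # tag -> (expiry timestamp, count); stand-in for memcached
CACHE_TTL = 300  # seconds; an arbitrary choice for this sketch


def tag_count(tag, now=None):
    """Return the row count for `tag`, querying the database only on a cache miss."""
    now = time.time() if now is None else now
    hit = _cache.get(tag)
    if hit is not None and hit[0] > now:
        return hit[1]
    # Parameterized query: the tag name is never interpolated into the SQL string.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM questions WHERE tag = ?", (tag,)
    ).fetchone()
    _cache[tag] = (now + CACHE_TTL, count)
    return count
```

The trade-off is staleness: a cached count can lag behind the table until its entry expires.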

As a commenter said, they could also be keeping track of tag counts in a separate table. But we can't really be sure; all we can do is guess. Either approach would work, but the key is to benchmark your application to see which one works better for you. For all we know, the tag count isn't real-time and is refreshed hourly in a table by a cron job or something similar.
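The separate-table idea could look roughly like the sketch below, again using Python's `sqlite3` as a stand-in for MySQL. The `tag_counts` table and the `add_question` helper are hypothetical names; the point is only that the counter is bumped in the same transaction that stores the question's tags.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (id INTEGER, tag TEXT);
    CREATE TABLE tag_counts (tag TEXT PRIMARY KEY, total INTEGER NOT NULL);
""")


def add_question(question_id, tags):
    """Store the question's tag rows and bump each tag's counter in one transaction."""
    with conn:  # commits on success, rolls back on error
        for tag in tags:
            conn.execute("INSERT INTO questions VALUES (?, ?)", (question_id, tag))
            # Increment an existing counter; create it if this tag is new.
            cur = conn.execute(
                "UPDATE tag_counts SET total = total + 1 WHERE tag = ?", (tag,)
            )
            if cur.rowcount == 0:
                conn.execute("INSERT INTO tag_counts VALUES (?, 1)", (tag,))


def tag_count(tag):
    """Read the precomputed total; no COUNT(*) over the questions table is needed."""
    row = conn.execute(
        "SELECT total FROM tag_counts WHERE tag = ?", (tag,)
    ).fetchone()
    return row[0] if row else 0
```

In MySQL the update-or-insert step is commonly written as a single `INSERT ... ON DUPLICATE KEY UPDATE` statement instead of two.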

ryeguy
I thought that too, but isn't it time-consuming when you have to do it so many times? Is it possible that they have a separate tag table and add 1 to it every time a question is added?
NawaMan
I think you are missing the point: the OP is wondering about the display of "related tags" counts for a given tag. That being the case, and with the number of posts to date on the order of 350,000, it seems impractical to keep pre-computed tables of counts for every possible tag combined with a given tag (counts of ordered tag pairs may be a better description).
mjv
+3  A: 

It's a query that looks something like this:

SELECT T2.Tag, COUNT(*)
FROM SO_Posts P
JOIN Post_Tags T1 ON P.PostId = T1.PostId
JOIN Post_Tags T2 ON P.PostId = T2.PostId
WHERE T1.Tag = 'PHP'
GROUP BY T2.Tag
ORDER BY COUNT(*) DESC

This query makes the plausible assumption that the Posts (Questions) on SO are stored in two tables:
*SO_Posts*, containing one record per Post and holding info such as a PostId (Primary Key), the question itself, the date, the title, etc.
and
*Post_Tags*, which associates a given Post (by its PostId) with a Tag (or more likely a TagId, since tags ought to be normalized, but that's a detail). For a given Post, there are as many records in *Post_Tags* as there are different tags attached to the post.
Note: in reality the structure of the SO Posts database is more complicated, with various tables for storing comments, replies, etc. But with regard to the Post-to-Tag relationship, this two-table layout (or more likely a three-table layout, which allows *Post_Tags* to hold a TagId rather than the tag itself) captures the essence of how it is possible, easy and, given the right indexes, fast to show these filtered aggregate counts.
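For illustration, the three-table variant with indexes might be sketched as follows, using Python's `sqlite3` as a stand-in for MySQL. The `Tags` table, the index name, and the sample rows are assumptions, not the actual SO schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SO_Posts (PostId INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE Tags (TagId INTEGER PRIMARY KEY, Tag TEXT UNIQUE);
    CREATE TABLE Post_Tags (
        PostId INTEGER REFERENCES SO_Posts(PostId),
        TagId  INTEGER REFERENCES Tags(TagId),
        PRIMARY KEY (PostId, TagId)   -- serves lookups by post
    );
    -- Serves the other direction: all posts carrying a given tag.
    CREATE INDEX idx_post_tags_tag ON Post_Tags (TagId, PostId);
""")
conn.executemany("INSERT INTO SO_Posts VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.executemany(
    "INSERT INTO Tags VALUES (?, ?)", [(1, "php"), (2, "mysql"), (3, "sql")]
)
conn.executemany(
    "INSERT INTO Post_Tags VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3)],
)


def related_counts(tag):
    """Count every tag that appears on posts carrying `tag`, resolved through TagId."""
    return conn.execute("""
        SELECT T.Tag, COUNT(*) AS n
        FROM Tags Wanted
        JOIN Post_Tags PT1 ON PT1.TagId = Wanted.TagId
        JOIN Post_Tags PT2 ON PT2.PostId = PT1.PostId
        JOIN Tags T ON T.TagId = PT2.TagId
        WHERE Wanted.Tag = ?
        GROUP BY T.Tag
        ORDER BY n DESC, T.Tag
    """, (tag,)).fetchall()
```

The two composite indexes let the database walk from a tag to its posts and from a post to its tags without scanning the whole *Post_Tags* table.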

The idea is to find all the PostIds associated with the targeted tag, here 'PHP' (looked up via "T1"), and then to count, tag by tag, all the tags attached to those posts (via "T2").
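The same two steps can be spelled out without SQL. This is only a sketch in plain Python over a handful of made-up (PostId, Tag) rows; `related_tag_counts` is a hypothetical helper that mirrors what the self-join computes.

```python
from collections import Counter

# Hypothetical Post_Tags rows as (PostId, Tag) pairs.
post_tags = [
    (1, "PHP"), (1, "MySQL"),
    (2, "PHP"), (2, "MySQL"), (2, "arrays"),
    (3, "Java"),
]


def related_tag_counts(rows, target):
    # Step 1 (the "T1" side): every PostId that carries the target tag.
    matching_posts = {post_id for post_id, tag in rows if tag == target}
    # Step 2 (the "T2" side): count all tags attached to those posts.
    counts = Counter(tag for post_id, tag in rows if post_id in matching_posts)
    # Highest counts first, like ORDER BY COUNT(*) DESC.
    return counts.most_common()
```

As with the query, the target tag shows up in its own result, with a count equal to the number of posts that carry it.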

Note that the main table SO_Posts is not strictly necessary here, but it would likely be part of the real query, for example to allow extra criteria such as the Post status (not being closed, etc.).
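A sketch of such an extra criterion, run against Python's `sqlite3` as a stand-in for MySQL. The `Status` column and its values are assumptions made for the example, and the line that drops the selected tag from its own list is an optional addition, not part of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SO_Posts (PostId INTEGER PRIMARY KEY, Status TEXT);
    CREATE TABLE Post_Tags (PostId INTEGER, Tag TEXT);
    INSERT INTO SO_Posts VALUES (1, 'open'), (2, 'closed'), (3, 'open');
    INSERT INTO Post_Tags VALUES
        (1, 'PHP'), (1, 'MySQL'),
        (2, 'PHP'), (2, 'arrays'),
        (3, 'PHP'), (3, 'MySQL');
""")


def related_open_counts(tag):
    """Related-tag counts for `tag`, ignoring posts whose status is 'closed'."""
    return conn.execute("""
        SELECT T2.Tag, COUNT(*) AS n
        FROM SO_Posts P
        JOIN Post_Tags T1 ON P.PostId = T1.PostId
        JOIN Post_Tags T2 ON P.PostId = T2.PostId
        WHERE T1.Tag = ?
          AND P.Status <> 'closed'   -- hypothetical status column
          AND T2.Tag <> T1.Tag       -- optional: hide the selected tag itself
        GROUP BY T2.Tag
        ORDER BY n DESC
    """, (tag,)).fetchall()
```

Here the join on SO_Posts earns its place: without it there would be nothing to filter the status on.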

mjv
+1, and thanks for explaining in my original answer what the OP was asking. Your response is correct, especially as you mention that joining on `Posts` is optional; joining `Post_Tags` on itself is the main idea.
Adam Bellaire