views:

140

answers:

4

I would like to develop a Forum from scratch, with special needs and customization.

I would like to prepare my forum for intensive usage and wondering how to cache things like User posts count and User replies count.

Having only three tables, tblForum, tblForumTopics, tblForumReplies, what is the best approach of cache the User topics and replies counts ?

Think at a simple scenario: user press a link and open the Replies.aspx?id=x&page=y page, and start reading replies. On the HTTP Request, the server will run an SQL command wich will fetch all replies for that page, also "inner joining with tblForumReplies to find out the number of User replies for each user that replied."

select 
    tblForumReplies.*,
    tblFR.TotalReplies
from 
   tblForumReplies
   inner join 
     (
       select IdRepliedBy, count(*) as TotalReplies
       from tblForumReplies
       group by IdRepliedBy
     ) as tblFR 
     on tblFR.IdRepliedBy = tblForumReplies.IdRepliedBy

Unfortunately this approach is very cpu intensive, and I would like to see your ideas of how to cache things like table Counts.

If counting replies for each user on insert/delete, and store it in a separate field, how to syncronize with manual data changing. Suppose I will manually delete Replies from SQL.

+2  A: 

I would not bother with caching untill I will need this for sure. From my expirience this is no way to predict places that will require caching. Try iterative approach, try to implement witout cashe, then gether statistics and then implement right caching (there are many kinds like content, data, aggregates, distributed and so on).

BTW, I do not think that your query is CPU consuming. SQL server will optimaze that stuff and COUNT(*) will run in ticks...

Mike Chaliy
+1  A: 

tbl prefixes suck -- as much as Replies.aspx?id=x&page=y URIs do. Consider ASP.NET MVC or just routing part.

Second, do not optimize prematurely. However, if you really need so, denormalize your data: add TotalReplies column to your ForumTopics table and either rely on your DAL/BL to keep this field up to date (possibly with a scheduled task to resync those), or use triggers.

Anton Gogolev
+3  A: 

These are the three approaches I'd be thinking of:

1) Maybe SQL Server performance will be good enough that you don't need to cache. You might be underestimating how well SQL Server can do its job. If you do your joins right, it's just one query to get all the counts of all the users that are in that thread. If you are thinking of this as one query per user, that's wrong.

2) Don't cache. Redundantly store the user counts on the user table. Update the user row whenever a post is inserted or deleted.

3) If you have thousands of users, even many thousand, but not millions, you might find that it's practical to cache user and their counts in the web layer's memory - for ASP.NET, the "Application" cache.

Corey Trager
+1  A: 

For each reply you need to keep TotalReplies and TotalDirectReplies. That way, you can support tree-like structure of replies, and keep counts update throughout the entire hierarchy without a need to count each time.

Andy