tags:

views:

77

answers:

6

I'm making a forum.

And I'm wondering if i should store the number of replies in the topic table or count the posts of the topic?
How much slower will it be if i use sql and count them? Lets say i have a billion posts. Will it be much slower? Im not planning on being that big but what if? How much slower would i be compared to stroing the num in topics?

Thanks

+3  A: 

It will be slower as your db grows in size. If you are planning on having a large post table, store the value in the topic table

Mark
A: 

I think a lot of this will depend on how rapidly you're pushing data in. If you store the value in a topic table, then you may find that you're needing to increment (or decrement if you delete records) very frequently too.

Indexes (indices?) may be a nicer option, as you can store a tiny subset of the data, and be able to access richer information. Consider the fact that it can be quite quick to count how many Farleys there are in the phone-book, because I can go straight there and easily count them.

So, as is often the case, the answer is probably 'It depends'.

Rob Farley
+2  A: 
  1. Do not store the value in a table.
  2. Cache the value in the application for some time so the count(*) query wont be executed too often.
  3. Choose cache time depending on the server load: higher for very busy and zero for couple of users.

The count(*) in SqlServer is pretty fast (assuming you have index on the field you are counting on). So you just need to reduce number of hits under the heavy load.
If you will store the value in a table you will have a lot of hassle maintaining it.

Dmytrii Nagirniak
It's not about an index on the PK, it's about an index on whatever it is you're trying to count by. In this case, on the topic field followed by the post field. That way, it can do a stream aggregate function and quickly get what it's after.
Rob Farley
Yes, Rob. You are right, sorry for incompleteness. (will change now)
Dmytrii Nagirniak
+3  A: 

I just ran some tests on a MySQL 4.0 box we have using a table with over 1 million records.

SELECT COUNT(*) FROM MyTable; ~1 million took 22ms

SELECT COUNT(*) FROM MyTable WHERE Role=1; ~800,000 took 3.2s

SELECT COUNT(*) FROM MyTable WHERE Role=2; ~20 took 12ms

The Role column in this case was indexed and this was connecting to the MySQL remotely.

I think your posts table will have to get very large for the query times to really become an issue. I also think it is a pre-optimization to put the cache of the count in your topics table. Build it without it for now and if it becomes a problem its a pretty easy update to change it.

Peer Allan
Keep in mind though that a standard forum listing will list an average of 50 topics per page, so you have to multiply those results by 50 to get an idea of the impact. Then also consider that it's per page load. So while the individual query may not cause the individual page to hang for 30s, that number of additional queries locking a key table could lead to a performance hit. But like you said, it will take quite a bit of rows and traffic for it to be really noticable.
Steven Surowiec
A: 

I like storing counts in the table rather than counting them every time. It's such an easy operation and you never have to think about the expense of showing it when you're retrieving it. With a forum you're going to be displaying it more often than you're going to be changing it anyway so it makes sense to make that as cheap as possible. It might be a bit premature but it might save you some headaches later.

Jon
A: 

This is going to affect scaling and is an issue of normalization. Hardcore normalization nerds will tell you that you shouldn't keep the number of posts on the topic because it causes redundant data. But you need to keep in mind that if you don't store it there you need to do an extra query on every load to fetch the number. The alternative is to do an extra query on every update/insert instead, which will almost always occur much less often than select's. As you scale a site to support a lot of traffic it becomes almost inevitable that you have to eventually start to de-normalize some of your data, especially in cases like this.

Redundant data isn't inherently bad. Poorly managed redundancy is. As long as you have the proper checks in place to prevent the data from getting out of sync then the potential benefit of storing the number of posts on the thread is worth the extra bit of code IMO.

Steven Surowiec