views:

45

answers:

3

I'm struggling with a conceptual question. When you have a forum with thousands of posts and/or threads, how do you retrieve all those posts to be displayed on your site? Do you connect to your database every time someone visits your page then capture every post in an array and display it? Surely this seems like it would be very taxing on your server and would cause a whole bunch of unnecessary database reads. Can anyone shine some light on this topic?

Thanks.

+1  A: 

One new(ish) way of doing this is to use a Document Oriented Database like CouchDB where everything about an individual post is stored in the same document and that document gets loaded on request.

It seems in this case a Document Oriented Database would work very well for a forum or blog type site.

As far as Relational Databases go, I'm pretty sure the database gets hit every time the page loads unless there is some sort of caching implemented (then you'd have to worry about data getting stale though, which brings up a whole new mess of problems.)

Robert Greiner
Interesting, I didn't know databases like that existed. Thanks for your answer.
vince
+3  A: 

You never retrieve all those posts at once. In most case, forums show a page of X threads/posts, and you just get those X threads/posts from the database each time a page is served. RDBMS are pretty good at this. A forum is (should be) quite dynamic so indeed it generates a pretty good load on the database, but this is what database are made for, storing and retrieving data.

Serty Oan
I see, so then generally it's ok to put a fair amount of load on the db. Good to know. Thanks.
vince
If facebook can handle millions of posts, I'm sure your site can too :)
ggfan
@@ggfan, like... if Superman can stop a speeding bullet with his pajamas, I'm sure you can too.
Stephanie Page
Perfect answer.
Col. Shrapnel
A: 

Don't worry a lot about stale data. Facebook doesn't... their database is only "eventually consistent". The idea is like this: making sure that the comments are 100% always, always up-to-date is very expensive. That does put a large load on your DB. Although as Serty says, that's what the DB is made for, but whether or not your physical box is sufficient for the load is another matter.

Facebook and Digg to name a few took a different approach... Is it really all that important that every load of every page be 100% accurate? How many page loads actually result in every single comment being read by the end user anyways? It's a lot cheaper to get the comments right 'most' of the time and by 'most' I mean something you get to decide. Is a 10% chance of a page with missing comments ok? is a 1% chance? How many nodes need to have the right data NOW. When I write a new comment, how many nodes have to say they got the update for it to be successful.

I like the idea behind Cassandra which is in summary, "how much are we willing to spend to get Aunt Martha's comment about her nephew's baptism picture 100% correct?"

But that's a fine question for a free website, but this wouldn't work so good for a business application.

Stephanie Page