views: 267

answers: 6

I'm building a little forum for practice. I see that forums like phpBB store the thread text in a separate table. Why?

Why not store it all in the same table?

Something like: thread_id, thread_date, thread_text, thread_author

Why is it done this way? How would you do it?

+1  A: 

They do not store text in the same table because of the size the table can reach.

This way, even with a very large number of entries, the thread list table is small, well indexed and it's fast to scan it. The text is accessed only when necessary, using a primary key, which is fast too.

For small forums, I don't think this is necessary, since splitting the tables adds a little coding overhead.
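A minimal sketch of that split, using Python's built-in sqlite3 module rather than MySQL. The table names `threads` and `thread_texts` and the helper functions are assumptions made up for illustration; only the column names come from the question. The list view touches the small table only, and the text is fetched by primary key when a thread is opened.

```python
import sqlite3


def build_forum(conn):
    """Create the two tables: small metadata rows and a separate text table."""
    conn.executescript("""
        CREATE TABLE threads (
            thread_id     INTEGER PRIMARY KEY,
            thread_date   TEXT NOT NULL,
            thread_author TEXT NOT NULL
        );
        -- Hypothetical companion table: one text row per thread.
        CREATE TABLE thread_texts (
            thread_id   INTEGER PRIMARY KEY REFERENCES threads(thread_id),
            thread_text TEXT NOT NULL
        );
    """)


def add_thread(conn, date, author, text):
    """Insert the metadata row, then the text row under the same id."""
    cur = conn.execute(
        "INSERT INTO threads (thread_date, thread_author) VALUES (?, ?)",
        (date, author),
    )
    thread_id = cur.lastrowid
    conn.execute(
        "INSERT INTO thread_texts (thread_id, thread_text) VALUES (?, ?)",
        (thread_id, text),
    )
    return thread_id


def list_threads(conn):
    """Thread list page: reads only the metadata table."""
    return conn.execute(
        "SELECT thread_id, thread_date, thread_author "
        "FROM threads ORDER BY thread_date DESC"
    ).fetchall()


def read_text(conn, thread_id):
    """Thread page: fetch the text by primary key, or None if missing."""
    row = conn.execute(
        "SELECT thread_text FROM thread_texts WHERE thread_id = ?",
        (thread_id,),
    ).fetchone()
    return row[0] if row else None
```

The extra insert and the extra lookup in `add_thread` and `read_text` are the "little coding overhead" mentioned above.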

Julien Tartarin
TEXT columns are stored out-of-row in both engines, it's hardly an impact on table size.
Quassnoi
I agree -- I think Mario's explanation is the correct one
Jeff Atwood
+1  A: 

In addition to Julien's excellent answer, it is quite common to move posts to other threads (by say an admin or moderator). Having the text in a "post table" helps support this.
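As a rough sketch of why a post table makes this easy: if each post row carries its own `thread_id`, moving a post is a single UPDATE. The schema and the `move_post` helper below are hypothetical (not phpBB's actual tables), and sqlite3 is used only so the example runs on its own.

```python
import sqlite3


def make_db():
    """Build an in-memory database with hypothetical threads/posts tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE threads (
            thread_id INTEGER PRIMARY KEY,
            title     TEXT NOT NULL
        );
        CREATE TABLE posts (
            post_id   INTEGER PRIMARY KEY,
            thread_id INTEGER NOT NULL REFERENCES threads(thread_id),
            post_text TEXT NOT NULL
        );
    """)
    return conn


def move_post(conn, post_id, new_thread_id):
    """Re-point one post at another thread; the text itself is not copied.

    Returns True if exactly one row was updated, False if the post
    does not exist. (SQLite does not enforce the foreign key unless
    PRAGMA foreign_keys is turned on, so the target is not validated here.)
    """
    cur = conn.execute(
        "UPDATE posts SET thread_id = ? WHERE post_id = ?",
        (new_thread_id, post_id),
    )
    return cur.rowcount == 1
```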

Brian Neal
Nice idea, never thought of it.
Quassnoi
+2  A: 

InnoDB doesn't support FULLTEXT indexing and MyISAM doesn't support transactions.

Don't know phpBB, but probably that's why they separate the tables.

Quassnoi
You were 1 second ahead of me ><
Mario
+3  A: 

Never looked inside the phpBB guts, but perhaps it is because of full-text indexing: the InnoDB engine for the main table to allow transactions and whatnot, and MyISAM for full-text indexing.

Mario
Well...phpBB, at least in versions prior to 3.0, used MyISAM for all tables.
Brian Neal
+3  A: 

For one thing, the filesystem layout of most relational databases is such that storing large blocks of arbitrary text or data can slow down the system. Since data is usually stored by row, when doing searches the database now has to skip over variable-length text fields even when looking for unrelated fields.

Second, putting everything in one table makes it much harder to add to the data model later on, if you need more data for each thread_id, for instance.

Designing database schemas well requires some education. You should start with http://en.wikipedia.org/wiki/Database_normalization. Be sure to understand third normal form.

drinian
+3  A: 

I don't actually know why this is done, but one reason I can imagine is optimizing search and retrieval for the post metadata (date, author, etc.).

According to Joel (and Joel is always right! ;-) databases store their data in fixed-length fields composing fixed-length records, so it's easy to jump from one row to the next just by incrementing a pointer by the byte length of a record. But large text fields used to store post text can't have a fixed size, because the length of a post varies over a wide range and creating fixed-length storage large enough to hold all posts would waste tremendous amounts of space. That means storing the post text in the same table as the other information would make it a lot slower when you want to retrieve the metadata for large numbers of posts, as is done every time somebody views the main forum page.

The way to get the best of both worlds is to put the fixed-length fields (i.e. everything except the post text) in one table and the variable-length fields (i.e. the post text) in another.
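To illustrate the pointer-arithmetic argument as stated (this is a toy model, not how any particular DBMS lays out its rows), here is a sketch with Python's struct module. The record layout is an assumption invented for the example: a 4-byte id, an 8-byte timestamp, and a 16-byte padded author name. Because every record has the same size, record n starts at byte n times the record size, with no scanning.

```python
import struct

# Hypothetical fixed-length layout (little-endian, no padding):
#   thread_id     unsigned 32-bit int
#   thread_date   signed 64-bit Unix timestamp
#   thread_author 16 bytes, NUL-padded (longer names are truncated)
RECORD = struct.Struct("<Iq16s")


def pack_records(rows):
    """Pack (thread_id, timestamp, author) tuples into one byte string."""
    return b"".join(
        RECORD.pack(tid, ts, author.encode("utf-8")) for tid, ts, author in rows
    )


def read_record(buf, n):
    """Jump straight to record n by multiplying n by the record size."""
    tid, ts, author = RECORD.unpack_from(buf, n * RECORD.size)
    return tid, ts, author.rstrip(b"\0").decode("utf-8")
```

A variable-length post text has no such fixed stride, which is the reason given above for keeping it out of this table.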

David Zaslavsky
That may be true for some (let's say 'legacy' or 'primitive') DBMS but hardly for most of the modern ones - http://www.postgresql.org/docs/current/static/storage-toast.html.
Milen A. Radev