Lately I've been rethinking a database design I made a couple of months ago. The main reason is that last night I read the database schema of vBulletin and saw that they use many, MANY, tables.

The current "idea" I'm using for my schema, for instance my log table, is to keep everything in one table and distinguish the type of log entry with an integer:

id, type, type_id, action, message
 1,    1,     305,      2, 'Explanation for user ban'
 2,    2,    1045,      1, 'Reason for deletion of article'

Here type 1 = user, type 2 = article, type_id is the ID of the user, article or whatever, action 2 = ban, and action 1 = deletion.
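A minimal sketch of this single-table design (SQLite via Python here; the table and column names follow the description above, and the sample query is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE log (
        id      INTEGER PRIMARY KEY,
        type    INTEGER NOT NULL,   -- 1 = user, 2 = article
        type_id INTEGER NOT NULL,   -- ID of the user/article this entry refers to
        action  INTEGER NOT NULL,   -- 1 = deletion, 2 = ban
        message TEXT
    )
""")
conn.executemany(
    "INSERT INTO log (type, type_id, action, message) VALUES (?, ?, ?, ?)",
    [(1, 305, 2, "Explanation for user ban"),
     (2, 1045, 1, "Reason for deletion of article")],
)

# Every query must filter on the type discriminator to pick one kind of entry:
rows = conn.execute(
    "SELECT type_id, message FROM log WHERE type = 1 AND action = 2"
).fetchall()
print(rows)  # [(305, 'Explanation for user ban')]
```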

Should I change the design to separate tables (logBans, logSomething and so on), or is it better to keep the method I'm currently using?

+5  A: 

It depends. If you're going to have 1500000000 entries of type 1 and 1000 entries of type 2 and you'll be doing a LOT of queries on type 2, separate the tables. If not, it's more convenient to keep only one table.

Keep in mind scalability:

  • How many entries of each type will I have in 1 year?

  • How many requests on this table will I be doing?

  • Can you, at some point, clear this log? Can you move it to another table (e.g. archive entries older than X months)?
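The archiving idea in the last point can be sketched like this (SQLite via Python; the `log_archive` table name and the cutoff value are assumptions, not part of the original schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE log (id INTEGER PRIMARY KEY, type INTEGER, message TEXT,
                      created TEXT);
    CREATE TABLE log_archive (id INTEGER PRIMARY KEY, type INTEGER,
                              message TEXT, created TEXT);
    INSERT INTO log (type, message, created) VALUES
        (1, 'old entry', '2009-01-01'),
        (1, 'recent entry', '2009-09-01');
""")

cutoff = "2009-06-01"  # "older than X months", picked arbitrarily here
with conn:  # one transaction, so no row can be lost between the two statements
    conn.execute("INSERT INTO log_archive SELECT * FROM log WHERE created < ?",
                 (cutoff,))
    conn.execute("DELETE FROM log WHERE created < ?", (cutoff,))

print(conn.execute("SELECT message FROM log").fetchall())          # [('recent entry',)]
print(conn.execute("SELECT message FROM log_archive").fetchall())  # [('old entry',)]
```

Running the copy and the delete in a single transaction is what makes this safe to do on a live table.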

marcgg
+2  A: 

The one drawback I see right now is that you cannot enforce foreign key integrity on your type_id since it points to many different tables.
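To illustrate: a foreign key constraint must name exactly one target table, so a polymorphic type_id cannot be declared against both users and articles. Splitting the log per type restores the constraint (SQLite via Python; the schema is a sketch, not the asker's actual one):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE users    (id INTEGER PRIMARY KEY);
    CREATE TABLE articles (id INTEGER PRIMARY KEY);

    -- Per-type log table: type_id can now be a real foreign key.
    CREATE TABLE log_users (
        id      INTEGER PRIMARY KEY,
        type_id INTEGER NOT NULL REFERENCES users(id),
        message TEXT
    );
""")
conn.execute("INSERT INTO users (id) VALUES (305)")
conn.execute("INSERT INTO log_users (type_id, message) VALUES (305, 'ok')")

# A dangling reference is now rejected by the database itself:
try:
    conn.execute("INSERT INTO log_users (type_id, message) VALUES (9999, 'bad')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

In the single-table design that check can only live in application code.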

Developer Art
A: 

I wouldn't do what vBulletin does. The problem with older apps like vBulletin is that while they might have started as lean-machines, over the time they collect a lot of entropy and end up being bloated. Since there are plugins, and third-party tools, and developers who've worked on the old code, breaking it is a tough choice.

That's why there is not much refactoring going on there. Don't make them your programming model. Look around, find out what works best, and use that. A lot of tables sounds like a bad thing to me, not a good one.

Cyril Gupta
A lot of tables is often a better choice than a few tables. It's called normalization.
HLGEM
Buddy, I know about normalization. Do you know about database bloat? What part of "over the time they collect a lot of entropy and end up being bloated. Since... breaking it is a tough choice." was difficult for you to understand?
Cyril Gupta
+7  A: 

The issue here is subtyping. There are three basic approaches to dealing with subtypes.

  1. Put each record type into a completely separate table;
  2. Put a record in a parent table and then a record in a subtype table; and
  3. Put all the records in one table, having nullable columns for the "optional" data (i.e. things that don't apply to that type).

Each strategy has its merits.

For example, (3) is particularly applicable if there is little to no difference between the subtypes. In your case, do different log records have extra columns if they're of a particular type? If they don't, or there are few cases where they do, putting them all in one table makes perfect sense.

(2) is commonly used for a Party table. This is a common model in CRMs: a parent Party object has subtypes for Person and Organization (Organization may in turn have subtypes like Company, Association, etc). Person and Organization have different properties (e.g. salutation, given names, date of birth for Person), so it makes sense to split them up rather than using nullable columns.
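The Party model above can be sketched as a parent table plus subtype tables (SQLite via Python; column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Parent table holds the shared identity plus a discriminator.
    CREATE TABLE party (
        id   INTEGER PRIMARY KEY,
        kind TEXT NOT NULL CHECK (kind IN ('person', 'organization'))
    );
    -- Subtype tables hold only the columns that apply to that subtype.
    CREATE TABLE person (
        party_id   INTEGER PRIMARY KEY REFERENCES party(id),
        salutation TEXT,
        given_name TEXT,
        birth_date TEXT
    );
    CREATE TABLE organization (
        party_id INTEGER PRIMARY KEY REFERENCES party(id),
        name     TEXT
    );
""")
conn.execute("INSERT INTO party (id, kind) VALUES (1, 'person')")
conn.execute("INSERT INTO person (party_id, salutation, given_name, birth_date) "
             "VALUES (1, 'Mr', 'John', '1980-01-01')")

# Reading a person means joining the parent row with its subtype row:
row = conn.execute("""
    SELECT p.id, pe.given_name
    FROM party p JOIN person pe ON pe.party_id = p.id
    WHERE p.kind = 'person'
""").fetchone()
print(row)  # (1, 'John')
```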

(2) is potentially more space efficient (although the overhead of NULL columns in modern DBMSs is very low). The bigger issue is that (2) might be more confusing to developers. You will get situations where someone needs to store an extra field somewhere and will whack it in a column that's empty for that type, simply because that's easier than getting approval from the DBAs to add a column (no, I'm not kidding).

(1) is probably the least frequently used scheme of the 3 in my experience.

Lastly, scalability has to be considered, and it is probably the best case for (1). At a certain point JOINs don't scale effectively and you'll need some kind of partitioning scheme to cut down your table sizes. (1) is one (crude) method of doing that.

I wouldn't worry too much about that though. You'll typically need to get to hundreds of millions or billions of records before that becomes an issue (unless your records are really really large, in which case it'll happen sooner).

cletus
Excellent as always, Cletus. It's always a pleasure to read an answer from you, because you actually develop the answer and give good explanations of why and how. Thank you.
kuroir
Just a note about (2): if you have a discriminator (that is, a column that hints at the actual "leaf" type), you may benefit from DBMS-supported partitioning and improve performance for queries targeting a specific "leaf" type. Worth doing only if you have a large number of objects stored, though.
Romain
+1  A: 

I want to add a small tip. It's a little off topic, and quite basic, but it's a lot clearer to use enum instead of tinyint for status flags, e.g.

   enum('user','article')

If there are only two statuses, tinyint is a little more memory efficient, but less clear. Another disadvantage of enum is that you put part of the business logic in the data tier: when you need to add or remove statuses, you have to alter the DB. Otherwise enum is much clearer, and I prefer it.
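As a sketch of the tradeoff: SQLite (used for the examples here via Python) has no ENUM type, so MySQL's enum is approximated below with a CHECK constraint; the table and column names are illustrative. Either way, the set of allowed statuses lives in the schema, so extending it means altering the table, as the answer notes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE log (
        id   INTEGER PRIMARY KEY,
        -- Readable status flag instead of an opaque tinyint code:
        type TEXT NOT NULL CHECK (type IN ('user', 'article')),
        message TEXT
    )
""")
conn.execute("INSERT INTO log (type, message) VALUES ('user', 'banned')")

# A status outside the declared set is rejected by the schema:
try:
    conn.execute("INSERT INTO log (type, message) VALUES ('comment', 'oops')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```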

Raveren
A: 

I would keep things as specific as possible; in this case I would create two tables. Each table has a specific purpose, so I cannot see why you would combine them.

Clive Fletcher