Which table structure is considered better normalized?

For example, this table uses idType as a textually named identifier for the subjectid.

Note: idType tells which kind of thing the comment was made on, and subjectid is the id of the item the comment was made on.

commentid ---- subjectid ----- idType
--------------------------------------
1                22            post
2                26            photo
3                84            reply
4                36            post
5                22            status

Compared to this.

commentid ---- postid ----- photoid-----replyid
-----------------------------------------------
1                22          NULL        NULL
2                NULL         56         NULL
3                23          NULL        NULL
4                NULL        NULL        55
5                26          NULL        NULL

I am looking at both of them, and I don't think I could tie the first table to a foreign key constraint (i.e. the comment gets deleted if the post or photo is deleted), whereas in the second one that is possible. How would you approach a similar issue, keeping in mind that the database will need to expand over time and that data integrity is also important?

Thanks

A: 

The first is more normalized, if slightly incomplete. There are a couple of approaches you can take; the simplest (and, strictly speaking, the most 'correct') needs two tables, with the obvious FK constraint:

commentid ---- subjectid ----- idType
--------------------------------------
1                22            post
2                26            photo
3                84            reply
4                36            post
5                22            status

idType
------
post
photo
reply
status

If you like, you can use a char(1) or similar to reduce the impact of the varchar on key/index length, or to facilitate use with an ORM if you plan to use one. NULLs are always a bother, and if you start to see them turn up in your design, you will be better off if you can figure out a convenient way to eliminate them.
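As a rough illustration of the two-table approach, here is a minimal sketch using Python's built-in sqlite3 module. The choice of SQLite, the table names `comments` and `id_types`, and the helper names are assumptions made for the example, not part of the original design.

```python
import sqlite3


def build_lookup_schema() -> sqlite3.Connection:
    """Create the comment table plus an idType lookup table (names are illustrative)."""
    conn = sqlite3.connect(":memory:")
    # SQLite does not enforce foreign keys unless this is switched on per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE id_types (
            idType TEXT PRIMARY KEY
        );
        CREATE TABLE comments (
            commentid INTEGER PRIMARY KEY,
            subjectid INTEGER NOT NULL,
            idType    TEXT NOT NULL REFERENCES id_types(idType)
        );
        INSERT INTO id_types VALUES ('post'), ('photo'), ('reply'), ('status');
        INSERT INTO comments VALUES
            (1, 22, 'post'), (2, 26, 'photo'), (3, 84, 'reply'),
            (4, 36, 'post'), (5, 22, 'status');
    """)
    return conn


def try_insert_comment(conn, commentid, subjectid, id_type) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO comments VALUES (?, ?, ?)",
                (commentid, subjectid, id_type),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this schema, a comment whose idType is not listed in `id_types` is rejected. The constraint only covers the idType value, though; it does not check that subjectid exists in the corresponding post or photo table.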

The second approach is one I prefer when dealing with more than 100 million rows:

commentid ---- subjectid
------------------------
1                22    
2                26     
3                84     
4                36     
5                22     

postIds ---- subjectid
----------------------
1                22   
4                36   

photoIds ---- subjectid
-----------------------
2                26    

replyIds ---- subjectid
-----------------------
3                84    

statusIds ---- subjectid
------------------------
5                22     
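The sub-type tables above could be sketched roughly as follows, again with Python's sqlite3 module. Only the post and photo types are shown. The subject tables `posts` and `photos`, the reading of the first column of each sub-type table as commentid, and the ON DELETE CASCADE clauses are all assumptions for the example.

```python
import sqlite3


def build_subtype_schema() -> sqlite3.Connection:
    """Create a comment table with one link table per subject type (illustrative names)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # FK enforcement is off by default in SQLite
    conn.executescript("""
        -- Hypothetical subject tables; the original question does not show them.
        CREATE TABLE posts  (postid  INTEGER PRIMARY KEY);
        CREATE TABLE photos (photoid INTEGER PRIMARY KEY);

        CREATE TABLE comments (
            commentid INTEGER PRIMARY KEY,
            subjectid INTEGER NOT NULL
        );
        -- One sub-type table per subject kind, each with a real FK to its subject.
        CREATE TABLE postIds (
            commentid INTEGER PRIMARY KEY REFERENCES comments(commentid) ON DELETE CASCADE,
            subjectid INTEGER NOT NULL REFERENCES posts(postid) ON DELETE CASCADE
        );
        CREATE TABLE photoIds (
            commentid INTEGER PRIMARY KEY REFERENCES comments(commentid) ON DELETE CASCADE,
            subjectid INTEGER NOT NULL REFERENCES photos(photoid) ON DELETE CASCADE
        );

        INSERT INTO posts  VALUES (22), (36);
        INSERT INTO photos VALUES (26);
        INSERT INTO comments VALUES (1, 22), (2, 26), (4, 36);
        INSERT INTO postIds  VALUES (1, 22), (4, 36);
        INSERT INTO photoIds VALUES (2, 26);
    """)
    return conn


def try_link_post(conn, commentid, postid) -> bool:
    """Return True if the comment could be linked to the post, False if a constraint failed."""
    try:
        with conn:
            conn.execute("INSERT INTO postIds VALUES (?, ?)", (commentid, postid))
        return True
    except sqlite3.IntegrityError:
        return False


def delete_post(conn, postid) -> None:
    """Delete a post; its rows in postIds go with it via ON DELETE CASCADE."""
    with conn:
        conn.execute("DELETE FROM posts WHERE postid = ?", (postid,))
```

In this sketch, deleting a post removes its rows from `postIds`, but the matching row in `comments` stays behind. Removing that as well would need a trigger or a separate cleanup step.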

There is, of course, also the (slightly denormalized) hybrid approach, which I use extensively with large datasets, as they tend to be dirty. Simply provide the specialization tables for the pre-defined idTypes, but keep an ad hoc idType column on the comment table.

Note that even the hybrid approach only requires 2x the space of the denormalized table, and it provides trivial query restriction by idType. The integrity constraint, however, is not straightforward, being an FK constraint on a derived UNION of the type tables. My general approach is to use a trigger on either the hybrid table or an equivalent updatable view to propagate updates to the correct sub-type table.
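One way the trigger idea might look is sketched below with Python's sqlite3 module. It covers inserts only and just the post and photo types. The trigger names, the `posts` and `photos` subject tables, and the use of SQLite are assumptions for illustration; a real setup would also need to handle updates and deletes.

```python
import sqlite3


def build_hybrid_schema() -> sqlite3.Connection:
    """Hybrid table with an idType column, plus triggers that fill the sub-type tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce the FKs
    conn.executescript("""
        -- Hypothetical subject tables.
        CREATE TABLE posts  (postid  INTEGER PRIMARY KEY);
        CREATE TABLE photos (photoid INTEGER PRIMARY KEY);

        -- Hybrid table: idType is free-form, so ad hoc types are allowed.
        CREATE TABLE comments (
            commentid INTEGER PRIMARY KEY,
            subjectid INTEGER NOT NULL,
            idType    TEXT NOT NULL
        );
        CREATE TABLE postIds (
            commentid INTEGER PRIMARY KEY REFERENCES comments(commentid),
            subjectid INTEGER NOT NULL REFERENCES posts(postid)
        );
        CREATE TABLE photoIds (
            commentid INTEGER PRIMARY KEY REFERENCES comments(commentid),
            subjectid INTEGER NOT NULL REFERENCES photos(photoid)
        );

        -- Route each new comment of a pre-defined type to its sub-type table.
        CREATE TRIGGER route_post_comment AFTER INSERT ON comments
        WHEN NEW.idType = 'post'
        BEGIN
            INSERT INTO postIds (commentid, subjectid)
            VALUES (NEW.commentid, NEW.subjectid);
        END;
        CREATE TRIGGER route_photo_comment AFTER INSERT ON comments
        WHEN NEW.idType = 'photo'
        BEGIN
            INSERT INTO photoIds (commentid, subjectid)
            VALUES (NEW.commentid, NEW.subjectid);
        END;

        INSERT INTO posts  VALUES (22), (36);
        INSERT INTO photos VALUES (26);
    """)
    return conn


def add_comment(conn, commentid, subjectid, id_type) -> bool:
    """Insert into the hybrid table; False means a sub-type FK rejected the row."""
    try:
        with conn:  # rolls back the whole insert if the trigger's insert fails
            conn.execute(
                "INSERT INTO comments VALUES (?, ?, ?)",
                (commentid, subjectid, id_type),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Here a comment on a non-existent post is rejected because the trigger's insert into `postIds` violates the FK. A comment with an ad hoc idType such as 'status' is accepted without any integrity check, which is the trade-off of the hybrid design.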

Both the simple approach and the more complex sub-type table approach work; still, for most purposes KISS applies, so I suspect you should just introduce an ID_TYPES table and the relevant FK, and be done with it.

Recurse