ansaurus

Question

What is your opinion on using textual identifiers in table columns when approaching the database with normalization and scalability in mind?

Answer 1

The first is more normalized, if slightly incomplete. There are a couple of approaches you can take; the simplest (and, strictly speaking, the most 'correct') needs two tables, with the obvious FK constraint.

commentid ---- subjectid ----- idType
--------------------------------------
1                22            post
2                26            photo
3                84            reply
4                36            post
5                22            status

idType
------
post
photo
reply
status
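As a rough illustration of the two-table approach, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory SQLite database. SQLite and the snake_case names (id_types, comments, comment_id, subject_id, id_type) are assumptions for the example, not part of the original design. The lookup table holds the allowed types, and the FK rejects anything else:

```python
import sqlite3

# In-memory database for illustration; names are hypothetical stand-ins
# for the commentid / subjectid / idType tables above.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE id_types (
    id_type TEXT PRIMARY KEY
);
CREATE TABLE comments (
    comment_id INTEGER PRIMARY KEY,
    subject_id INTEGER NOT NULL,
    id_type    TEXT NOT NULL REFERENCES id_types(id_type)
);
""")

conn.executemany("INSERT INTO id_types VALUES (?)",
                 [("post",), ("photo",), ("reply",), ("status",)])
conn.executemany("INSERT INTO comments VALUES (?, ?, ?)",
                 [(1, 22, "post"), (2, 26, "photo"), (3, 84, "reply"),
                  (4, 36, "post"), (5, 22, "status")])

# A type that is not listed in id_types is rejected by the FK constraint.
try:
    conn.execute("INSERT INTO comments VALUES (6, 99, 'video')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

SQLite only enforces foreign keys once `PRAGMA foreign_keys` is on; PostgreSQL and MySQL's InnoDB, for example, enforce them by default.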

If you like, you can use a char(1) code or similar to reduce the impact of the varchar on key/index length, or to make the design easier to use with an ORM if you plan to use one. NULLs are always a bother; if you start to see them turn up in your design, you will be better off finding a convenient way to eliminate them.
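A minimal sketch of the char(1) variant, again assuming SQLite via Python's sqlite3 module. The one-letter codes ('P', 'H', 'R', 'S') and the label column are hypothetical choices; the point is that only the short code is stored in, and indexed on, the comment table, while the readable name lives once in the lookup table. The NOT NULL on the FK column keeps NULLs out of the design:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE id_types (
    code  CHAR(1) PRIMARY KEY CHECK (length(code) = 1),  -- narrow key
    label TEXT NOT NULL UNIQUE                          -- readable name, stored once
);
CREATE TABLE comments (
    comment_id INTEGER PRIMARY KEY,
    subject_id INTEGER NOT NULL,
    id_type    CHAR(1) NOT NULL REFERENCES id_types(code)  -- no NULL types
);
""")
conn.executemany("INSERT INTO id_types VALUES (?, ?)",
                 [("P", "post"), ("H", "photo"), ("R", "reply"), ("S", "status")])
conn.executemany("INSERT INTO comments VALUES (?, ?, ?)",
                 [(1, 22, "P"), (2, 26, "H"), (3, 84, "R"),
                  (4, 36, "P"), (5, 22, "S")])

# Join back to the lookup table whenever the readable name is needed.
rows = conn.execute("""
    SELECT c.comment_id, t.label
    FROM comments c JOIN id_types t ON t.code = c.id_type
    ORDER BY c.comment_id
""").fetchall()
print(rows)  # [(1, 'post'), (2, 'photo'), (3, 'reply'), (4, 'post'), (5, 'status')]
```

SQLite ignores the declared length of CHAR(1) (the column simply gets TEXT affinity), which is why the sketch adds an explicit CHECK; engines with true fixed-width types would not need it.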

The second approach is one I prefer when dealing with more than 100 million rows:

commentid ---- subjectid
------------------------
1                22    
2                26     
3                84     
4                36     
5                22     

postIds ---- subjectid
----------------------
1                22   
4                36   

photoIds ---- subjectid
-----------------------
2                26    

replyIds ---- subjectid
-----------------------
3                84    

statusIds ---- subjectid
------------------------
5                22
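The second layout can be sketched the same way, again assuming SQLite through Python's sqlite3 module and hypothetical table names such as post_ids. Here the type is implied by which specialization table a row lives in: restricting by type reads one small table, and the type column can be rebuilt with a UNION ALL when it is needed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""CREATE TABLE comments (
    comment_id INTEGER PRIMARY KEY,
    subject_id INTEGER NOT NULL)""")

# One specialization table per subject type. Table names are built from this
# fixed internal list, never from user input, so the f-strings are safe here.
SUBTYPES = ["post", "photo", "reply", "status"]
for t in SUBTYPES:
    conn.execute(f"""CREATE TABLE {t}_ids (
        comment_id INTEGER PRIMARY KEY REFERENCES comments(comment_id),
        subject_id INTEGER NOT NULL)""")

data = [(1, 22, "post"), (2, 26, "photo"), (3, 84, "reply"),
        (4, 36, "post"), (5, 22, "status")]
for comment_id, subject_id, t in data:
    conn.execute("INSERT INTO comments VALUES (?, ?)", (comment_id, subject_id))
    conn.execute(f"INSERT INTO {t}_ids VALUES (?, ?)", (comment_id, subject_id))

# Restricting by type is a plain read of one table...
post_comments = conn.execute(
    "SELECT comment_id FROM post_ids ORDER BY comment_id").fetchall()

# ...and the type column can be derived on demand with UNION ALL.
union_sql = " UNION ALL ".join(
    f"SELECT comment_id, '{t}' AS id_type FROM {t}_ids" for t in SUBTYPES)
typed = conn.execute(
    f"SELECT * FROM ({union_sql}) ORDER BY comment_id").fetchall()
print(post_comments)  # [(1,), (4,)]
print(typed)  # [(1, 'post'), (2, 'photo'), (3, 'reply'), (4, 'post'), (5, 'status')]
```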

There is, of course, also the (slightly denormalized) hybrid approach, which I use extensively with large datasets, as they tend to be dirty. Simply provide the specialization tables for the predefined idTypes, but keep an ad hoc idType column on the commentId table.

Note that even the hybrid approach requires only 2x the space of the denormalized table, and it provides trivial query restriction by idType. The integrity constraint, however, is not straightforward, being an FK constraint on a derived UNION of the type tables. My general approach is to use a trigger on either the hybrid table or an equivalent updatable view to propagate updates to the correct sub-type table.
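A minimal sketch of the hybrid table with trigger-based propagation, assuming SQLite (via Python's sqlite3 module) and its CREATE TRIGGER ... WHEN syntax; the table names and the unexpected 'video' type are hypothetical. It only handles inserts; a real schema would need matching UPDATE and DELETE triggers. The ad hoc id_type column has no FK, so dirty rows with unknown types are kept in the comment table while the predefined types are copied into their specialization tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hybrid table: keeps an ad hoc id_type column with no FK, so unexpected
# types from dirty data are still accepted.
conn.execute("""CREATE TABLE comments (
    comment_id INTEGER PRIMARY KEY,
    subject_id INTEGER NOT NULL,
    id_type    TEXT NOT NULL)""")

PREDEFINED = ["post", "photo", "reply", "status"]  # hypothetical fixed set
for t in PREDEFINED:
    conn.execute(f"""CREATE TABLE {t}_ids (
        comment_id INTEGER PRIMARY KEY REFERENCES comments(comment_id),
        subject_id INTEGER NOT NULL)""")
    # One AFTER INSERT trigger per type copies matching rows into its table.
    conn.execute(f"""CREATE TRIGGER comments_to_{t} AFTER INSERT ON comments
        WHEN NEW.id_type = '{t}'
        BEGIN
            INSERT INTO {t}_ids (comment_id, subject_id)
            VALUES (NEW.comment_id, NEW.subject_id);
        END""")

conn.executemany("INSERT INTO comments VALUES (?, ?, ?)",
                 [(1, 22, "post"), (2, 26, "photo"), (3, 84, "reply"),
                  (4, 36, "post"), (5, 22, "status"), (6, 99, "video")])

posts = conn.execute(
    "SELECT comment_id FROM post_ids ORDER BY comment_id").fetchall()
# Rows whose type has no specialization table stay only in the hybrid table.
unmatched = conn.execute("""
    SELECT comment_id, id_type FROM comments
    WHERE id_type NOT IN ('post', 'photo', 'reply', 'status')""").fetchall()
print(posts)      # [(1,), (4,)]
print(unmatched)  # [(6, 'video')]
```

Restriction by type can then use either the id_type column or the specialization table, whichever suits the query.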

Both the simple approach and the more complex sub-type table approach work; still, for most purposes KISS applies, so I suspect you should probably just introduce an ID_TYPES table and the relevant FK, and be done with it.

Recurse 2010-08-03 11:16:58
