views:

308

answers:

4

I store different items (notes, articles, pictures, files) in a single table (there are many metadata in common for all the item types - for example, categories, tags, rating, statistics etc.).

My first design was like this: table Items, plus another "detail" table for each of the item types (NoteItems, ArticleItems, PictureItems etc.). To retrieve a single item, tables must be joined one-to-one (SELECT * FROM Items INNER JOIN PictureItems ON Items.Id = PictureItems.Id WHERE Items.Id = N).

I'm pretty sure that this "by-the-book" design would work nicely (done that several times), however, I start to wonder whether the design is overkill. It would be much simpler to have a single table (Items).

Let's say that there are about 5% of items of picture or file type.

And now, the question: if I go for the (almost) single table design, would it be better to have detail tables for image fields anyway (for picture and file items, of course)?

Scenario 1: only one table: Items (for storing notes, articles, pictures, files...)

Scenario 2: two tables: Items (for storing notes, articles, picture files), ImageItems (for storing only image field of item types picture, file); one-to-one relation

(Scenario 3 would be a minor variation of Scenario 2; with 3 tables (Items, PictureItems, FileItems))

Advantages of scenario 1 are:

  • simpler select queries (no joins)
  • transaction-less updates (only one table is updated on INSERT/UPDATE)
  • performance, scalability due to transaction-less updates?

Advantages of scenario 2 are:

  • cleaner design
  • lower data consumption (in scenario 1, about 95% of items of type other than picture or file would have NULL value in the image field, that's about 16 bytes wasted for the pointer)

Which scenario would you choose: 1 (transaction-less updates) or 2 (lower data consumption)? Thanks for your opinions.

A: 

If the database doesn't need to know what's in those items (won't be indexing or searching on them) then option 1 seems the best option (assuming you just have one 'Item' column as a BLOB) - you can just read out the items as binary data and deal with them yourself as appropriate - thus avoiding that inner join.

I don't believe scenario 2 gives you lower data consumption - you can just use a BLOB field (and anyway the overhead of an extra ImageItems table is probably comparable to 16 bytes per row)

So I would personally go for option 1, but of course it depends on how you handle the Items when they come out of the database.

Mark Pim
+2  A: 

If programmers are sensible enough to query only required columns from the table instead of "SELECT *", first design approach looks okay.

Need to take care of indexing, referential constraints etc. for the second design.

Murthy
Unfortunately, "SELECT *" is exactly what's going to happen. Programmers whose task is concerned with other columns in that table may not even know it should be avoided.
egrunin
A: 

First approach is usually penalized if you are using some sort of ORM or automatically generate your DAL ( SubSonic? ).You will retreive the Image column (and it's data) everytime you pass arround a DAL object (or a collection), so normally I would use scenario 2 (or 3)

From the SQL perspective either scenarios will work about the same depending on your storage engine (ISAM, InnoDB,etc), but even there the benefits and diferences between scenarios are minor.

Radu094
A: 

If you're right about only about 5% of your row actually having additional image/binary data, then I'd definitely say use the one table approach, combined with the hint Murthy gave - make sure not to do SELECT * on this table, but only request those columns that you really need - leave out the BLOB columns as often as possible.

If your database grows, you might also want to check out a separate filegroup for the BLOB data to keep things separated and clean (but that's really only when you're dealing with hundreds of thousands of rows or more, and if you can split the filegroups onto several separate disks).

KISS - Keep It Smart and Simple - whenever possible! :-)

Marc

marc_s