views:

118

answers:

4

After reading this question, I've learned that denormalization is not a solution for simplicity. What about this case?

I have news-articles which have a list of sites-article-will-be-published-to. The latter can be expressed in normalized fashion either by table and a many-to-many relationship (via a cross-table, I think). But the simple solution is to just throw in a bunch of booleans for the sites-article-will-be-published-to (publish_to_site_1, publish_to_site_2 etc.). Assuming the sites are:

  1. small in number
  2. will not change over time
  3. have no fields themselves, except a name

Is this still a terrible idea? The many-to-many relationship seems somewhat cumbersome, but I've done it before in cases like this (and it seemed cumbersome).

Note: I'm doing this in Rails, where it's not that painful. On the other hand, the metaprogramming makes things like this trivial

(1..5).each { |site| do_something(article["publish_to_site_#{site}".to_symbol]) }
+3  A: 

If these conditions are actually satisfied, then no, it's not a terrible idea.

In fact, this is not even denormalization: Denormalization usually means that you are storing some information redundantly, for sake of performance. In your example, since the sites do not have fields themselves, you are not storing stuff redundantly. You are just depriving yourself from the opportunity to store additional fields for the sites in the future (without violating normalization or redesigning your database).

So, this is OK (normalized):

article                        show_on_stackoverflow    show_on_my_blog
-----------------------------------------------------------------------
Denormalize for Simplicity             YES                     NO
More simplicity                        YES                     YES
...

But this is not OK (redundancy):

article                        show_on_stackoverflow    stackoverflow_mainpage_url   show_on_my_blog    my_blog_mainpage_url
------------------------------------------------------------------------------------------------------------------------------
Denormalize for Simplicity             YES              http://stackoverflow.com            NO          http://my.blog.url       
More simplicity                        YES              http://stackoverflow.com            YES         http://my.blog.url
...
Heinzi
That's true, but makes me think that for almost no gain I'm locking myself into a painful future if there is a change. That's bad.
Yar
Exactly. If there is the possibility that you will need additional fields in the future, better invest a few more joins now than having to redesign everything afterwards.
Heinzi
I like that you took the question at face value and didn't lecture me on the obvious: that you probably never meet those conditions for real. Data tends to attract data, or something, so the "show_on_so" inevitably gets another field... so you end up with redundancy (as in your second example), and then it is denormalized-for-simplicity which is bad.
Yar
You can implement new fields as new tables and just join them.
Seun Osewa
A: 

If you think of the booleans "sites-article-will-be-published-to" as merely attributes of the primary datum such as "isGreen", "hasHair", "isBipedal", then a single table is normalized in the sense that it would be perverse to have a foreign key to the table Green{<true>, <false>}.

Obviously, if your 3 conditions do not continue to hold true, the next guy will have non-trivial work to do, but "as simple as possible, but no simpler" has its utility.

msw
Sorry, I wasn't clear. The other table would be Characteristics (in your example) and have chars like "green" and "bipedal."
Yar
A: 

Personally, I think I'd not denormalize. I my opinion, a single n:n-relation is not that much cumbersome to join if you're familiar with SQL. What might be cumbersome is using the denormalized structure for different queries. For example, are you sure you're never going to need a list of all sites an article is published to...?

Not that I'd ever call your approach terrible, but I usually prefer normalized data, happily doing another join :)

Cheers Matthias

Mudu
@Mudu, changes the question to include some meta-Ruby stuff.
Yar
+1  A: 

Assumption two is unrealistic.

Therefore, in complete accordance with "If these conditions are actually satisfied, then no, it's not a terrible idea." : yes, it is a terrible idea.

Erwin Smout
I'll vote this up because I came to the same conclusion in my case (actually I think that #3 is also not realistic most of the time), but I still think that we have to be careful: it's not unrealistic if it's real: It's just very unlikely, but you could have a case (based on some legal criteria, perhaps) in which it is satisfied.
Yar