I'm doing something different, but this is an easier example to understand. Think of the votes here. I add these votes to a separate table and log information about them, such as who voted, when, and so on. Would you also add a field to the main table that simply counts the number of votes, or is this bad practice?

+5  A: 

This is called "denormalization" and is considered bad practice unless you get a significant performance boost when you denormalize.

The biggest issue with this, however, is concurrency. What happens if two people vote on the poll and they both try to increment the VoteCount column?
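One common way to sidestep the lost-update problem, if you do keep a counter, is to let the database do the increment relative to the current value, inside the same transaction that records the vote. Below is a minimal sketch using Python's built-in sqlite3 module. The `polls` and `votes` tables, the `vote_count` column (standing in for VoteCount), and the `make_db` and `cast_vote` helpers are all hypothetical names made up for illustration, and other database engines have their own locking behaviour.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical polls/votes schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE polls (
            id         INTEGER PRIMARY KEY,
            vote_count INTEGER NOT NULL DEFAULT 0  -- the denormalized counter
        );
        CREATE TABLE votes (
            id       INTEGER PRIMARY KEY,
            poll_id  INTEGER NOT NULL REFERENCES polls(id),
            voter    TEXT NOT NULL,
            voted_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
        INSERT INTO polls (id) VALUES (1);
    """)
    return conn


def cast_vote(conn, poll_id, voter):
    """Record the vote and bump the counter in one transaction."""
    # "with conn" commits if both statements succeed and rolls back otherwise,
    # so the raw vote and the counter cannot drift apart on an error.
    with conn:
        conn.execute(
            "INSERT INTO votes (poll_id, voter) VALUES (?, ?)",
            (poll_id, voter),
        )
        # The increment is computed by the database from the current value,
        # instead of reading the count into the application and writing it back.
        conn.execute(
            "UPDATE polls SET vote_count = vote_count + 1 WHERE id = ?",
            (poll_id,),
        )
```

The read-modify-write version (SELECT the count, add one in application code, UPDATE with the new number) is the one that loses votes when two requests interleave.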

Search for denormalization here and on Google; there has been plenty of discussion on this topic. Find what fits your exact situation best, although, from the looks of it, denormalization would be premature optimization in your situation.

Baddie
So you mean even big websites like this one have to count from their raw votes table every time a page like this is called?
samquo
Yes, some big websites do run a count, and some other big sites don't. There are trade-offs either way. Only when you notice that your site is slow specifically because of the count should you see if denormalizing helps.
Baddie
+1  A: 

The short answer is YES. But keep in mind that duplication can become a problem, or even a nightmare, for the development and maintenance of your system. If you want to store pre-calculated cache values to improve performance, the cache calculation should be encapsulated and transparent to other processes.

In this case:

Solution 1: When a user votes on the poll, the detailed information is recorded and the vote count is automatically increased by one (i.e. the cache calculation is encapsulated in the data-writer process).

Solution 2: When the vote information is recorded, the vote count is left untouched; only a flag is set to mark the vote count as dirty. When the vote count is read, if it is dirty, recalculate it and update both the value and the flag; if it is up to date (not dirty), read it directly (i.e. the cache calculation is encapsulated in the data-reader process).
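Solution 2 (the dirty flag) could look roughly like the sketch below, again using Python's built-in sqlite3 module. The schema, the `count_dirty` column, and the `record_vote` and `read_vote_count` functions are hypothetical names chosen for illustration. With several concurrent connections, the recount and the flag reset would also need to be protected against a vote arriving in between, which this single-connection sketch does not show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE polls (
        id          INTEGER PRIMARY KEY,
        vote_count  INTEGER NOT NULL DEFAULT 0,  -- cached count
        count_dirty INTEGER NOT NULL DEFAULT 0   -- 1 = cache is stale
    );
    CREATE TABLE votes (
        id      INTEGER PRIMARY KEY,
        poll_id INTEGER NOT NULL REFERENCES polls(id),
        voter   TEXT NOT NULL
    );
    INSERT INTO polls (id) VALUES (1);
""")


def record_vote(poll_id, voter):
    """Writer: store the vote and only mark the cached count as dirty."""
    with conn:
        conn.execute(
            "INSERT INTO votes (poll_id, voter) VALUES (?, ?)",
            (poll_id, voter),
        )
        conn.execute(
            "UPDATE polls SET count_dirty = 1 WHERE id = ?", (poll_id,)
        )


def read_vote_count(poll_id):
    """Reader: recount only when the cached value is marked dirty."""
    with conn:
        count, dirty = conn.execute(
            "SELECT vote_count, count_dirty FROM polls WHERE id = ?",
            (poll_id,),
        ).fetchone()
        if dirty:
            count = conn.execute(
                "SELECT COUNT(*) FROM votes WHERE poll_id = ?", (poll_id,)
            ).fetchone()[0]
            # Refresh the cache and clear the flag together.
            conn.execute(
                "UPDATE polls SET vote_count = ?, count_dirty = 0 WHERE id = ?",
                (count, poll_id),
            )
    return count
```

Callers only ever use `record_vote` and `read_vote_count`, so the caching stays hidden from the rest of the application, which is the encapsulation the answer asks for.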

Read Section 7 of the famous book The Pragmatic Programmer; you may get some ideas.

Actually, the Normal Forms used in database design are a special case of the DRY principle.

Feil
A: 

In short, NO. There is no point in storing data that can be fetched with a COUNT query. The second reason is that you would have to manipulate the counter value manually: more work, more room for problems, and more code to maintain. Really, do NOT do it; it is bad practice.

Yasen Zhelev
A: 

Bad.

Incorrect.

Guaranteed problems and data inconsistencies. The vote count is "derived data" and should not be stored (a duplicate). For stable data (that which does not change), summaries are fair enough.

Now if the data (number of votes) is large and you need to count it often (in queries), then enhance that alone: the speed of counting the vote table from the main table, e.g. ensure there is an index on the column being looked up for the count.
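As a rough illustration of that advice, the sketch below indexes the lookup column and counts on demand, using Python's built-in sqlite3 module. The `votes` table, the `idx_votes_poll_id` index, and the `count_votes` helper are hypothetical names, and the query plan text varies by database engine and version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE votes (
        id      INTEGER PRIMARY KEY,
        poll_id INTEGER NOT NULL,
        voter   TEXT NOT NULL
    );
    -- Index on the column the COUNT filters by.
    CREATE INDEX idx_votes_poll_id ON votes (poll_id);
""")
conn.executemany(
    "INSERT INTO votes (poll_id, voter) VALUES (?, ?)",
    [(1, "alice"), (1, "bob"), (2, "carol")],
)
conn.commit()


def count_votes(poll_id):
    """Derive the vote count from the raw rows; nothing is stored twice."""
    return conn.execute(
        "SELECT COUNT(*) FROM votes WHERE poll_id = ?", (poll_id,)
    ).fetchone()[0]


# Ask SQLite how it intends to run the count, to check the index is used.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM votes WHERE poll_id = ?", (1,)
).fetchall()
```

Printing `plan` should show whether the index is being picked up for the count.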

If the data is massive, e.g. a bank with millions of transactions per month, and you do not want to count them all to produce the account balance on every query, enhance that alone. For example, I calculate a month-to-date (MTD) figure every night and store it at the account level; the current day's transactions then need to be summed and added to the MTD figure to produce the true up-to-the-minute figure. At the end of the month, when all the auditing processes are changing various rows across that month, the MTD figure (to yesterday) can be recalculated on demand.
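The month-to-date scheme described above might be sketched as follows with Python's built-in sqlite3 module. Everything here is an assumption made for illustration: the `accounts` and `transactions` tables, the `mtd_cents` and `mtd_before` columns, the `nightly_rollup` and `current_figure` functions, and the sample amounts. Dates are ISO strings so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (
        id         INTEGER PRIMARY KEY,
        mtd_cents  INTEGER NOT NULL DEFAULT 0,  -- stored month-to-date figure
        mtd_before TEXT NOT NULL                -- figure covers days before this date
    );
    CREATE TABLE transactions (
        id           INTEGER PRIMARY KEY,
        account_id   INTEGER NOT NULL REFERENCES accounts(id),
        amount_cents INTEGER NOT NULL,
        posted_on    TEXT NOT NULL              -- ISO date, e.g. 2024-03-15
    );
    INSERT INTO accounts (id, mtd_before) VALUES (1, '2024-03-01');
    INSERT INTO transactions (account_id, amount_cents, posted_on) VALUES
        (1, 1000, '2024-03-01'),
        (1, -250, '2024-03-14'),
        (1,  400, '2024-03-15');
""")


def nightly_rollup(account_id, month_start, today):
    """Store the sum of this month's transactions up to (not including) today."""
    with conn:
        total = conn.execute(
            "SELECT COALESCE(SUM(amount_cents), 0) FROM transactions "
            "WHERE account_id = ? AND posted_on >= ? AND posted_on < ?",
            (account_id, month_start, today),
        ).fetchone()[0]
        conn.execute(
            "UPDATE accounts SET mtd_cents = ?, mtd_before = ? WHERE id = ?",
            (total, today, account_id),
        )


def current_figure(account_id):
    """Stored MTD figure plus whatever has been posted since the rollup cutoff."""
    mtd, cutoff = conn.execute(
        "SELECT mtd_cents, mtd_before FROM accounts WHERE id = ?",
        (account_id,),
    ).fetchone()
    since = conn.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM transactions "
        "WHERE account_id = ? AND posted_on >= ?",
        (account_id, cutoff),
    ).fetchone()[0]
    return mtd + since
```

Because `nightly_rollup` recomputes from the raw rows each time, it can simply be rerun on demand when audit processes change rows earlier in the month.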

PerformanceDBA