views:

165

answers:

7

I am thinking about a DB Design Problem.

For example, I am designing this stackoverflow website where I have a list of Questions.

Each Question contains certain metadata that will probably never change. Each Question also contains data that changes constantly (recently viewed date, total views, etc.).

Would it be better to keep the constant metadata in a main table, keep the constantly changing values in a separate table, and join the two when reading?

OR

Would it be better to keep everything in one table?

Also, I am not sure if this is the case, but does the row get locked while it is being updated?
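To make the two options concrete, here is roughly what I mean (table and column names are made up, and the syntax assumes a generic SQL dialect):

    -- Option 1: everything in one table
    CREATE TABLE question (
        question_id  INT PRIMARY KEY,
        title        VARCHAR(255),
        body         TEXT,
        created_at   TIMESTAMP,   -- rarely changes
        last_viewed  TIMESTAMP,   -- changes constantly
        total_views  INT          -- changes constantly
    );

    -- Option 2: split the volatile columns into their own table
    CREATE TABLE question_meta (
        question_id  INT PRIMARY KEY,
        title        VARCHAR(255),
        body         TEXT,
        created_at   TIMESTAMP
    );

    CREATE TABLE question_stats (
        question_id  INT PRIMARY KEY REFERENCES question_meta (question_id),
        last_viewed  TIMESTAMP,
        total_views  INT
    );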

+6  A: 

When designing a database structure, it's best to normalize first and change for performance only after you've profiled and benchmarked your queries. Normalization aims to prevent data duplication, protect integrity, and define the correct relationships between your data.

Bear in mind that performing the join comes at a cost as well, so it's hard to say whether your idea would help at all. Proper indexing on a normalized structure would be much more helpful.

And regarding row-level locks: that depends on the storage engine - some engines use row-level locking and some use table-level locks.
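For example, assuming MySQL (since you didn't name a DBMS), an index on the column you actually query will usually help reads far more than splitting the table, and the choice of storage engine decides the locking granularity (sketch only; names are illustrative):

    CREATE TABLE question (
        question_id  INT PRIMARY KEY,
        title        VARCHAR(255),
        last_viewed  TIMESTAMP,
        total_views  INT
    ) ENGINE=InnoDB;  -- InnoDB locks rows; MyISAM locks the whole table

    -- Speeds up "recently viewed" listings without any schema split
    CREATE INDEX idx_question_last_viewed ON question (last_viewed);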

Eran Galperin
"Normalize until it hurts, denormalize it until it works."
Slapout
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil" - Donald Knuth
WOPR
The question has absolutely nothing to do with normalization. Both schemes are perfectly normalized. It's about database partitioning.
Seun Osewa
It has everything to do with normalization. One approach is denormalized for perceived performance benefits, which should not be attempted before an actual performance issue is encountered.
Eran Galperin
A: 

You should have the two tables separated out, as you might want to record the history of the question. The main Question table is keyed by question ID; the Status table is keyed by question ID plus a date/time stamp and contains a row for each time the status changes.
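Something like this, for instance (names are illustrative, generic SQL):

    CREATE TABLE question (
        question_id  INT PRIMARY KEY,
        title        VARCHAR(255),
        body         TEXT
    );

    -- One row per status change, so the full history is preserved
    CREATE TABLE question_status (
        question_id  INT REFERENCES question (question_id),
        status       VARCHAR(50),
        changed_at   TIMESTAMP,
        PRIMARY KEY (question_id, changed_at)
    );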

I don't know that the updates are really significant unless you are using pessimistic locking, where the row would be locked for a period of time.

James Piggot
A: 

I would look at caching your results, either locally with ASP.NET caching or with memcached.

GateKiller
This kind of caching has nothing to do with database design in its first iteration.
le dorfier
A: 

This would certainly be a bad idea if you were using Oracle. In Oracle, you can quite happily read records while other sessions are modifying them, thanks to its multi-version concurrency control. You would incur an extra performance penalty for the join, for no savings.

A design pattern that is useful, however, is to pre-join tables, pre-calculate aggregates, or pre-apply WHERE clauses using materialized views.
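For example, in Oracle (a sketch only; the tables and the refresh policy are hypothetical):

    -- Pre-joined, pre-aggregated summary, refreshed on demand
    -- instead of being recomputed on every query.
    CREATE MATERIALIZED VIEW question_summary
    REFRESH COMPLETE ON DEMAND
    AS
    SELECT q.question_id,
           q.title,
           COUNT(a.answer_id) AS answer_count
    FROM question q
    LEFT JOIN answer a ON a.question_id = q.question_id
    GROUP BY q.question_id, q.title;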

WW
+4  A: 

Your initial database design should be based on conceptual and relational considerations only, completely independent of physical considerations. Database software is designed and intended to support good relational design, and you will hardly ever need to relax those considerations to deal with performance. Don't even think about the costs of joins, locking, and activity patterns at first; put off those considerations until all other avenues have been explored.

Your rdbms is your friend, not your adversary.

le dorfier
A: 

As already said, it's better to start with a clean, normalized design. It's easier to denormalize later than to go the other way around. Experience teaches that you will never get around to normalizing that one big table! You will just throw in more columns as needed, you will need more and more indexes, and updates will get slower and slower.

You should also look at the expected load: Will there be more new answers, or just more querying? What other operations will you have? When it comes to optimization, you can use the features of your DBMS: indexing, views, and so on.
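For example, an ordinary view can hide the physical structure from the application, so you can reorganize the tables later without touching your queries (a sketch with made-up names):

    -- Applications read from the view; if the underlying tables are
    -- later split or denormalized, only the view definition changes.
    CREATE VIEW question_overview AS
    SELECT q.question_id, q.title, s.total_views, s.last_viewed
    FROM question q
    JOIN question_stats s ON s.question_id = q.question_id;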

MicSim
A: 

Eran Galperin already provided most of my answer. In addition, the structure you propose really wouldn't help you in terms of locking. If there are relatively static and relatively dynamic attributes in the same row, breaking them into two tables isn't of much benefit. It doesn't matter if static data is locked, since no one is trying to change it anyway.

In fact, you may actually do worse with this design. Some database engines use page locking. If a table has fewer or smaller columns, more rows fit on a page, and the more rows there are on a page, the more likely lock contention becomes. By having the static data mixed in with the dynamic, the rows are bigger, so fewer rows fit on a page, and there are fewer waits on page locks.

If you have two independent sets of dynamic attributes, and they are normally modified by different actors, then you might get some benefit by breaking them into different tables. This is a pretty unusual case, however.
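For example (hypothetical names), view counters bumped by every reader and moderation flags set only by moderators are modified by different actors, so separating them keeps the two workloads from contending:

    -- Touched on every page view, by readers
    CREATE TABLE question_view_stats (
        question_id  INT PRIMARY KEY,
        total_views  INT,
        last_viewed  TIMESTAMP
    );

    -- Touched only by moderators
    CREATE TABLE question_moderation (
        question_id  INT PRIMARY KEY,
        is_locked    CHAR(1),
        flag_count   INT
    );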

I'd also point out that breaking the table into a static and dynamic portion may not be of benefit in a relatively small environment, but in a large distributed environment it may be useful to cache and replicate the dynamic data at different rates than the static data.

MikeD