Hi,

My question is about database modeling. I looked for it among the other database design questions on SO but haven't found it, so I am asking here:

What are the general guidelines and best practices to keep in mind while designing a database for an application?

What are the best resources, books, or university lectures available on database design concepts?

Thanks.

A: 

Your question is too broad. Normalization and denormalization are the most commonly used concepts.

Henry Gao
A: 

Look at the Wikipedia article on database normalization. It also has a further reading section.

If you are designing a new database for a brand new application, you should try an ORM library (such as a JPA implementation in Java) that relieves you of the database design, because these tools generate the database from the domain model. If you don't have any experience in this field, a database generated with ORM tools will be much better than one you design yourself.

cetnar
A: 

The best thing to do is to start with a well-normalized database. The Wikipedia article has some good information on that, along with some good references.

Typically you'll end up denormalizing parts of your database for better performance, but you almost always want to start with it in 4th normal form.

Eric Petroelje
+3  A: 

DEPENDS

This question is like asking "what is the best car to buy?" It really depends on many factors, including the amount of data, the number of concurrent users, what you are trying to do, etc. FYI, normalization is good for some database uses but bad for others (data warehouses).

Give us a better idea of how you intend to use the data, and you'll get some better recommendations.

KM
+7  A: 

Just some things I've learned from experience (I'm sure some will disagree, but I've been querying, designing and programming databases for 30+ years and have seen the effects of stupid design up close and personal):

There are three critical things to consider in all database design - data integrity (without this you essentially have no data), security and performance. All other considerations take a back seat to these three.

Never create a table without a way to uniquely identify a record.

There are very few true natural keys that really work as a primary key. If you don't have control over whether a value will change, do not use it as a primary key (you don't really want to change the company name through 27 child tables, do you?). Use a surrogate key instead. Using a surrogate key does not exempt you from the need to set unique indexes where you could have used a unique composite key. Always set these indexes if you can determine a way to have a unique composite. Duplicate records are the bane of an application's existence. It seems obvious, but never, ever consider a name to be a key field; names are not and never will be unique.

Do not use a GUID as your primary key, as it can kill performance. If you need a GUID for replication, also consider having an int or bigint primary key.

Do not design as if you will be changing database backends unless you know up front you will be doing so. Virtually all the good techniques for performance tuning are database specific; don't harm your own ability to tune your database for a nonexistent requirement.
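As a rough illustration of the surrogate key plus unique composite index advice, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (company_office, company_name, city) are made up for the example, and SQLite is only a stand-in; the syntax will differ a little on other databases.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE company_office (
        company_office_id INTEGER PRIMARY KEY,  -- surrogate key
        company_name      TEXT NOT NULL,
        city              TEXT NOT NULL,
        UNIQUE (company_name, city)             -- the natural composite must still be unique
    )
    """
)
conn.execute(
    "INSERT INTO company_office (company_name, city) VALUES ('Acme', 'Boston')"
)


def try_insert_duplicate(connection):
    """Return True if the database rejected a duplicate (company_name, city) pair."""
    try:
        connection.execute(
            "INSERT INTO company_office (company_name, city) VALUES ('Acme', 'Boston')"
        )
    except sqlite3.IntegrityError:
        return True
    return False


duplicate_rejected = try_insert_duplicate(conn)
row_count = conn.execute("SELECT COUNT(*) FROM company_office").fetchone()[0]
print(duplicate_rejected, row_count)
```

Child tables would reference company_office_id, so renaming the company touches one row, while the UNIQUE constraint still keeps duplicate records out.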

Avoid entity-attribute-value (EAV) table structures. They are miserable to query.

Add everything you need to ensure data integrity into your database design. Things like defaults, constraints, triggers, etc. are necessary to avoid having useless data. Do not rely on the application code to do this or you will be sorry.

Others mentioned normalization. I agree you must understand it thoroughly, even if you later decide to denormalize.
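To show why such structures are awkward to query, here is a small sketch (Python with the built-in sqlite3 module) that stores the same two people once as attribute/value rows and once as an ordinary table. The names person_attribute and person are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical attribute/value layout: one row per attribute of each entity.
    CREATE TABLE person_attribute (entity_id INTEGER, attribute TEXT, value TEXT);
    INSERT INTO person_attribute VALUES
        (1, 'name', 'Ann'), (1, 'city', 'Oslo'),
        (2, 'name', 'Bob'), (2, 'city', 'Rome');

    -- The same data as an ordinary table with real columns.
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        city      TEXT NOT NULL
    );
    INSERT INTO person VALUES (1, 'Ann', 'Oslo'), (2, 'Bob', 'Rome');
    """
)

# "Names of people in Oslo" needs one self-join per attribute involved.
eav_query = """
    SELECT n.value
    FROM person_attribute AS n
    JOIN person_attribute AS c ON c.entity_id = n.entity_id
    WHERE n.attribute = 'name' AND c.attribute = 'city' AND c.value = 'Oslo'
"""
plain_query = "SELECT name FROM person WHERE city = 'Oslo'"

eav_names = [row[0] for row in conn.execute(eav_query)]
plain_names = [row[0] for row in conn.execute(plain_query)]
print(eav_names, plain_names)
```

Both queries return the same answer, but every extra attribute in the filter or the output adds another join to the first one, and the value column cannot carry a proper datatype or constraint.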
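A minimal sketch of what integrity rules in the database can look like, again using Python's sqlite3 module. The customer and customer_order tables are assumptions made for the example, and SQLite needs foreign keys switched on per connection, which other databases do not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: FK checks are off by default
conn.executescript(
    """
    -- Hypothetical tables; names and rules are illustrative only.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        email       TEXT NOT NULL UNIQUE
    );
    CREATE TABLE customer_order (
        customer_order_id INTEGER PRIMARY KEY,
        customer_id       INTEGER NOT NULL REFERENCES customer (customer_id),
        quantity          INTEGER NOT NULL CHECK (quantity > 0),
        status            TEXT NOT NULL DEFAULT 'new'
    );
    INSERT INTO customer (customer_id, email) VALUES (1, 'a@example.com');
    """
)


def is_rejected(sql):
    """Return True if the database itself refuses the statement."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Zero quantity violates the CHECK constraint.
bad_quantity_rejected = is_rejected(
    "INSERT INTO customer_order (customer_id, quantity) VALUES (1, 0)"
)
# Customer 99 does not exist, so the foreign key fails.
orphan_rejected = is_rejected(
    "INSERT INTO customer_order (customer_id, quantity) VALUES (99, 1)"
)
# A valid row picks up the default status without the application supplying it.
conn.execute("INSERT INTO customer_order (customer_id, quantity) VALUES (1, 2)")
default_status = conn.execute("SELECT status FROM customer_order").fetchone()[0]
print(bad_quantity_rejected, orphan_rejected, default_status)
```

The bad rows are refused no matter which application, script or ad hoc query tries to insert them, which is the point of keeping these rules out of application code.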

Do not stack views on top of views if you want any kind of performance at all. Every database I've seen that does this is eventually a huge performance problem.

Consider what data you will need to manage the database as well as what the application needs. If you are going to be serious about databases you need to understand database auditing and your database should implement ways to find out who made what change and when and what the old data was. You'll thank me the first time someone malicious changes the data or someone deletes all the records in a table accidentally.
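One common way to record who changed what, when, and what the old data was is an audit table filled by a trigger. The sketch below uses Python's sqlite3 module; the product and product_audit tables and the modified_by column are assumptions for the example. SQLite has no database users, so the application is assumed to write its user name into modified_by; on a server database you would more likely use the session's login.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables; names are illustrative only.
    CREATE TABLE product (
        product_id  INTEGER PRIMARY KEY,
        price_cents INTEGER NOT NULL,
        modified_by TEXT NOT NULL          -- assumed to be set by the application
    );
    CREATE TABLE product_audit (
        product_audit_id INTEGER PRIMARY KEY,
        product_id       INTEGER NOT NULL,
        old_price_cents  INTEGER NOT NULL,
        new_price_cents  INTEGER NOT NULL,
        changed_by       TEXT NOT NULL,
        changed_at       TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    -- Copy the old and new values into the audit table on every price update.
    CREATE TRIGGER product_price_audit
    AFTER UPDATE OF price_cents ON product
    BEGIN
        INSERT INTO product_audit (product_id, old_price_cents, new_price_cents, changed_by)
        VALUES (OLD.product_id, OLD.price_cents, NEW.price_cents, NEW.modified_by);
    END;
    INSERT INTO product (product_id, price_cents, modified_by) VALUES (1, 1000, 'loader');
    """
)

conn.execute(
    "UPDATE product SET price_cents = 1200, modified_by = 'alice' WHERE product_id = 1"
)
audit_rows = conn.execute(
    "SELECT product_id, old_price_cents, new_price_cents, changed_by FROM product_audit"
).fetchall()
print(audit_rows)
```

A similar AFTER DELETE trigger could copy removed rows into the audit table, which is what helps when someone empties a table by accident.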

Really think through how the data will be queried when designing. It can make a huge difference in the design.

Do not store more than one piece of information in a field. It might look cool to put a comma-delimited list into one field rather than add a related table, but it is a really bad idea.
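A small sketch of the difference, using Python's sqlite3 module. The article, article_flat and article_tag tables are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables; names are illustrative only.
    -- Variant 1: several tags crammed into one field.
    CREATE TABLE article_flat (article_id INTEGER PRIMARY KEY, tags TEXT);
    INSERT INTO article_flat VALUES (1, 'sql,design'), (2, 'nosql');

    -- Variant 2: one row per tag in a related table.
    CREATE TABLE article (article_id INTEGER PRIMARY KEY);
    CREATE TABLE article_tag (
        article_id INTEGER NOT NULL REFERENCES article (article_id),
        tag        TEXT NOT NULL,
        PRIMARY KEY (article_id, tag)
    );
    INSERT INTO article VALUES (1), (2);
    INSERT INTO article_tag VALUES (1, 'sql'), (1, 'design'), (2, 'nosql');
    """
)

# Searching the delimited field needs a substring match, which also hits 'nosql'.
flat_matches = [
    row[0]
    for row in conn.execute(
        "SELECT article_id FROM article_flat WHERE tags LIKE '%sql%' ORDER BY article_id"
    )
]
# The related table supports an exact, indexable comparison.
related_matches = [
    row[0]
    for row in conn.execute(
        "SELECT article_id FROM article_tag WHERE tag = 'sql' ORDER BY article_id"
    )
]
print(flat_matches, related_matches)
```

The substring search returns article 2 as a false match and cannot use an ordinary index, while the related table gives the exact answer and lets the database enforce one row per article and tag.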

Elegance is often the enemy of performance in databases. Pick performance over elegance every time and you won't go wrong.

Avoid the use of database keywords in the naming of objects. Your programmers will thank you. Pick a naming convention and be consistent in always using it. If a field is in multiple tables, make sure it has the same name, the same datatype and, if applicable, the same length in all of them. (The exception: if an id field has two possible foreign keys in the same table, use the id field name and a prefix to identify the difference, say Sales_person_id and Customer_person_id.) Fix misspellings right away; you really don't want to spend the next ten years remembering that in table A it is persnoid instead of personid.

Read about database refactoring (search on Amazon for some good books) and consider how to make it possible in your design. Few databases are designed to be refactored, and being able to do so is critical to fixing database problems that arise from badly thought out designs or changes to business requirements.

While you are reading, read about performance tuning; you'll learn a tremendous amount about what to avoid in designing the database.

I'm sure there's more but this is enough to start with.

HLGEM
+1, I agree with every point!
KM
You are a database GURU... Agree with all your points... Thank you for being so apt with the explanation... Your knowledge will surely guide me a long way... Thank you HLGEM :)
Rachel
@HLGEM: Thank you for the apt explanation. I am also very eager to know your reading list and your recommendations on database design and performance tuning. Thank you for sharing such worthwhile wisdom.
Rachel
Great list. <tablename>_id has served me well for id fields.
dverespey
Agreeing with nearly everything here. You might consider adding an understanding of how consistency and concurrency and locking work in the RDBMS you're working on.
David Aldridge
+1  A: 
marc_s
@Marc: Thank you marc for the recommended reading.
Rachel
A: 

Consider all your use cases. Think about every single possible way someone might want to get to data, and plan for those. Wear your designer, developer, tester, and user hats.

Try to think of database tables as representing physical objects.

Normalize, as others have said.

Tenner