views:

5383

answers:

14

I am pretty well versed with SQL Server, MySQL, Oracle etc but putting these Database products aside, is there a resource that will help me design relational databases well? Is there something like patterns or best practices for database design?

I have seen a few times that database is often not scalable; people have personal preferences with keeping columns like isChecked column which is boolean in nature but stored as Char(1) with values like 'Y' and 'N' instead of 0 and 1 which to me sounds better. Ways not to commit common mistakes while doing database design?

Links to books or articles will be highly appreciated.

Thanks in advance.

+2  A: 

Possibly the most famous best practice is database normalisation. This set of techniques allows you to design your database so that redundant items are removed, and fields are grouped logically.

Pete OHanlon
They preach this in school... Enter the real-world and I come to find this isnt always a good thing!!
+3  A: 

As with anything, the answer here is, "It Depends."

Databases can be used to do different things, and some of those things will require opposing directions in design and development.

An OLTP database system will be designed entirely differently from one used as a reporting or warehousing solution. The first is often normalized, and a warehouse is often de-normalized. This helps the system get the desired performance for its intended behavior.

Even within a segment of this, depending on if the usage will be read-heavy or write-heavy, different design decisions might be appropriate.

The best bet is to look into best practices for a much smaller segment of database development that corresponds to the type of application you're trying to build.

BQ
+1  A: 

One concept we use around here that has proved pretty nice is the "Lookup Code" table. If you have a database that has a lot of references to items that effectively codes, or types, or the like, keep all of them in a single LookupCode table that bases things off of a CodeGroup and the Code itself.

We keep an additional flag for the active status of the code, as well as a few optional numeric columns that can be used if given lookup code needs to be sorted or calculated in any kind of fashion.

By doing this, you eliminate having tons of tiny little tables scattered around your schema. Now one of the downsides to this is that the primary key for the table is the code group and code itself, so there isn't a foreign key attached to the "master" table that is referencing a given code, but a little bit of enforcement in the application is easily to accommodate for this.

Dillie-O
-1: This also obscures the cardinality of joins to the table, makes data integrity more difficult to enforce, and is pretty widely regarded as a bad practice.
David Aldridge
A: 

i would say that as long as the database is normalized and if you are making a VLDB then partition it correctly then you should be fine. other best practices include using CRUD for the stored procedures and ensuring that all tables cascade properly. most everything else is subjective. Using "Y/N" is old school database programming from when bit hasn't been introduced yet. It can also be used for scalability purposes like "Y/N/Maybe" but if that were the case bast practices would say to normalize that and make a lookup table.

Russ Bradberry
+1  A: 

The best book I've ever read in regards to database design is "Database Design for Mere Mortals" by Michael J Hernandez. The name sounds like a beginners book, but people at any level could gain knowledge from it. It's also platform independent as it deals with looking at the data itself and how to properly organize it - not the technology being used.

He also wrote a book on writing queries named "SQL Queries for Mere Mortals" that I've heard (haven't read this one myself yet) is quite good.

Database Design for Mere Mortals

invenetix
+9  A: 

There are numerous database design patterns. They aren't often nicely formalized, so you may have to simply look at lots of database design.

See, for example, Fowler's books on design patterns. Also Nock's Book.

There are blogs, like database programmer.

There's an IEEE book, On Pattern-Based Database Design and Implementation.

The Google Search (link) turned up 24M hits.

S.Lott
+1 for Marting Fowler
abatishchev
+2  A: 

Dont store calculated values

Example, You have table "Squares" with column "width". No need to make a column "area", because that can be calculated via width ^ 2

Generally yes, but there are plenty of exceptions.
Chris Pebble
Just curious, can you name one ? Maybe if the calculated field at hand is very resource intensive to calculate?
If your system does not support function-based indexes, and you want to be able to index on the calculated value.
David Aldridge
good example, i never thought of that! (though, my advice still holds water if you were not trying to index "area")
+3  A: 

My take on this is somewhat contrarian. I would advise, don't stress the design of the database too much.

Sometimes this may be hard. With internal LOB applications, the prevailing view of the business is often times that the DATA is the primary asset, where as the software is somewhat expendable.

My advice would be: don't buy it.

In reality the asset is the company's ability to INTERACT with the data. To view it, to manipulate it, and to make decisions based on it.

This means that even though they may place a high value on the data, what they are actually valuing is the software that you are writing.

This means I would focus most of your effort on building an effective user experience, rather than on "designing the perfect database". The database is really just a tool that enables you to deliver on a user experience.

The key feature of relational data models is data and access path independance. You can add columns, change keys, introduce or remove indexes, etc, while having zero impact (or close to zero) on the applications that use it.

This makes the database structure extremely pliable.

Trying to design the database to "be flexible for the future", or to "optimize performance" is mostly a wasted effort.

Changing the structure of the database will have a relatively small impact on your system.

Also, you really can't predict how the database will scale until you run into the scenarios where you need it to scale. Your best bet is to wait until you hit performance issues. and then address them specifically.

Making changes to the user experience of your app, however, is usually more expensive. UI work is time consuming, and usually takes a while to get right.

So, I would recommend that you:

  1. Just produce a crappy DB design
  2. React to the actual performance scenarios you encounter
  3. Focus your efforts on user experience, not on the database
Scott Wisniewski
great advice actually. Usually students of this discipline are shown a good design and it looks like magic to them, but it wouldn't if they began w/ step 1. +1
doug
+10  A: 

A few points:

  • Learn as much as you can about problem domain. You can't create good data model without knowing what you're designing for
  • Have good knowledge about data types provided by your database provider
  • How to properly use normalisation and design tables
  • Performance: when and how to apply indexes, how to write efficient queries etc.
  • When and how to use different DB objects like views, procedures, functions, triggers
Mr. Brownstone
+2  A: 

if you don't document enums in the description column of the schema so that i can figure out what the '5' is in this:

Select name from peeps where accountStatusId = 5

then do this

Use a table to enumerate a field. eg:

Select name 
from peeps p 
join accountStatus s 
on p.accountStatusID = s.asid 
where s.accountStatus = 'ActiveDude'
rizzle
+5  A: 

To counter Dillie-O's advice. I'd suggest that you don't put all your lookups into one table. In general, this is an attempt to force OO design into a Relational Database. It can be done and it fits the world view of an OO developer, but it leads to crippling database designs.

Bounce over to Google and search for "MUCK Tables" which lead you to discussions of Massively Unified Code-Key Tables. Alternatively, you can look for "one true lookup table" for discussions. Or even read Joe Celko's article One True Lookup Table .

Nik Hughes
+2  A: 

The relational database is an extremely powerful abstraction; it's a collection of facts and a predicate calculus. Not only that, SQL enforces command query separation by having one clause for examining rows and another for changing rows.

When you think of a database as a truth reasoning engine, it makes sense to have a set-up that does not allow contradictions to flow from the data you're modelling. Therefore, to use a relational database effectively you need to get your database design right. Unlike the design of object orientated programs, there is a consensus view on how a relational database should be designed. The proper approach to database design is normalise as far as it is sensible. Most people normalise up to third normal form but you can in fact go up to fifth normal form.

If possible, you want to expunge null column values from your database. If you agree with my view of the database as a truth reasoning engine, then nulls are a real problem. When you have nulls in a database the law of excluded middle does not hold. This makes "proof by contradiction" of any given property of the database more difficult that it would be without the nulls. Nulls unnecessarily complicate the semantics of the database.

Sometimes it will be necessary to break the rules of normalisation for performance reasons. However, don't do this before you have data on what is queries in particular are slow. Often you can simply speed up the query by carefully altering indexes rather than denormalising.

Finally, a word on stored procedures rather than direct queries. On a decent database, you can set security permissions on stored procedures independently of the underlying tables. This, by itself, is reason enough to consider using stored procedures extensively. With stored procedures, you build a tighter security model than is possible with direct SQL access.

Simon Johnson
+1  A: 

The book by Michael J. Hernandez Database Design for Mere Mortals is well written, and an easy read. It should answer all of your questions.

Hernandez also co-authored SQL Queries for Mere Mortals with John L. Viescas.

The books are about $60 a piece. I'm trying to find the CD for Queries for Mere Mortals because I lost mine. If anyone has a copy, let me know.

+2  A: 

I didn't find what I was looking for in this question, but this one has a bunch of recommendations for design patters in DB design

brian