views:

496

answers:

6

Scott W. Ambler has put up a good list of basic database smells. It would be good to see how these can be corrected, what database refactorings can be applied as a remedy and if any smells are missing from the list.

For "ground rules" when replying, please take a look at the question on programming smells.

+1  A: 

Remember that refactoring a database can go both ways (normalizing and denormalizing).

Also check out this little article.

And this summary article.

Ólafur Waage
Ólafur, as I see it, a smell is a symptom of a design problem. It can just as well be introduced consciously, as a tradeoff; that’s one more reason to be aware of the cost involved. This goes for programming smells too. Just take a look at the DTO pattern: it introduces a significant amount of duplication into your code, but for a purpose (sending the data over the wire in one go). Most smells are, of course, introduced unintentionally. No matter the motive or how it was introduced, a smell stays a smell. Nevertheless, you are making a valid point.
Danijel Arsenovski
+4  A: 

In my experience

  • Lack of unique constraints
  • Lack of Foreign Key constraints
  • Incorrect data type (like varchar or number for datetime)
  • Columns with null values for most rows
  • Lack of check constraints
  • Lack of normalization (in 80% of cases, normalizing is the right thing to do)
  • Heck, I have even seen a DB with no index on the primary key
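
To make the first few items concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `customer` and `customer_order` tables and their columns are made up for illustration, and other databases will differ in syntax and defaults (SQLite, for instance, only enforces foreign keys once the pragma is switched on).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

# Hypothetical schema: every rule lives in the database, not in the application.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    quantity    INTEGER NOT NULL CHECK (quantity > 0)
);
""")

conn.execute("INSERT INTO customer (customer_id, email) VALUES (1, 'a@example.com')")


def rejected(sql):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Unique constraint: a second customer with the same email.
duplicate_email = rejected(
    "INSERT INTO customer (customer_id, email) VALUES (2, 'a@example.com')")
# Foreign key constraint: an order for a customer that does not exist.
orphan_order = rejected(
    "INSERT INTO customer_order (order_id, customer_id, quantity) VALUES (1, 99, 1)")
# Check constraint: a non-positive quantity.
bad_quantity = rejected(
    "INSERT INTO customer_order (order_id, customer_id, quantity) VALUES (2, 1, 0)")
# A well-formed order goes through.
valid_order = rejected(
    "INSERT INTO customer_order (order_id, customer_id, quantity) VALUES (3, 1, 5)")

print(duplicate_email, orphan_order, bad_quantity, valid_order)
```

The three bad rows are refused by the database itself, whichever application tries to write them.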

Also, if the DB is used by only one application, and that application is written in a modern programming language (Java, C#, Ruby, etc.), then complex business rules in stored procs are, to me, a smell.

Pratik
You have just described how the database was when I got to where I'm working now. So much fun was had. I designed a new one from scratch and migrated the data over.
Iuvat
+1 for the SP smell.
Blake Pettersson
@lubat: I envy you.
voyager
I have even seen DBs with no primary keys :_(
Álvaro G. Vicario
I was going to give you an upvote until I got to the stored proc part. In my opinion, stored procs are the best place for business logic in a complex system, and the application is the absolute worst place.
HLGEM
+1  A: 

I tend to find that a proliferation of "flags" is a smell. If your table has: IsThis, IsThat, and IsTheOther you may want to consider table inheritance to define the data in your model.

Often what starts leaking into the schema is other data that is only valid when a certain flag is set. So you end up with:

IsThis, ThisData, IsThat, ThatNumber, ThatCountry, IsTheOther, 
TheOtherAccount, TheOtherName, TheOtherZipCode, TheOtherPhone

And this is, IMO, what that initial "flag smell" tends to lead to in practice :-P
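
A rough sketch of the table-inheritance alternative, again with Python's sqlite3. The `item`, `item_that` and `item_the_other` names are hypothetical stand-ins for the columns above; the idea is that a row in a subtype table replaces the flag, and the flag-dependent data can be declared NOT NULL there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
-- Base table: only the columns every row has, and no Is* flags.
CREATE TABLE item (
    item_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
-- One subtype table per former flag; a row here means "IsThat" is true.
CREATE TABLE item_that (
    item_id      INTEGER PRIMARY KEY REFERENCES item(item_id),
    that_number  INTEGER NOT NULL,
    that_country TEXT NOT NULL
);
CREATE TABLE item_the_other (
    item_id           INTEGER PRIMARY KEY REFERENCES item(item_id),
    the_other_account TEXT NOT NULL,
    the_other_name    TEXT NOT NULL
);
""")

conn.execute("INSERT INTO item (item_id, name) VALUES (1, 'plain')")
conn.execute("INSERT INTO item (item_id, name) VALUES (2, 'with that-data')")
conn.execute(
    "INSERT INTO item_that (item_id, that_number, that_country) VALUES (2, 42, 'IS')")

# The old flag can still be derived when a report needs it.
rows = conn.execute("""
    SELECT i.item_id,
           t.item_id IS NOT NULL AS is_that,
           t.that_country
    FROM item AS i
    LEFT JOIN item_that AS t ON t.item_id = i.item_id
    ORDER BY i.item_id
""").fetchall()

print(rows)
```

The trade-off is an extra join per subtype, in exchange for not carrying a row of mostly empty, flag-dependent columns.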

Joel Martinez
A: 

I find "Fear of change" a very good one. "If it ain't broke, don't fix it" has become a "database smell". "Feel free to change anything" is the new motto. It didn't work properly to boot anyhow.

"Lots of rows as an indicator for bad design" is my second favourite. Especially if you consider how important "scalability" is considered to be in modern IT.

"Solving the problem of "lots of rows" by vertical decomposition" also seems like a very interesting concept. Need to get myself informed on that one.

+5  A: 

I've done a couple of presentations at the MySQL Conference titled "SQL Antipatterns."

Here's a selection of the topics:

  • Obligatory ID pseudokey when none is required
  • Comma-separated lists in a VARCHAR
  • Splitting an attribute into multiple columns, e.g. salesJanuary, salesFebruary, salesMarch, etc.
  • Splitting a table into many tables per month, per product, per customer, etc.
  • Entity-Attribute-Value (EAV)
  • Polymorphic associations
  • ENUM, or using a CHECK constraint simply for a set of values
  • Using FLOAT for money
  • Using a special value to signify no value, instead of using NULL
  • "LIKE '%keyword%'" (a real full-text search engine would be better)
  • "SELECT *" or "INSERT INTO tablename VALUES (...)" (implicit columns)
  • SQL Injection (interpolating unvalidated application variables into SQL query strings)
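
As an illustration of the last item, here is a small sketch with Python's sqlite3 module; the `account` table and the hostile input string are invented for the example. It contrasts interpolating a value into the query string with passing it as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, secret TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO account (name, secret) VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)

user_input = "nobody' OR '1'='1"  # hostile value a caller might supply

# Antipattern: the value becomes part of the SQL text, so its quote
# character ends the string literal and the rest is parsed as SQL.
unsafe_sql = f"SELECT name FROM account WHERE name = '{user_input}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Remedy: a placeholder keeps the value as data, never as SQL.
safe_rows = conn.execute(
    "SELECT name FROM account WHERE name = ?", (user_input,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The interpolated query matches every account, while the parameterized one matches none, because no account is literally named `nobody' OR '1'='1`.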
Bill Karwin
+1  A: 

Lack of enforcement of data integrity at the database level is my number one. And even if data integrity is enforced at the application level (shudder), then actually enforce it. I shouldn't be pulling a group of email addresses to send an email to and find that one of them is 'Talk to the secretary with the big boobs', to pick a not-random example.

The use of a fake data element to avoid having nulls is a big smell too. This creates far more problems than it solves and makes your data a mess. And what happens when you need 0 to be a real value for that column instead of the NULL substitute? Null means unknown, which means it should never be anything except NULL. I do a lot of imports of data from other companies' databases, and when we get this stuff, how do we know which value the business rule treats as really being null data?
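
A small sketch of why the stand-in value hurts, using Python's sqlite3 module with two made-up tables: one stores 0 for "unknown", the other stores NULL. The numbers are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reading_sentinel (value INTEGER NOT NULL);  -- 0 stands in for "unknown"
CREATE TABLE reading_null     (value INTEGER);           -- NULL means "unknown"
""")

# Two known readings (10 and 20) plus one unknown reading in each table.
conn.executemany(
    "INSERT INTO reading_sentinel (value) VALUES (?)", [(10,), (20,), (0,)])
conn.executemany(
    "INSERT INTO reading_null (value) VALUES (?)", [(10,), (20,), (None,)])

# The sentinel is averaged in as if it were a real measurement.
avg_sentinel = conn.execute(
    "SELECT AVG(value) FROM reading_sentinel").fetchone()[0]
# Aggregates skip NULL, so only the known readings count.
avg_null = conn.execute(
    "SELECT AVG(value) FROM reading_null").fetchone()[0]
known_count = conn.execute(
    "SELECT COUNT(value) FROM reading_null").fetchone()[0]

print(avg_sentinel, avg_null, known_count)
```

With the sentinel, every query has to remember to filter out the magic value, and that stops being possible the day 0 becomes a legitimate reading.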

HLGEM