views:

1034

answers:

14

I've been reading the book "Enterprise Rails" by Dan Chak and it got me thinking: Do you feel that you should have data constraints at both the database level and the application level? Or do you feel similarly to opinionated frameworks like Ruby on Rails - the database is just a "dumb repository" for data, and all checks should be done in your application (I'm not trying to single out RoR here - I'm a huge fan of Rails myself, but I disagree with its approach to the database)?

Personally, I feel that you should have them both in order to make sure your database and application are well-secured. What I mean is that you should make use of not-null constraints, give your fields a length if known ( as opposed to leaving them all at nvarchar(255) ), have things such as Foreign Keys, Check Constraints and Triggers on your database, and then also enforce this through business logic rules in your application. IMO this makes your application robust through its user interface, and also secure against someone who might have direct access to the database.

The counter-argument I most often see is that it requires what amounts to duplicate logic; once at the database level, and once at the application level -- Let's say you have a check constraint to verify that a product's SKU is entered (i.e. it's length is greater than zero).

You would now need to also include validation methods in your business logic to make sure the value the user entered has a length greater than zero, and also possibly some client-side Javascript to catch the error as the user types data.

I for one do not see this as a bad thing - yes you have some duplicate logic, but the end result is the "database as fortress" mindset, since if you think about it your data is the single most important part of your application; after all, what good is your shiny new Web 2.0 application if the data can easily be corrupted and compromised?

What are your thoughts on this? Should the database be an impenetrable fortress like Fort Knox, or an open safe that's guarded by lasers? In other words, should you sacrifice some duplication of logic to ensure a secure data model, or leave everything to your application and use the database simply to store data?

+10  A: 

If you're certain that you will never have another client application, you may get away with treating the database as a simple storage. However, if you'll ever have more than one client application obviously you will have to replicate constraints in all client applications, which is a bad idea. Remember that other clients include developer tools.

Also, by using the database as a "dumb repository", you will most likely end up with a less efficient application. The database can do a lot of stuff much more efficiently than your application can. Why not take advantage of that?

Brian Rasmussen
+6  A: 

I think you should try and protect your data at all costs. Nothing is worse then trying to do reporting on an application which had bad data because the application had a bug. Now what does this mean?

You should enforce your relationships through FK's, there is no reason not to. You should try and avoid null values and only use them when null values are appropiate. I do think there is a fine line however.

Should you parse a phone number to ensure it's in the proper format? Probably not, but then again you should probably store the phone number in a schema that doesn't have a formating issue.

JoshBerke
+12  A: 

Yes, if you want to restrict what goes in the database. The layers should be distinct from each other as much as possible and your database shouldn't rely on another layer ensuring that it follows the rules.

There's no guarantee that a buggy (or malicious) "business logic" layer will not insert toxic data into your tables. Of course, if you can trust the other layers, you probably won't need it. But I work in a mainframe shop where the DBAs are always having to fix problems caused by the young Java whippersnappers rolling out their buggy code to production without adequate (any?) testing :-).

Database tables that are shared between different development areas (and that's all of them for us) should always protect themselves from errant data. When App A puts dodgy data into the table used by App B, it's not the App A developers that take the heat, it's the DBAs.

paxdiablo
+6  A: 

If you follow the Jeff Atwood school of a database is just a dumb data storage & retrieval system then you would put all the validation in the application layer.

However, I find that applications are like small children. Unchecked they will throw everything around the room. It will be up to the parents to clean up the mess. In this case it will be the DBAs doing the cleaning.

However, I think you need to be careful about using every database data integrity feature, just because it is there. Overloading your database with foreign key constraints and triggers might create more problems than you think. I tend to use foreign keys only on tables which are very closely related, such as a header/detail table pair. If you start adding foreign keys everywhere you can end up with an unmagageable database.

I rarely use triggers. I think they make a database very opaque. You issue a simple update/insert/delete command and strange things might happen. I guess there are two places where triggers are unavoidable:

  1. When you don't have source code to the application writing to the database and you need to modify the behaviour. Triggers are your only option.

  2. If you are performing CRUD operations on a view. Triggers are mandatory for the insert/update/delete operations.

I tend to perform basic validation in the app. This way the user is given immediate feedback that something is wrong. Complex validation that requires looking up related tables is probably best done in the database (as well as the simple validation that the app does). I would argue that some forms of validation are almost impossible to guarantee at the application level, without using complicated locking strategies.

If you have multiple applications, possibly written in different languages on different platforms, then the case for putting more of the validation into the database layer is strengthened. The liklihood of two or more applications, written by different programmers, performing identical validation is fairly remote. Best do it in one place.

The Jeff Atwoods of this world would suggest that you write a web service that all the apps use to communicate with. The web service performs the data validation. Doing this allows the database to remain a dumb storage container, thus enabling you to switch database engines. In reality you rarely change database engines (unless you started out with Microsoft Access!). If you are writing web services purely to centralise your data validation then I thnk you are going overboard.

Mike Thompson
I think it's revealing that your expectations for your applications are explicitly low; if they are, then you have to deal with it. However, I don't think that should be your target.
le dorfier
I can't imagine how you would intentionally alter functionality with a trigger.
le dorfier
An insert trigger might update another table. If this is part of a bigger transaction it maybe difficult to determine which operation performed the update.
Mike Thompson
I was using the "apps are children" analogy to make a point. I may have slightly exaggerated the effect for emphasis. The point is that you can't 100% rely on applications to perform correct validation.
Mike Thompson
I agree with comment 2.
J3r3myK
A pre-insert trigger can do anything it wants including changing the data being inserted. Anyway, they're more like monkeys in the zoo than children - they'll throw sh*t at you when you're not looking :-).
paxdiablo
Jeff Atwood
Jeff, my apologies for using quotes. I was not meaning to quote you. I was merely referring your dislike of stored procedures. If you remove all the stored procedures from a database you are pretty much left with a basic CRUD storage repository.
Mike Thompson
That's it Jeff, you're marked with that quote for life now. I can hear Google sucking the text up right now :-).
paxdiablo
+3  A: 

Typically there is always some duplication, and databases aren't just dumb repositories.

db

The database ensures integrity at a data level. Foreign key constraints, non null constraints, and size constraints are all fundamentally covered by the database.

You can't do everything in the db, but you can do a lot. Protect the data.

business layer

Moving up a level, you have the business logic. Typically this is your point of integration with other applications (web service, your own UI, etc). Here business logic in encoded into the application. Things like if a product has an end date of x, then it can't also occur in y, if y has a different end date.

It would be hard to describe that sort of rule in the database, so you don't. But your business logic layer still intercepts things that it knows are invalid. e.g. if description field isn't allowed to be null, the business logic should never send it to the database. It'll error regardless, but you try to intercept things at the point they are known to be bad.

it's also hard to express in a database other 'rules' like 'new users have a 1 year expiry date if they are from Arkensas, 2 years otherwise, unless they have 3 children and one of them is named Barry'. We can laugh at this example, but a seasoned programmer will tell you that business logic is one of the biggest oxymorons around.

ui

Moving up to the UI, the UI also regularly encodes business logic in screens. Forms and other pages will oftentimes exist in an invalid state, and it's the job of the UI to know the rules at least most of the time. Hopefully the UI defers logic to the business layer, but the business layer doesn't have a clue that field 1 is the due date and field 2 is the description.

The UI knows to search products with X if the user has already selected widget Y. The UI knows that description is required, and that the item count is > 0 and < 100 (In these examples a good UI will rely on the business layer to tell it, for example, the min and max, but the UI is still aware of the relationship)

In web UI's we also add client script that again duplicates logic in server code. We use client script to have a better user experience, but in the end don't trust anything that comes from the client (script can be turned off, form fields manually manipulated, etc)


So you can see that logic will be duplicated. You try to reduce duplication as much as possible, but realistically it's rarely possible given a nontrivial program.

Robert Paulson
+1  A: 

It is better to enforce these constraints on database level, as they are more performant(implemented in C). It is ok to duplicate the validation on another layer as it provide more user-friendly error and validation messages.

It's not an XOR proposition, it is better to err on the safe side, enforcing these constraints on database would make your system more durable.

Michael Buen
+7  A: 

In short : database should enforce constraints.

Why :

  1. Easier. For ex. to have a constraint set on a particular data column there is only one place to set it : the column itself. The data might come from various sources but the check is put where the data is finally put to rest.
  2. Integrity. Database should be responsible for the data it hosts. An inconsistent database is as good as no database.
  3. Flexibility. New UI development environments come all too frequently. If database puts up its hand to say that it will take care of the constraints , the front end development and functional testing are easier.
Learning
+1  A: 

My personal preference is to enforce basic validation at the database layer and then use introspection techniques to bubble those constraints up to the application layer as defaults (convention over configuration). If I have some user interaction in a form or the like which is unusual then I'll override the defaults I got from the database with whatever new behavior I need. This helps me keep most basic validation primarily DRY-ly in the database layer, while I do more complex validation (i.e. phone number format) at the application layer.

I could do check constraints for things like phone numbers, but I find check constraints to be a difficult and frustrating way to enforce formats.

If I had an application that needed to receive input from a 2nd source (i.e. it has both a web interface and a desktop client) then either I would try and get the 2nd interface to work through the first (submitting desktop data via web service for example) or in failing that then I would fall back on using a check constraint in the database in addition to validation at the application layer.

Foremost I want to maintain the integrity of the data as much as possible, because bad data will inevitably cause havoc in the application - and then in addition (and actually equally important) I want the application to be able to handle validation of user supplied information.

Isaac Dealey
+1  A: 

From an OOP perspective, the database as in Object/Actor, within the larger system and it should be responsible for itself. That includes the necessary need to validate input.

Robert Gould
+2  A: 

If you do not enforce at least basic integrity in the database, at some point invalid data will get in. Maybe from an app. bug, maybe from someone pulling up the SQL console, whatever. And then you get to find out that your app has interesting failure modes when the impossible ("all B records must have an A record! what do you mean it doesn't exist?") happens.

Enforcing integrity in the app only works if the app is the only thing that will ever touch that database. Even so, you'd better be careful about app bugs, which are easier, as the app is much larger than the schema.

derobert
+1  A: 

Yes for both. I found this out at my last place. We had legacy delphi systems with sybase databases. The new system was .NET and Sql server. One particular employee was solely responsible for translating the sybase database to sql server database for customers that wanted upgrades to the new .NET system. He never worked with the .NET application code and therefore never saw data constraints at application level.

As a consequence he had to rely on information passed to him and the data constraints at database level. If the constraints were incorrect or missing at database level it led to bad data and support calls from customers. This happened more times than we would like because the data constraints were not always replicated from application level to database level.

Crippeoblade
+1  A: 

To offer a slightly different perspective, I think it depends on context.

First off, over the years I've moved from being 100% convinced of implementing constraints in the DBMS to trying to avoid it altogether: I want all my business logic on the same layer.

Secondly, I'm working a lot with Rails and ActiveRecord migrations don't allow for much db-resident constraint definition beyond field size and NULL_ness.

If you're building a new application which includes a new database that is entirely dedicated to your app, then you can enforce your constraints in your application code as part of your business logic. That's how Rails likes to work and it seems to work pretty well.

If you're building a new app against a legacy db that implements business rules in db constraints then I'd suggest continuing with that and accepting some duplication of validation rules in the business layer. Better to check that the data is valid before trying to apply it to the database.

If you're building a new application/database with the expectation that other apps will access the data, then the approach is going to depend on how those other apps will be built. Again, in Rails, you should probably be looking at ways to share your models, in which case that layer should suffice. If you can't deny other implementation styles from direct access to your data, then you're back to duplication. My preference would be to try - very hard - to deny direct db access to those apps and to strive to service them through a (hopefully RESTful) web service, so that you can manage data integrity at the business logic level.

If a third-party (internal or otherwise) application has DDL access to your schema, then stop worrying about the problem - you've already lost control of your data and you're screwed!

Mike Woodhouse
A: 

It depends.

If you are building a database that is to be embedded in a single app that you are also building, you can choose to put data validation in either the DBMS or the app or both. Almost all the databases being built by seasoned programmers who are database newbies fit this category. In this case, some of the other replies answer your question.

If however, you are building a database that is intended to store data received from multiple apps, and the programming of some of those apps is beyond your control, then you have to build the DBMS to be defensive, and not to allow wrong data from a broken app to infect the data you serve up to other apps and interactive users. You can't catch all the errors, but you can catch a lot of them.

At a minimum, you should design your tables so that they have at least one possible primary key (possible primary keys are called candidate keys). You should choose a primary key from the candidate keys, and declare it as a primary key constraint. This enforces entity integrity.

Also, you should declare a "references" constraint for every foreign key. This enforces referential integrity. With a decent DBMS, you should be able to enforce referential integrity even when the foreign key is optional, in other words might be NULL. NULLs of course don't refer to anything.

There is no point in moving these two kinds of validation to the app. The app has to make a round trip to the database anyway to detect violations of the rule.

The idea that business logic should not be in the database is, in my opinion, a misunderstanding of what databases are all about. Again, if your database is embedded in a single app, then suit yourself.

As to validation rules that forbid missing values (NULLS), there's no harm in implementing them in both the app and the database. In many cases, that's the right thing to do. Similarly with range checks.

For very large projects, you need a separate document to outline all the business rules on data. This document should state where the rules are enforced, in the database, in the app(s) or both.

Walter Mitty
A: 

I agree that the answer to this question depends on the environment.

In my current environment there are only two developers and less than a thousand users of our application. If you can ensure that your programming practices include requirements for implementing the business logic in the application layer than you can successfully handle the constraints outside the database.

If you have an application that will need to scale well and will eventually be maintained by a large pool of developers and utilized by an untold number of users potentially implementing data constraints in the database can improve efficiency as well as eliminate potential disaster from changes in the application layer.

Tinidian