views: 279
answers: 6

In a single-user desktop application that uses a database for storage, is it necessary to perform data validation in the database, or is it OK to do it in code? What are the best practices, and if there are none, what are the advantages and disadvantages of each of the two possibilities?

+4  A: 

You should always validate in the code before the data reaches the database.

corymathews
In general, I'd prefer to leave the database out of it. The database is a repository for storing data, not just another place to add business logic (and remember to maintain it in two places when you change it).
DOK
The database should always validate the data. That's not business logic, it's encapsulation for robustness and security.
Jason Williams
@DOK: The "S" in DBMS means system. It's an engine, not just a repository. Use the engine to protect your data: what if someone uses another app, or no app at all, to change the data?
gbn
@gbn: if someone else is editing your data then you have an entirely different problem, be it security or consistency with the other programs. Either way, you should still be checking the data again when you read it back in.
corymathews
+8  A: 

Best practice is both. The database should be responsible for ensuring its own state is valid, and the program should ensure that it doesn't pass rubbish to the database.

The disadvantage is that you have to write more code, and you incur a marginal extra runtime overhead - neither of which is usually a particularly good reason not to do it.

The advantage is that the database ensures low-level validity, but the program can help the user enter valid data far better than by just passing back errors from the database - it can intervene earlier and provide UI hints (e.g. colouring invalid text fields red until they have been completed correctly, etc.).

-- edit (more info promoted from comments) --

The smart approach in many cases is to write a data driven validator at each end and use a shared data file (e.g. XML) to drive the validations. If the spec for a validation changes, you only need to edit the description file and both ends of the validation will be updated in sync. (no code change).
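A minimal sketch of that idea, assuming Python on the application side and a JSON rule file rather than the XML mentioned above; the file name, fields, and rule vocabulary are purely illustrative:

```python
import json

# rules.json is the shared description file, e.g.:
# {"quantity": {"min": 1}, "email": {"max_len": 254}}

def load_rules(path="rules.json"):
    with open(path) as f:
        return json.load(f)

def validate(record, rules):
    """Return {field: error message}; an empty dict means the record passed."""
    errors = {}
    for field, rule in rules.items():
        value = record.get(field)
        if value is None:
            errors[field] = "missing"
        elif rule.get("min") is not None and value < rule["min"]:
            errors[field] = f"must be at least {rule['min']}"
        elif rule.get("max_len") is not None and len(value) > rule["max_len"]:
            errors[field] = f"must be at most {rule['max_len']} characters"
    return errors

def check_clause(field, rule):
    """Turn the same rule into a SQL CHECK fragment so the DB end stays in sync."""
    parts = []
    if rule.get("min") is not None:
        parts.append(f"{field} >= {rule['min']}")
    if rule.get("max_len") is not None:
        parts.append(f"length({field}) <= {rule['max_len']}")
    return " AND ".join(parts)
```

Editing rules.json then updates both the application-side checks and the generated constraints, with no code change at either end.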

Jason Williams
This may not be the best idea if you are the programmer though. Duplication of code means you have to go back to the front and back end for updates / changes.
JonH
Spec the design properly first, then implement it once.
Jason Williams
I agree, both places are needed. Multiple applications could work on the same database, and somebody could insert data directly using SQL. And it also has to be done from within the client application to ensure a good user experience.
nathanvda
Duplicated code almost always leads to bugs. I can't even remember how many times I have validated in JS and on the server side, forgot to change one or the other, and the validation ended up wrong. Get a huge app with many developers and this will happen often.
corymathews
Applying the same verification in two places can lead to bugs, but that risk is much less dangerous than not validating the data being entered into the DB at all! It's almost always better to be over-cautious. Additionally, the verification passes are usually different (the DB verifies at a lower level - it seeks only to keep its internal state valid; the user interface seeks to apply business logic to improve the quality of the data). Lastly, shoddy coders will cock it up, but programmers working carefully from a clear spec will not have any problems.
Jason Williams
Correct: DBMS for integrity constraints, and validation code in the UI for the benefit of immediate feedback to users (a sound HCI reason). **But** I disagree on business logic: integrity constraints **are** business logic, even if one doesn't think of them in those terms; otherwise there is not much left to check. It is also a good idea, as suggested above, to have a tool that automates code generation and keeps both parts in sync. Why not add this suggestion to the answer?
MaD70
@MaD70: What I meant by business logic is that you may not always apply the same checks at each end. You could validate a quantity field at the DB as a positive nonzero integer (because the DB only cares about its low-level internal integrity), but in the application you might limit it to be between 1 and the number of items in stock (because the business logic states that you can't sell more than you currently have). Of course, you may want the DB to verify the available stock too if that is needed for internal consistency, in which case it would include business logic.
Jason Williams
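A small illustration of that split, as a hedged sketch (SQLite and the table/column names are assumptions for the example, not from the comment): the database enforces only the low-level rule, while the application applies the business rule.

```python
import sqlite3

conn = sqlite3.connect("shop.db")
# Low-level integrity: the DB only insists that a quantity is a positive integer.
conn.execute("""
    CREATE TABLE IF NOT EXISTS order_lines (
        item_id  INTEGER NOT NULL,
        quantity INTEGER NOT NULL CHECK (quantity > 0)
    )
""")

def quantity_error(quantity, in_stock):
    """Business rule applied in the application: 1 <= quantity <= current stock."""
    if quantity < 1:
        return "Quantity must be at least 1."
    if quantity > in_stock:
        return f"Only {in_stock} in stock."
    return None  # None means the value is acceptable
```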
I'd think best practice would mean not letting any invalid data get to the DB. The lowest point at which data should actually be scrubbed is the data layer, before it attempts to write to the DB.
nportelli
A DBMS is a **system** to manage data, **NOT** a dumb store. It cares about exactly the integrity constraints declared to it: if they are underspecified, it will check accordingly. Every integrity check on data belongs in the DBMS, because interactive users (power users/administrators) and applications **can be wrong**, especially when the same DB is accessed by different users and programs (also written by different programmers). I agree that checking the data as soon as possible in applications is absolutely sound for HCI reasons (good error feedback), but this is not an excuse to leave the DB unchecked.
MaD70
The fact that some (most?) SQL-based DBMSs suck at enforcing constraints is not an excuse either: search for, or ask for, better products. **Declarative** integrity constraints are one of the main reasons for a DBMS - without them we are back to a worse past (i.e. pages of procedural code in each application).
MaD70
+1  A: 

Wouldn't it be smart to check the data before you try to store it? Database connections and resources are expensive. Try to make sure you have some sort of logic to validate the data before shipping it off to the database. I've seen some people do it on the front end, others on the back end, and others do both.

It may be a good idea to create a separate assembly or validation tier: validate the data there, and then ship it over to the DB.
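A hedged sketch of what such a validation tier could look like, assuming Python with SQLite as the embedded store; the class, table, and rules are invented for the example:

```python
import sqlite3

class CustomerStore:
    """Validation tier: every write passes through validate() before touching the DB."""

    def __init__(self, path="app.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS customers "
            "(name TEXT NOT NULL, age INTEGER CHECK (age >= 0))"
        )

    @staticmethod
    def validate(name, age):
        errors = []
        if not name or not name.strip():
            errors.append("name is required")
        if not isinstance(age, int) or age < 0:
            errors.append("age must be a non-negative integer")
        return errors

    def add(self, name, age):
        errors = self.validate(name, age)
        if errors:
            raise ValueError("; ".join(errors))  # caught by the UI, shown to the user
        with self.conn:
            self.conn.execute(
                "INSERT INTO customers (name, age) VALUES (?, ?)", (name, age)
            )
```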

JonH
+2  A: 

Data lasts longer than applications. It hangs around for years and years. This is true even if your application doesn't handle data of interest to regulatory authorities or law enforcement agencies, but the range of data which interests those guys keeps increasing.

Also, it is still more common for data to be shared between applications within an organisation (reporting, data warehouse, data hub, web services), or exchanged between organisations, than it is for one application to share multiple databases. Such exchanges may involve other mechanisms for loading and extracting data besides the front-end application which notionally owns the schema.

So, if you only want to code your data validation rules once, put them in the database. If you like belt and braces, put them in the GUI as well.

APC
+5  A: 

You do both.

The best practice for data validation is to sanitize your program's inputs to the database. However, this does not excuse the database from having its own validations. Programming your validations in your code only accounts for changes produced within your managed environment. It does not account for corrupted databases, administrator error, or the remote/future possibility that your database will be used by more than one application, in which case the application-level validation logic would have to be duplicated in that new application.

Your database should have its own validation routines. You needn't think of them as cleaning the incoming data so much as running sanity checks/constraints/assertions. At no time should a database contain invalid data. That's the entire point of integrity constraints.

To summarize, you do both of:

  1. Sanitize and validate user inputs before they reach your data store.
  2. Equip your data store with constraints that reinforce your validations (a short sketch follows).
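A minimal sketch of those two layers working together, assuming SQLite as the embedded store; the table, columns, and rules are made up for the example:

```python
import sqlite3

conn = sqlite3.connect("inventory.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS products (
        sku   TEXT PRIMARY KEY,
        price REAL NOT NULL CHECK (price >= 0)   -- 2. constraint in the data store
    )
""")

def save_product(sku, price):
    # 1. sanitize/validate the input before it reaches the data store
    if not sku or not sku.strip():
        raise ValueError("SKU is required")
    if price < 0:
        raise ValueError("price cannot be negative")
    try:
        with conn:
            conn.execute("INSERT INTO products (sku, price) VALUES (?, ?)", (sku, price))
    except sqlite3.IntegrityError as exc:
        # the constraint is the backstop for anything the code-level check missed
        raise ValueError(f"rejected by the database: {exc}") from exc
```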
Mark Canlas
A: 

In the application please!

It's very difficult to translate SQL error -12345 into a message that means anything to an end user. In many cases your user may be long gone by the time the database gets hold of the data (e.g. I hit submit, then go and look to see how many downvotes I got on Stack Overflow today).

The first priority is to validate the data in the application before sending it to the database.

The second priority should be to validate/screen the data at the front end, to prevent the user entering invalid data or at least to warn them immediately that the data is incorrect.

The third priority (if the application is important enough and your budget is big enough) would be for the database itself to verify the correctness of any inserts and updates via constraints, triggers, etc.
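For that third priority, column-level rules fit CHECK constraints, while rules that look across tables need a trigger. A hedged, SQLite-flavoured sketch (the schema is invented for the example):

```python
import sqlite3

conn = sqlite3.connect("orders.db")
conn.executescript("""
    CREATE TABLE IF NOT EXISTS items  (id INTEGER PRIMARY KEY, stock INTEGER NOT NULL CHECK (stock >= 0));
    CREATE TABLE IF NOT EXISTS orders (item_id INTEGER NOT NULL, qty INTEGER NOT NULL CHECK (qty > 0));

    -- A CHECK constraint cannot look at another table, but a trigger can:
    CREATE TRIGGER IF NOT EXISTS orders_respect_stock
    BEFORE INSERT ON orders
    WHEN NEW.qty > (SELECT stock FROM items WHERE id = NEW.item_id)
    BEGIN
        SELECT RAISE(ABORT, 'quantity exceeds available stock');
    END;
""")
```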

James Anderson