In a single-user desktop application that uses a database for storage, is it necessary to perform the data validation on the database, or is it ok to do it in code? What are the best practices, and if there are none, what are the advantages and disadvantages of each of the two possibilities?
views: 279
answers: 6
You should always validate in the code before the data reaches the database.
Best practice is both. The database should be responsible for ensuring its own state is valid, and the program should ensure that it doesn't pass rubbish to the database.
The disadvantage is that you have to write more code, and you have a marginal extra runtime overhead - neither of which are usually particularly good reasons not to do it.
The advantage is that the database ensures low-level validity, but the program can help the user to enter valid data much better than by just passing back errors from the database: it can intervene earlier and provide UI hints (e.g. colouring invalid text fields red until they have been completed correctly).
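As a rough illustration of validating early enough to drive UI hints, here is a minimal Python sketch. The function name `validate_contact`, the field names and the 0 to 150 age range are all made up for the example; the only point is that the program gets back one message per offending field, which a form could use to decide which inputs to colour red.

```python
def validate_contact(form):
    """Return a dict mapping field name -> human-readable problem.

    An empty dict means the form is acceptable. Field names and limits
    are hypothetical, chosen only for this example.
    """
    errors = {}

    name = form.get("name", "").strip()
    if not name:
        errors["name"] = "Name is required."

    age_text = form.get("age", "").strip()
    try:
        age = int(age_text)
    except ValueError:
        errors["age"] = "Age must be a whole number."
    else:
        if not 0 <= age <= 150:
            errors["age"] = "Age must be between 0 and 150."

    return errors


# A UI layer would highlight every field that appears in the result.
for field, message in validate_contact({"name": "", "age": "abc"}).items():
    print(f"{field}: {message}")
```

Nothing here touches the database, so the user gets feedback before any connection is used.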
-- edit (more info promoted from comments) --
The smart approach in many cases is to write a data driven validator at each end and use a shared data file (e.g. XML) to drive the validations. If the spec for a validation changes, you only need to edit the description file and both ends of the validation will be updated in sync. (no code change).
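A minimal sketch of that idea in Python, assuming a made-up XML layout (`<rules>` containing `<field name min max>` elements) and hypothetical helper names. The same parsed rules drive an application-side check and generate `CHECK` clauses that could be pasted into a table definition, so a change to the rules file reaches both ends.

```python
import xml.etree.ElementTree as ET

# Hypothetical shared rules file; in practice it would be loaded from disk.
RULES_XML = """
<rules>
  <field name="age" min="0" max="150"/>
  <field name="quantity" min="1" max="999"/>
</rules>
"""


def load_rules(xml_text):
    """Parse the rules into {field_name: (low, high)}."""
    root = ET.fromstring(xml_text.strip())
    return {
        field.get("name"): (int(field.get("min")), int(field.get("max")))
        for field in root.findall("field")
    }


def validate(record, rules):
    """Application side: return {field_name: message} for every violation."""
    errors = {}
    for name, (low, high) in rules.items():
        value = record.get(name)
        if not isinstance(value, int) or not low <= value <= high:
            errors[name] = f"{name} must be an integer between {low} and {high}"
    return errors


def check_clauses(rules):
    """Database side: build SQL CHECK clauses from the same rules."""
    return [
        f"CHECK ({name} BETWEEN {low} AND {high})"
        for name, (low, high) in rules.items()
    ]


rules = load_rules(RULES_XML)
print(validate({"age": 200, "quantity": 3}, rules))
print(check_clauses(rules))
```

Generating SQL text from a rules file is only safe when the file is trusted, since the field names are interpolated directly into the clause.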
Wouldn't it be smart to check the data before you try to store it? Database connections and resources are expensive. Try to make sure you have some sort of logic to validate the data before shipping it off to the database. I've seen some people do it on the front end, others on the back end, others even both.
It may be a good idea to create an assembly or validation tier. Validate the data and then ship it over to db.
Data lasts longer than applications. It hangs around for years and years. This is true even if your application doesn't handle data of interest to regulatory authorities or law enforcement agencies, but the range of data which interests those guys keeps increasing.
Also it is still more common for data to be shared between applications within an organisation (reporting, data warehouse, data hub, web services) or exchanged between organisations than it is for one application to share multiple databases. Such exchanges may involve other mechanisms for loading data as well as extracting data besides the front end application which notionally owns the schema.
So, if you only want to code your data validation rules once, put them in the database. If you like belt'n'braces, put them in the GUI as well.
You do both.
The best practice for data validation is to sanitize your program's inputs to the database. However, this does not excuse the database from having its own validations. Validations written in your code only cover changes made through your own managed environment. They do not account for corrupted databases, administration errors, or the remote/future possibility that your database will be used by more than one application, in which case the application-level validation logic would have to be duplicated in the new application.
Your database should have its own validation routines. You needn't think of them as cleaning the incoming data so much as running sanity checks, constraints and assertions. At no time should a database have invalid data in it. That's the entire point of integrity constraints.
To summarize, you do both of:
- Sanitize and validate user inputs before they reach your data store.
- Equip your data store with constraints that reinforce your validations.
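To make the two layers concrete, here is a small sketch using Python's standard `sqlite3` module and an in-memory database. The `person` table, its limits and the helper names are invented for the example. `save_person` is the normal path that validates first; `db_accepts` deliberately skips the application check to show that the constraints still reject bad rows from any other writer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        id   INTEGER PRIMARY KEY,
        name TEXT    NOT NULL CHECK (length(trim(name)) > 0),
        age  INTEGER NOT NULL CHECK (age BETWEEN 0 AND 150)
    )
""")


def validate_person(name, age):
    """Application-level check: return a list of user-facing problems."""
    problems = []
    if not name.strip():
        problems.append("Name is required.")
    if not 0 <= age <= 150:
        problems.append("Age must be between 0 and 150.")
    return problems


def save_person(conn, name, age):
    """Normal path: validate first, only touch the database if clean."""
    problems = validate_person(name, age)
    if problems:
        return problems
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO person (name, age) VALUES (?, ?)", (name, age))
    return []


def db_accepts(conn, name, age):
    """Bypass the application check, as another tool or script might."""
    try:
        with conn:
            conn.execute("INSERT INTO person (name, age) VALUES (?, ?)", (name, age))
        return True
    except sqlite3.IntegrityError:
        # Raised by SQLite when a NOT NULL or CHECK constraint fails.
        return False


print(save_person(conn, "Ann", 30))
print(db_accepts(conn, "Bob", 200))
```

The rules are written twice here, once in Python and once in SQL, which is the duplication cost mentioned earlier in the thread.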
In the application please!
It's very difficult to translate sqlerror -12345 into a message that means anything to an end user. In many cases your user may be long gone by the time the database gets hold of the data (e.g. I hit submit then go look to see how many down votes I got on Stack Overflow today).
The first priority is to validate the data in the application before sending it to the database.
The second priority should be to validate/screen the data at the front end to prevent the user entering invalid data, or at least warn them immediately that the data is incorrect.
The third priority (if the application is important enough and your budget is big enough) would be for the database itself to verify the correctness of any inserts and updates via constraints, triggers, etc.