When working on applications that are heavy on user input (or even ones that are not), what are the biggest issues you have encountered? What steps do you generally take to ensure that the input is real and accurate? What lessons have you learned about input validation that you wish you had known beforehand?
It seems to me like the answer to that is pretty dependent on what kind of input you're validating.
Edit: Let me elaborate a little. As far as actual UI stuff goes, Matt's right: removing all of the barriers to entry is key.
But as far as validation goes, that's such a broad question that it's hard to answer. Think of all the different kinds of data that people commonly have to enter. Each one has its own set of considerations.
Credit card data? One thing to worry about is the correct length, which can depend on the type of card.
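As a rough sketch of a per-card-type length check, assuming a small hypothetical table of prefixes and lengths (real card networks have more of both, so treat `CARD_RULES` and `card_length_ok` as illustrative names, not a complete rule set):

```python
import re

# Hypothetical, simplified rules: leading digits -> allowed lengths.
# Real networks have more prefixes (e.g. newer Mastercard ranges) and lengths.
CARD_RULES = {
    "amex": (re.compile(r"3[47]"), {15}),
    "visa": (re.compile(r"4"), {13, 16, 19}),
    "mastercard": (re.compile(r"5[1-5]"), {16}),
}


def card_length_ok(number: str) -> bool:
    """Return True if the digits have an allowed length for their card type."""
    # Drop the spaces and dashes users commonly type between digit groups.
    digits = re.sub(r"[ -]", "", number)
    if not re.fullmatch(r"[0-9]+", digits):
        return False
    for prefix, lengths in CARD_RULES.values():
        if prefix.match(digits):  # match() anchors at the start only
            return len(digits) in lengths
    return False  # unknown card type
```

This only checks the length for the detected type; it says nothing about whether the number belongs to a real account.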
Dates? What format is the user entering the date in? Are they selecting values from a drop-down or entering the date free-form? Are they entering a date in the past or the future? Does it matter?
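One way to handle free-form dates is to agree on a single format and reject anything else with a clear message. A minimal sketch, assuming ISO `YYYY-MM-DD` input and a rule that the date must not be in the future (`parse_past_date` is a hypothetical helper):

```python
from datetime import date, datetime
from typing import Optional


def parse_past_date(text: str, fmt: str = "%Y-%m-%d",
                    today: Optional[date] = None) -> date:
    """Parse a free-form date in one agreed format; reject future dates.

    Raises ValueError with a message that can be shown to the user.
    """
    try:
        value = datetime.strptime(text.strip(), fmt).date()
    except ValueError:
        raise ValueError(f"{text!r} is not a date in the format {fmt}.") from None
    today = today or date.today()  # injectable so the rule can be tested
    if value > today:
        raise ValueError("That date is in the future; please enter a past date.")
    return value
```

Whether future dates should be rejected is exactly the "does it matter?" question; drop the second check if it doesn't.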
Email addresses? I guess this one is pretty easy to deal with. Just throw Mail::RFC822::Address into your code and be done with it.
The list goes on and on. I think you'll get more useful answers if you delve into more specific cases.
When dealing with experienced data-entry clerks, the biggest issues I've encountered have been around fast keyboard entry: things like using the Enter key to move between fields, and having the focus in the right place so there's no need to reach for the mouse.
In terms of validation, the only real problem I've seen is when users can't (or won't) give me a reasonable upper or lower limit for a numeric field. Without a sensible limit, it's easy for a user to accidentally type a digit twice and end up an order of magnitude out. Setting the maximum length on a TextBox is a simple way to stop this sort of thing from happening.
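A minimal sketch of a bounded numeric field, assuming hypothetical limits of 1 to 999 agreed with the users; the length check plays the same role as setting the maximum length on the text box (`parse_quantity` is an illustrative name):

```python
import re


def parse_quantity(text: str, low: int = 1, high: int = 999) -> int:
    """Parse a whole number and reject values outside [low, high].

    The default bounds are hypothetical. Assumes low >= 0, so the longest
    legal input is str(high).
    """
    max_len = len(str(high))
    stripped = text.strip()
    # Same effect as a maximum length on the text box: "1000" typed
    # instead of "100" is stopped before it is even parsed.
    if len(stripped) > max_len:
        raise ValueError(f"Enter at most {max_len} digits.")
    if not re.fullmatch(r"[0-9]+", stripped):
        raise ValueError(f"{text!r} is not a whole number.")
    value = int(stripped)
    if not low <= value <= high:
        raise ValueError(f"Enter a number between {low} and {high}.")
    return value
```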
Considering strings and the recent flood of vulnerabilities involving SQL injection and the like, here is my advice:
Only accept what you know is good.
If your program only expects uppercase characters from A-Z, then either reject any input containing any other character or silently ignore the other characters.
I often see colleagues test for "badness", that is, explicitly check for specific characters they expect the user might type. That introduces long, cumbersome switch/case blocks in places that only needed a single range check for ASCII characters.
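The allowlist idea fits in a few lines, assuming the A-Z example: a single character-class check replaces any list of "bad" characters. Both policies are sketched; `accept_upper` and `strip_to_upper` are hypothetical names:

```python
import re

# Allowlist: one or more characters, every one of them in A-Z.
UPPER_AZ = re.compile(r"[A-Z]+")


def accept_upper(text: str) -> bool:
    """Policy 1: reject the input unless every character is in A-Z."""
    return UPPER_AZ.fullmatch(text) is not None


def strip_to_upper(text: str) -> str:
    """Policy 2: silently drop every character outside A-Z."""
    return re.sub(r"[^A-Z]", "", text)
```

Either way, there is no list of dangerous characters to keep up to date: anything not explicitly allowed is out.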
My first line of defence is normally some client-side scripting that performs various regex validations. This catches the majority of potential issues and gives the user instant feedback that their input is not valid. But always remember to re-check the same inputs on the server side.
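A sketch of the server-side re-check, assuming hypothetical fields and patterns. The pattern strings are kept simple so the same text could be handed to the client-side script; note that JavaScript's `RegExp.test` searches rather than matching the whole string, so on the client the patterns would need `^` and `$` anchors:

```python
import re

# Hypothetical fields and rules. The pattern strings use only syntax that
# Python and JavaScript regexes share, so the client script can reuse them.
FIELD_PATTERNS = {
    "username": r"[A-Za-z0-9_]{3,20}",
    "product_code": r"[A-Z]{2}[0-9]{4}",
}
_COMPILED = {name: re.compile(p) for name, p in FIELD_PATTERNS.items()}


def validate_form(form: dict) -> dict:
    """Re-check every field server-side; return {field: message} for failures.

    Never assume the client-side check ran: the request may not have
    come from your page at all.
    """
    errors = {}
    for name, pattern in _COMPILED.items():
        value = form.get(name)
        if not isinstance(value, str) or pattern.fullmatch(value) is None:
            errors[name] = f"{name} is missing or not valid."
    return errors
```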
In general, you should watch out for the following vulnerabilities:
- SQL Injection
- Cross Site Scripting (Jeff and Joel spoke about this in Podcast #11)
Jeff's post about the Sins of Software Security is also relevant.
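For the two vulnerabilities listed, the usual defences are parameterized queries and output escaping. A minimal sketch using Python's standard `sqlite3` and `html` modules, assuming a hypothetical `users` table (`find_user` and `render_greeting` are illustrative names):

```python
import html
import sqlite3


def find_user(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized query: the driver passes `name` as data, never as SQL
    # text, so input like "x' OR '1'='1" cannot change the query.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


def render_greeting(name: str) -> str:
    # Escape user input before putting it into HTML, so "<script>" is
    # displayed as text instead of being run by the browser.
    return f"<p>Hello, {html.escape(name)}</p>"
```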
Always assume the worst case. If you plan for everything you can think of, you are better off.
If it can be broken, your users will break it, so protect against all the wacky stuff that people will try.
I agree with what has been said about maintaining strict standards on input data, but I'd like to add that in the case of user input, strict validation must go hand-in-hand with appropriate and specific feedback to the user. For instance, if you decide that the only email addresses your system will accept as valid are those matching \w+@\w+\.\w{3}, then you need to make that clear to your users, both before they enter data (as a tooltip or text next to the field) and after they enter invalid data (e.g.: "[email protected]" does not appear to be a valid email address. A valid address is of the form...).
I purposefully chose email addresses for the above example to point out another hazard of strict validation: only accepting a subset of truly valid data. Always keep that in mind when developing a validation strategy, and consider adding mechanisms for users to challenge the validity ruling. Such mechanisms should be low-friction (if a user has to pick up the phone and call somebody to report it, most won't bother), and your organization should have a method for getting the reports into the hands of the developers so they can refine their validations.
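The strict rule \w+@\w+\.\w{3} (with the dot escaped so it matches a literal period) can be tried directly to see both points: the user gets a specific message, and a perfectly valid address is still rejected. A sketch in Python; `check_email` and the message wording are illustrative:

```python
import re
from typing import Optional

# The deliberately strict rule from the example: word@word.xxx
STRICT_EMAIL = re.compile(r"\w+@\w+\.\w{3}")


def check_email(address: str) -> Optional[str]:
    """Return None if accepted, otherwise a specific message for the user."""
    if STRICT_EMAIL.fullmatch(address):
        return None
    return (f'"{address}" does not appear to be a valid email address. '
            "A valid address is of the form name@domain.com.")
```

`check_email("alice@example.com")` is accepted, but `check_email("first.last@example.co.uk")` gets the error message even though that is a valid address: the subset problem in action.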
I find validation very important and generally use regular expressions to achieve it, because unfortunately simple validation such as checking text length often fails: users decide to press space four times to get around it (yes, you could trim the text, but that is not the point, as other examples are not so easy).
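A sketch of the space-padding problem, assuming a rule that a field needs at least two real characters (`has_real_text` and the threshold are hypothetical). Counting only non-whitespace characters with a regex defeats the four-spaces trick, though not a determined user:

```python
import re


def has_real_text(text: str, min_chars: int = 2) -> bool:
    """True if text has at least min_chars non-whitespace characters.

    len("    ") is 4, so a plain length check passes four spaces;
    counting only non-whitespace characters does not.
    """
    return len(re.findall(r"\S", text)) >= min_chars
```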
In general, I find unformatted text (not email addresses or phone numbers) very hard to validate against the determined user. Even when it is expected in a certain format, you will later find something that breaks the rule. And even when you have put in validation, e.g. requiring the first line of an address, there is sometimes nothing stopping users from entering xxxxx.
To answer the question, I think the one thing I have learnt is that it is key to explain to users why entering correct details benefits them: on a website, so that they receive relevant marketing material; in an internal application, so that they can later get valid reports.
Fast keyboard entry can often be a goal, as already mentioned, but on top of this I've often had to make sure users stop and think at the appropriate place as well. This is quite hard, as users will find quick ways of doing things. On some occasions I've been asked to remove keyboard shortcuts to force users to pick up the mouse and select something. Data-entry people really hate this, but it breaks up their rhythm at a certain point and forces them to check something.
Watching people actually use (and abuse) your forms is always more valuable than just waiting for their feedback.
In my experience, it is impossible to validate phone numbers and postcodes for suburbs. Internationally there is no consistent format, and in some cases postcodes don't even exist.
I figure if the user wants to get contacted then they will put in correct data.
That is not to say you shouldn't check for SQL injection and other security risks, but a rule that a phone number must match a specific regex will not hold in all cases. Nor is it a given that an address even has a postcode/ZIP code.
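In the spirit of checking for safety rather than format, a lenient phone-number check might only insist on digits and common separators. A sketch, assuming a hypothetical minimum of five digits; it enforces no national format, and `normalize_phone` is an illustrative name:

```python
import re
from typing import Optional

# Digits plus the separators people commonly type, with an optional
# leading "+". No national format is enforced.
PHONE_CHARS = re.compile(r"\+?[0-9 ().-]+")
MIN_DIGITS = 5  # hypothetical lower bound; tune or drop


def normalize_phone(text: str) -> Optional[str]:
    """Return the number with separators removed, or None if it is unusable."""
    text = text.strip()
    if not PHONE_CHARS.fullmatch(text):
        return None
    compact = re.sub(r"[^0-9+]", "", text)
    if sum(ch.isdigit() for ch in compact) < MIN_DIGITS:
        return None
    return compact
```

Anything with letters or quotes is refused, but the number's structure is left to the user who wants to be contacted.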