I went to my bank website the other day and entered my account number with a trailing space. An error message popped up that said, "Account number must consist of numeric values only." I thought to myself, "Seriously?! You couldn't have just stripped the space for me?" If I were any less of a computer geek, I might even have thought, "What? There are only numbers in there!" (not being able to see the space).

The Calculator that comes with Ubuntu on the other hand merrily accepts spaces and commas, but oddly doesn't like trailing dots (without any ensuing digits).

So, that raises the question: exactly how forgiving should web forms be? I don't think trimming whitespace is too much to ask, but what about other integer fields?

  • Should they allow +/- signs?
  • How many spaces should be allowed between the sign and the number?
  • What about commas for thousands separators?
  • What about other parts of the world, where they use dots instead?
  • What if they're in between every 4 digits instead of every 3?
  • What about hexadecimal and octal representations?
  • Scientific notation?
  • What if I accidentally hit the quote button when I'm trying to hit enter, should that be stripped too?

It would be very easy for me to strip out all non-digit characters, and that would be extremely forgiving, but what if the user made an actual mistake that affects the input and should have been caught, but now I've just stripped it out?
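The trade-off in that last paragraph can be sketched in a few lines of Python. This is only an illustration under assumed rules, not what any bank actually does: `parse_account_number` is a hypothetical helper that trims surrounding whitespace (which cannot change the meaning) but rejects everything else instead of stripping it.

```python
def parse_account_number(raw: str) -> str:
    """Trim surrounding whitespace, then require ASCII digits only.

    Hypothetical helper for illustration; the rules are assumptions.
    """
    candidate = raw.strip()
    # Reject rather than strip: "6y8" must not silently become "68".
    if not (candidate.isascii() and candidate.isdigit()):
        raise ValueError("Account number must contain digits only")
    return candidate
```

With this, `"12345 "` is accepted as `"12345"`, while `"12 345"` and `"6y8"` are sent back to the user.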

What about things like phone numbers (which have a huge variety of formats), postal codes, zip codes, credit card numbers, usernames, emails, URLs (should I assume http? What about .com while I'm at it?)?

Where do you draw the line?

+7  A: 

For something as important as banking, I don't mind it complaining about my input, especially if the other option is mistakenly transferring a bucketload of money into some stranger's account instead of my wife's (because of a missing or incorrect digit for example).

A classic example is one of my banks which disallows monetary values unless they have ".99" at the end (where 9 can be any digit of course). The vast majority of things I do are for exact dollar amounts and it's a tad annoying to have to always enter 500.00 instead of just 500.

But I'll be happier about that the first time I avoid accidentally paying somebody $5072 instead of $50.72 just because I forgot the decimal point. Actually, that's pretty unlikely since it also asks for confirmation and I'm pretty anal in controlling my money :-)

Having said that, the general rule I try to follow is "be liberal in what you accept, be strict in what you produce".

This allows other software using my output to expect a limited range of possibilities (making their lives easier). But it makes my software more useful if it can handle simple misteaks.

paxdiablo
Your story doesn't coincide with your conclusion ;) I don't think there's much debate about being strict in what you produce, but that is a good example of where you should be less liberal in what you accept.
Mark
+2  A: 

Numeric input:
Stripping non-digits seems reasonable to me, but the problem is conflicting decimal notation. Some regions expect , (comma) to denote the decimal separator, while others use . (period). Unless the input would likely be in other bases, I would only assume base 10. If it's reasonable to assume non-base 10 input (base-16 for color input, for example), I would go with standard conventions for denoting the bases: leading 0 means base 8, leading 0x means base 16.
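As a rough sketch of the prefix convention just described (leading 0x for base 16, leading 0 for base 8, otherwise base 10), something like the following would do. The function name `parse_int_with_base` is made up for this example, and whether a form should honour these prefixes at all is a separate decision.

```python
def parse_int_with_base(raw: str) -> int:
    """Pick the base from the prefix: 0x -> 16, leading 0 -> 8, else 10.

    Hypothetical helper; raises ValueError for anything int() rejects.
    """
    text = raw.strip().lower()
    if text.startswith("0x"):
        return int(text, 16)   # int() accepts the 0x prefix when base is 16
    if len(text) > 1 and text.startswith("0"):
        return int(text, 8)    # digits 8 and 9 make this raise ValueError
    return int(text, 10)
```

Note that under this convention an input such as "09" is rejected, because 9 is not an octal digit.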

String input:
This gets a lot more complicated. It mostly depends on what the input is actually meant to represent. A username should exclude characters that will cause trouble, but the meaning of 'cause trouble' will vary depending on the use of the application and the system itself. URLs have a concrete definition of what qualifies, but that definition is rather broad. Fortunately, many languages come with tools to discern URLs, without you having to code your own parsing (whether the language does it perfectly or not is another question).

In the end, it's really a case-by-case basis. I do like paxdiablo's general rule, though: Accept as much as you can, output only what you must.

Brian S
URLs have a well-defined format, yes, but there are various algorithms you can use to coerce them into that format (like adding http:// at the beginning if it's not already present). Everything else sounds reasonable :)
Mark
Ah, good point! If no protocol is included, I would assume HTTP, and if no TLD is included, I would assume .com. If information is omitted, you have to supply your own, yes? Supply what's most likely. Of course, this comes with the stipulation that some applications may be more likely to use other formats; I can imagine a URL entry field in an application where FTP is more likely -- an FTP client, for example!
Brian S
"http" is a reasonable assumption, but ".com" is a little more debatable, IMO. What if the user is a goof and didn't read the label right, so he tried entering his username in the box instead, and now we've just accepted it as a valid URL even though it looks nothing like one?
Mark
Again, it's all context. Maybe it doesn't matter -- "my homepage" field in a user profile doesn't really matter much where the link goes to, but someone visiting the user's profile page expects it to be a hyperlink. Of course, if the user entered their desired name into a URL field by mistake, I suspect the form wouldn't be valid, since they probably didn't enter a username in the _correct_ field.
Brian S
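The scheme-defaulting idea from the comments above might look like this in Python. It is a minimal sketch: `coerce_url` and its `default_scheme` parameter are names invented here, the check for "://" is a simplification, and it deliberately does not append ".com", since that guess was the debatable one.

```python
from urllib.parse import urlsplit


def coerce_url(raw: str, default_scheme: str = "http") -> str:
    """Prepend a default scheme when none is given; reject if no host remains.

    Hypothetical helper; pass default_scheme="ftp" for an FTP client.
    """
    text = raw.strip()
    if not text:
        raise ValueError("URL is empty")
    if "://" not in text:
        text = f"{default_scheme}://{text}"
    parts = urlsplit(text)
    if not parts.netloc:
        raise ValueError("URL has no host")
    return parts.geturl()
```

For example, `coerce_url("example.com/path")` gives `"http://example.com/path"`, and an input that already has a scheme passes through unchanged.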
+4  A: 

You draw the line at the point where the computer is guessing at what the correct input should be.

For example, a license key input box I once wrote accepted spaces and dashes and both upper and lower case, even though internally the keys had no spaces or dashes and were all upper case. I could do that, since I knew that none of the keys actually had spaces or dashes.
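A normalization like that could be sketched as follows. The original code is not shown in this answer, so `normalize_license_key` is a hypothetical reconstruction, and the assumption that keys are purely alphanumeric is mine.

```python
def normalize_license_key(raw: str) -> str:
    """Drop spaces and dashes and upper-case the rest.

    Safe only because (by assumption) real keys never contain
    spaces or dashes, so removing them cannot change the meaning.
    """
    key = raw.replace(" ", "").replace("-", "").upper()
    # Assumed key alphabet: ASCII letters and digits only.
    if not (key.isascii() and key.isalnum()):
        raise ValueError("License key contains unexpected characters")
    return key
```

So `"abcd-1234 efgh"` becomes `"ABCD1234EFGH"`, while a key containing any other punctuation is rejected rather than guessed at.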

Your example about URLs is another good one. I've noticed that when I type something like 'flowers' into the address bar of a modern browser (I'm using Chrome), it knows it should search for it, since it's not a valid URL. If I type 'st' instead, it auto-corrects (or auto-suggests) 'stackoverflow.com', since that's a bookmark.

A well-written input system will complain when it would otherwise be forced to guess what the correct input should be.

Charlie Salts
I like that. Perhaps we can paraphrase it to "when the computer can't definitively determine what input was intended". There are still shades of gray however; where the user *most likely* meant one thing, but it's possible he meant another. Assuming you could assign percentages to these probabilities, where do you draw the line? If there's a 90% chance the user meant X, should you reformat it to X for him? 75%? 50%? Or does that depend on the consequences of an incorrect input?
Mark
Ooh, I like that. "the consequences of an incorrect input" is a very good metric on where to draw the line between a guess and invalidation.
Brian S
Banking: no guessing. Google search: guessing encouraged. As many others have said, it all depends on the intended usage of the data being entered. Google is great in that I can misspell things and it usually correctly guesses what I meant to search for. If it's wrong, it still provides a 'search for `xxxxxxx` instead' link.
Charlie Salts
I guess for my particular application the consequences aren't too dire -- most things can be retracted, or clarified in person. However, it makes it hard to design a general purpose framework. Perhaps in that context I could codify rules that are "definitely safe to assume" and then somehow let developers "increase liberalness" to just before the point of "it is never safe to assume this". You'd hope that in the field of computer science, where things are built on math and algorithms, there would be fewer shades of gray, but no.
Mark
+1  A: 

It totally depends on how the data is going to be used.

If the input is a monetary amount, for a transaction for example, then the input value should definitely be normalised to a set of standards.

If it's simply a phone number, then the stored data is unlikely to serve any functional purpose, so you can be more forgiving.

There is nothing wrong with forcing a correct format to make displayed data look nicer, but you have to balance user irritation against micro benefits.

Once you start collecting data you can scan through it and see what sort of patterns emerge, and then automatically strip off the formatting users typed in.

Tom Gullen
Assuming you allow the malformed data into your DB in the first place. I've been pretty strict about what actually makes it into the DB. Makes it a lot easier to search for things if they're all in the same format.
Mark
A: 

I think you're overreacting a little bit. If there's anything in the field that shouldn't be there, strip it. Otherwise, try to force the input into whatever format you want, and if it doesn't fit, reject it.

jordanstephens
What if the user is trying to type a number, say "678", but his finger slips and he types "6y8"? I could strip the "y" since it isn't a number, and get a valid input of "68", but that isn't what the user intended. On the other hand, if the user is trying to type a dollar amount but I don't want the $ sign in the input, I can safely remove *that* character because it's pretty obvious what he meant. "Strip everything bad" might not give enough feedback to the user, so I don't think it's a one-size-fits-all solution.
Mark
@Mark: unless I was trying to enter 478 but accidentally held down the shift key on the 4. That would give me $78 which you would turn into 78 :-) "don't think it's a one-size-fits-all solution" is a good piece of advice.
paxdiablo
@pax: True... but at some point in time we have to acknowledge that the user simply can't type and we're never going to get any sensible information out of him ;)
Mark
@Mark, then just reject it. There isn't a magical solution to this. You have to compromise somewhere. Draw a line and stick to it, and try to give as much feedback as possible. You could always add a "did you mean?" dialog after form submission for clarity.
jordanstephens
A: 

I would say "Accept anything but process only valid data".

Expect your users to behave like computer noobs. Validate the input data using regular expressions and other validators.

Search for standard regular expressions for URLs, emails and so on.

Throw in a regular expression like this "/(?:([a-zA-Z0-9][\s,]+))([a-zA-Z0-9]+)$/" for comma- or space-separated values. With minor tweaking this expression will work for any number of comma-separated values.
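For what it's worth, here is one way the comma- or space-separated case could be handled in Python. It is a sketch that swaps in a split-then-validate approach rather than the exact pattern quoted above; `split_values` is a hypothetical name, and the alphanumeric-only token rule is carried over from the character classes in that pattern.

```python
import re


def split_values(raw: str) -> list[str]:
    """Split on runs of commas/whitespace, then validate each token."""
    tokens = [t for t in re.split(r"[\s,]+", raw.strip()) if t]
    for token in tokens:
        # Same alphabet as the pattern above: ASCII letters and digits.
        if not re.fullmatch(r"[A-Za-z0-9]+", token):
            raise ValueError(f"Invalid value: {token!r}")
    return tokens
```

This accepts any number of values, and an invalid token is reported back instead of being silently dropped.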

Jagira
I'm not quite sure I understand your statement. If I accept invalid data, what am I supposed to do with it, if not process it?
Mark
Ask for valid data again...
Jagira
+1  A: 

Where do you draw the line?

When the consequences of accepting "invalid" data outweigh the irritation of not accepting it.

Should they allow +/- signs?

If negative values are valid, then of course they should.

If not, then don't just silently strip minus signs, as it totally changes the meaning of the data. Stripping pluses is less of a problem.
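A small sketch of that asymmetry, assuming a field where negative values are not valid: a leading plus is dropped because it carries no information, while a minus is rejected so the user finds out. `parse_non_negative_int` is a hypothetical helper, and tolerating spaces after the plus sign is a choice made here, not a rule.

```python
def parse_non_negative_int(raw: str) -> int:
    """Accept an optional leading '+', reject '-' instead of stripping it."""
    text = raw.strip()
    if text.startswith("+"):
        # A redundant plus does not change the value, so drop it
        # (along with any spaces between the sign and the digits).
        text = text[1:].lstrip()
    if not (text.isascii() and text.isdigit()):
        # Covers "-5": stripping the minus would change the meaning.
        raise ValueError("Expected a non-negative whole number")
    return int(text)
```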

What if [thousands separators are] in between every 4 digits instead of every 3?

In countries that use three-digit grouping, "1,0000" can be assumed to be a typo. But is it a typo for "10,000" or for "1,000"? I wouldn't dare guess, as a wrong guess could cost the user $9,000.
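To make the "don't guess" rule concrete, here is a sketch that accepts commas only when they sit in proper three-digit positions and rejects everything else. The name `parse_grouped_int` is made up for the example, and it assumes the comma-as-thousands-separator convention only.

```python
import re

# Either plain digits, or 1-3 digits followed by one or more ",ddd" groups.
_PLAIN = re.compile(r"[0-9]+")
_GROUPED = re.compile(r"[0-9]{1,3}(,[0-9]{3})+")


def parse_grouped_int(raw: str) -> int:
    """Parse "10,000" or "10000"; refuse to guess at "1,0000"."""
    text = raw.strip()
    if _PLAIN.fullmatch(text):
        return int(text)
    if _GROUPED.fullmatch(text):
        return int(text.replace(",", ""))
    raise ValueError("Thousands separators are in unexpected positions")
```

Here "10,000" and "1,000" both parse, while "1,0000" raises and the user is asked which one was meant.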

What about hexadecimal and octal representations?

Unless you're running the search feature for unicode.org, I can't imagine why anyone would use hexadecimal in a web form.

And "01234" is almost certainly intended to be 1234 instead of 668.
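The two readings are easy to check in Python, where `int()` on a string defaults to base 10 and tolerates the leading zero:

```python
decimal_reading = int("01234")     # base 10: the leading zero is ignored
octal_reading = int("01234", 8)    # base 8: 1*512 + 2*64 + 3*8 + 4

assert decimal_reading == 1234
assert octal_reading == 668
```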

What about things like...credit card numbers

Please allow spaces or hyphens in credit card numbers. It's really annoying when I have to type an undelimited 16-digit number.
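Honouring that request takes very little code. The following is a sketch only: `normalize_card_number` is a hypothetical helper, it checks nothing beyond "digits remain after removing spaces and hyphens", and a real payment form would presumably add further checks of its own.

```python
def normalize_card_number(raw: str) -> str:
    """Remove the spaces and hyphens people naturally type, keep the digits."""
    digits = raw.replace(" ", "").replace("-", "")
    # Anything else left over is a likely typo, so report it, don't strip it.
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("Card number may contain only digits, spaces and hyphens")
    return digits
```

Both `"1234 5678 9012 3456"` and `"1234-5678-9012-3456"` then normalize to the same undelimited string.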

dan04
A: 

The one that irritates me as a user is credit card numbers. Conventionally these appear as groups of 4 digits separated by spaces, but the odd web form will only accept a single string of digits with no spaces, and gives no indication that this is the format it wants. Telephone numbers are similar: humans often use spaces to improve clarity, and web forms sometimes accept the spaces and sometimes don't.

Ian Hopkinson