Hi all,

I'm trying to create a validation expression that checks the length of an input and allows text and punctuation marks (e.g. , ? ; : ! " £ $ %).

What I have come up with so far is "^\s*(\w\s*){1,2046}\s*$", but this won't allow any punctuation marks. To be honest, I'm pretty sketchy in this area, so any help would be greatly appreciated!

Thanks,

Steve

+1  A: 

This should do it:

^\s*([\w,\?;:!"£$%]\s*){1,2046}$

Note that this doesn't limit the length of the input at all; it only limits the number of non-whitespace characters.
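A quick way to see this is to run the pattern in Python. Python is assumed here only for illustration (the question targets ASP.NET validation), but `re` should treat this pattern the same way for these inputs. `first_ok` is a hypothetical helper name.

```python
import re

# Guffa's first pattern. {1,2046} counts repetitions of the group, and each
# repetition holds exactly one allowed non-whitespace character followed by
# any amount of whitespace, so whitespace never counts toward the limit.
FIRST = re.compile(r'^\s*([\w,\?;:!"£$%]\s*){1,2046}$')

def first_ok(text: str) -> bool:
    """Hypothetical helper: True if the whole string matches FIRST."""
    return FIRST.match(text) is not None

spacey = "a" + " " * 3000 + "b"   # 3002 characters, only 2 non-whitespace
print(len(spacey), first_ok(spacey))   # 3002 True
print(first_ok("a" * 2047))            # False: 2047 non-whitespace characters
print(first_ok("Hello, world!"))       # True
```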

To limit the length, you can use a positive lookahead that only matches a specific length range:

^(?=.{1,2046}$)\s*([\w,\?;:!"£$%]\s*)+$

(The upper limit on the number of non-whitespace characters is pointless if it's the same as the overall length limit. The + is short for {1,}, requiring at least one non-whitespace character.)
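The lookahead version can be checked the same way, again as a Python sketch with a hypothetical `bounded_ok` helper. One caveat worth knowing: in Python, .NET and JavaScript, `.` does not match a newline by default, so any input containing a line break (for example from a multi-line textarea) fails the lookahead. In Python, compiling with `re.DOTALL` lifts that restriction; check your own engine for the equivalent option.

```python
import re

PATTERN = r'^(?=.{1,2046}$)\s*([\w,\?;:!"£$%]\s*)+$'

# The lookahead (?=.{1,2046}$) measures the whole string without consuming it;
# the rest of the pattern then checks which characters are allowed.
BOUNDED = re.compile(PATTERN)
# Same pattern with DOTALL, so '.' in the lookahead also counts newlines.
BOUNDED_MULTILINE = re.compile(PATTERN, re.DOTALL)

def bounded_ok(text: str, allow_newlines: bool = False) -> bool:
    """Hypothetical helper: True if text passes the length and character checks."""
    regex = BOUNDED_MULTILINE if allow_newlines else BOUNDED
    return regex.match(text) is not None

print(bounded_ok("a" + " " * 3000 + "b"))                     # False: 3002 characters
print(bounded_ok("line one\nline two"))                       # False: '.' stops at '\n'
print(bounded_ok("line one\nline two", allow_newlines=True))  # True
```

Whitespace-only input is still rejected, because the `+` requires at least one allowed non-whitespace character.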

Guffa
Ok thanks, I didn't know that. How can I limit the length of the whole input?
Steve McCall
@Steve: See my edit above.
Guffa
A: 

This regular expression should match all your characters and limit the input:

^\s*([\w\s\?\;\:\!\"£\$%]{1,2046})\s*$
Superfilin
No, it allows an unlimited amount of whitespace at the start and end of the string.
Svante
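Svante's objection is easy to confirm. In the Python sketch below (with a hypothetical `superfilin_ok` helper), the `{1,2046}` quantifier covers only the middle group, so the `\s*` on each side can consume any amount of whitespace.

```python
import re

# Superfilin's pattern: the outer \s* on each side sit outside the {1,2046}
# quantifier, so they are not limited by it.
LOOSE = re.compile(r'^\s*([\w\s\?\;\:\!\"£\$%]{1,2046})\s*$')

def superfilin_ok(text: str) -> bool:
    """Hypothetical helper: True if the whole string matches LOOSE."""
    return LOOSE.match(text) is not None

padded = " " * 5000 + "a" + " " * 5000
print(len(padded), superfilin_ok(padded))  # 10001 True
```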
+1  A: 

If you're looking to allow text and punctuation, what are you looking to exclude? Digits? \D will give you everything that isn't a digit.

Dr.Dredel
That is a good point. I do need digits too. I guess I just need to make sure that there is no malicious code.
Steve McCall
Haha, quick! Try to define malicious code! Maybe you're better off defining a certain type of code... If it's for HTML, just ban less-than signs. That sort of thing.
Karl
Karl, defining a whitelist is generally more secure than a blacklist. If you forget something in a whitelist, your users complain about not being able to enter sensible input, which is quickly fixed. If you forget something in a blacklist, your users complain about your site being "pwned", which is not so quickly fixed.
Svante
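The whitelist-versus-blacklist point can be illustrated with a small Python sketch. Both checks and the sample input are hypothetical: the blacklist follows the "ban less-than signs" suggestion, and the whitelist is a character class of allowed characters. A SQL-flavoured string slips past the blacklist because nobody thought to ban the apostrophe, while the whitelist rejects it because the apostrophe was never allowed.

```python
import re

# Whitelist: the whole string must consist of explicitly allowed characters.
ALLOWED = re.compile(r'[\w\s.,:;!?"£$%-]*')
# Blacklist: reject the string if it contains any explicitly banned character.
BANNED = re.compile(r'[<>]')

def whitelist_ok(text: str) -> bool:
    return ALLOWED.fullmatch(text) is not None

def blacklist_ok(text: str) -> bool:
    return BANNED.search(text) is None

sneaky = "x'; DROP TABLE users; --"
print(blacklist_ok(sneaky))  # True: the apostrophe was never banned
print(whitelist_ok(sneaky))  # False: the apostrophe was never allowed
```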
+1  A: 

You may already know this, but: guarding against malicious input should be handled server side, not in form validation on the client side. Black hats won't bat an eye at bypassing your script.

I think most popular web frameworks include library code for scrubbing input. A short regex alone is fairly flimsy protection against a SQL injection attack.

Merlyn Morgan-Graham
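For SQL injection specifically, the usual server-side defense is a parameterized query rather than input scrubbing: the database driver passes the value separately from the SQL text, so it is never parsed as SQL. Below is a minimal sketch using Python's built-in `sqlite3` module; the `comments` table and the `store_comment` helper are hypothetical.

```python
import sqlite3

def store_comment(conn: sqlite3.Connection, text: str) -> None:
    # The ? placeholder binds `text` as a value; its contents are never
    # interpreted as SQL, whatever quotes or keywords it contains.
    conn.execute("INSERT INTO comments (body) VALUES (?)", (text,))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (body TEXT)")

evil = "x'); DROP TABLE comments; --"
store_comment(conn, evil)

# The table still exists and holds the input verbatim.
rows = conn.execute("SELECT body FROM comments").fetchall()
print(rows)
```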
+2  A: 
^[\w\s.,:;!?€¥£¢$-]{0,2048}$

^ -- Beginning of string/line
[] -- A character class
\w -- A word character
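\s -- A whitespace character (space, tab, newline, etc.)
.,:;!?€¥£¢$- -- Punctuation and currency symbols (inside a character class, . and $ are literal, and a - in the last position is a literal hyphen)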
{} -- Number of repeats (min,max)
$ -- End of string/line
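A quick Python check of this pattern (the `svante_ok` helper and sample inputs are hypothetical). Note that `\w` is engine-dependent: in Python 3 and .NET it matches Unicode letters such as `é`, whereas in JavaScript it matches only ASCII letters, digits and underscore.

```python
import re

# 0 to 2048 characters, each drawn from the character class.
# Because the lower bound is 0, the empty string is accepted as well.
SVANTE = re.compile(r'^[\w\s.,:;!?€¥£¢$-]{0,2048}$')

def svante_ok(text: str) -> bool:
    return SVANTE.match(text) is not None

print(svante_ok("Price: £5, €3 - ok?"))  # True
print(svante_ok(""))                     # True
print(svante_ok("a" * 2049))             # False
print(svante_ok("<script>"))             # False
```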

Svante
Thank you, this post is very helpful and explains it well.
Steve McCall
Note that you might have to adjust this to the escaping rules in your system.
Svante
Also note that checking the string length before checking each character would be more efficient, but as far as I remember, validation in ASP.NET is done by a function that expects a single regex as a parameter (which I regard as bad design).
Svante
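Outside a framework that insists on a single regex, the length-first approach might look like the Python sketch below. `is_valid` and `MAX_LEN` are hypothetical names; the limit and character class are taken from Svante's answer.

```python
import re

MAX_LEN = 2048
# Character check only; the length is handled separately.
ALLOWED_CHARS = re.compile(r'[\w\s.,:;!?€¥£¢$-]*')

def is_valid(text: str) -> bool:
    # len() on a Python str is O(1), so oversized input is rejected
    # before the regex has to scan it.
    if len(text) > MAX_LEN:
        return False
    return ALLOWED_CHARS.fullmatch(text) is not None
```

Splitting the checks also makes it easy to tell the user which rule failed (too long vs. a disallowed character).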