I'm trying to come up with a validation expression that prevents users from entering HTML or JavaScript tags into a comment box on a web page.

The following works fine for a single line of text:

^(?!.*(<|>)).*$

...but it won't allow any newline characters, because the dot (.) doesn't match them. If I go with something like this:

^(?!.*(<|>))(.|\s)*$

it allows multiple lines, but the lookahead only detects '<' and '>' on the first line. I need it to detect them on any line.
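To make the behaviour concrete, here is a minimal sketch of the two expressions above. It uses Python's `re` module rather than the ASP.NET validator (an assumption made purely for illustration; the dot excludes newlines by default there too), and the names `SINGLE_LINE`, `FIRST_LINE_ONLY` and `accepts` are made up for the example:

```python
import re

# The two expressions from the question, unchanged.
SINGLE_LINE = re.compile(r'^(?!.*(<|>)).*$')
FIRST_LINE_ONLY = re.compile(r'^(?!.*(<|>))(.|\s)*$')


def accepts(pattern, text):
    """Return True when the pattern matches from the start of text."""
    return pattern.match(text) is not None


# The first expression rejects any input that spans several lines.
print(accepts(SINGLE_LINE, "line one\nline two"))

# The second accepts several lines, but its lookahead stops at the first
# newline, so a tag on the second line goes undetected.
print(accepts(FIRST_LINE_ONLY, "line one\n<b>bold</b>"))

# A tag on the first line is still caught.
print(accepts(FIRST_LINE_ONLY, "<b>bold</b>\nline two"))
```

The `.*` inside the lookahead is the culprit: it cannot cross a newline, so only the first line is ever inspected.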

This works fine:

^[-_\s\d\w"'\.,:;#/&\$\%\?!@\+\*\\(\)]{0,4000}$

but it's ugly and I'm concerned that it's going to break for some users because it's a multi-lingual application.

Any ideas? Thanks!

A: 

Note that your regex prevents users from entering < and > in any context, for example "2 > 1", which is very undesirable.

Rather than trying to use regular expressions to match HTML (which they aren't well suited to do), simply escape < and > by transforming them to &lt; and &gt;. Alternatively, find a package for your language-of-choice that implements whitelisting to allow a limited subset of HTML, or that supports its own markup language (I hear markdown is nice).
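As a rough illustration of the escaping approach, here is a minimal sketch in Python (chosen only for the example; the application in question is ASP.NET, which has its own HTML-encoding helpers). `escape_comment` is a hypothetical helper name, while `html.escape` is part of the Python standard library:

```python
import html


def escape_comment(text):
    """Escape HTML-significant characters so the comment renders as plain text.

    html.escape also converts & and, by default, both quote characters.
    """
    return html.escape(text)


print(escape_comment("2 > 1"))
print(escape_comment("<script>alert(1)</script>"))
```

With this approach a comment such as "2 > 1" is stored or displayed safely instead of being rejected.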

As for "." not matching newline characters, some regexp implementations support flags to control this behavior: usually "m" for "multi-line", which makes ^ and $ match at line boundaries, and "s" for "single line", which causes "." to match newlines.
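For example, in Python the "single line" behaviour is exposed as `re.DOTALL` (also spelled `re.S`). The sketch below applies it to a lookahead in the style of the question; `is_clean` and `NO_ANGLE_BRACKETS` are hypothetical names, and whether an equivalent flag can be set depends on the validator in use:

```python
import re

# With DOTALL, "." also matches newlines, so the lookahead scans every line.
NO_ANGLE_BRACKETS = re.compile(r'^(?!.*[<>]).*$', re.DOTALL)


def is_clean(text):
    """Return True when text contains neither '<' nor '>' on any line."""
    return NO_ANGLE_BRACKETS.match(text) is not None


print(is_clean("line one\nline two"))
print(is_clean("line one\n<b>bold</b>"))
```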

Your first two expressions are basically equivalent to /^[^<>]*$/, except that this one also works on multi-line strings, because a negated character class matches newlines. Is there a reason you didn't write the regex that way?
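A quick sketch of the negated character class, again in Python for illustration only. `has_no_angle_brackets` is a hypothetical helper, and `fullmatch` stands in for the ^ and $ anchors so that the whole input must match:

```python
import re

# [^<>] matches any character except '<' and '>', newlines included,
# so no flag is needed for multi-line input.
ALLOWED = re.compile(r'[^<>]*')


def has_no_angle_brackets(text):
    """Return True when the entire text is free of '<' and '>'."""
    return ALLOWED.fullmatch(text) is not None


print(has_no_angle_brackets("line one\nline two"))
print(has_no_angle_brackets("line one\n<b>bold</b>"))
```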

outis
The app's DAL already handles escaping any 'dangerous' characters, but I'd rather do it in both places. I've also noticed in the past that the client-side ASP.NET validators tend to choke on anything that looks like a tag, so I'm trying to avoid that as well.
Remoh
I'm aware that what I've shown so far will prevent any use of '<' and '>'; I was planning to tackle that after I get the negation working. I'll check to see if there's a multi-line flag.
Remoh
A: 

So, I looked into it, and there is a .NET 'Singleline' option (RegexOptions.Singleline) for regular expressions that causes "." to also match the newline character. Unfortunately, this isn't available in the ASP.NET RegularExpressionValidator. As far as I can see, there's no way to make something like ^(?!.*(<\w+>)).*$ work on a multi-line textbox without doing server-side validation.

I took your advice and went the route of escaping the tags on the server side. This requires setting the ValidateRequest page directive to 'false', but in this particular instance that isn't a big deal, because the comment box is really the only thing to worry about.

Remoh