I have the following regular expression validator to detect whether an input string contains HTML/script tags and, if so, cause a validation error:

<asp:TextBox ID="txt" runat="server" />
    <asp:RegularExpressionValidator 
        ControlToValidate="txt" 
        runat="server"
        ID="regexVal"
        EnableClientScript="true"  Display="Dynamic"
        ErrorMessage="Invalid Content" 
        Text="!" 
        ValidationExpression=">(?:(?<t>[^<]*))" />

When I run the page hosting this markup, I get a script error with the message "Syntax Error in Regular Expression". However, when I take the same regex and run it using the Regex class from System.Text.RegularExpressions, everything works fine, like so:

Regex r = new Regex(">(?:(?<t>[^<]*))");
r.IsMatch(@"<b>This should cause a validation error</b>");
r.IsMatch("this is fine");

What am I missing?

UPDATE: The error seems to be happening in the following js function in WebResource.axd:

function RegularExpressionValidatorEvaluateIsValid(val) {
    var value = ValidatorGetValue(val.controltovalidate);
    if (ValidatorTrim(value).length == 0)
        return true;
    var rx = new RegExp(val.validationexpression); //this is the line causing the error
    var matches = rx.exec(value);
    return (matches != null && value == matches[0]);
}
+1  A: 

I managed to find the root cause, but I'm not sure what the resolution is.

Using the Firebug console in FF3.5, run this to trigger all the client-side validators:

for(var _v=0; _v<Page_Validators.length; _v++){
    ValidatorValidate(Page_Validators[_v]);
}

Then enter some text into the txt textbox and run the script again. An exception is thrown:
"invalid quantifier ?<t>[^<]*))"

Somehow the regex string can't be parsed by the browser's regex engine. I haven't been able to find an alternative regex for it.

o.k.w
I'm wondering whether this is an ASP.NET bug in emitting the JavaScript for the regex. As I mentioned in the update, the following line is bombing: var rx = new RegExp(val.validationexpression);
Abhijeet Patel
I suggest changing to an alternative regex or using only server-side validation. I have had regex compatibility issues with some browsers as well.
o.k.w
+5  A: 

I think the problem is that JavaScript does not understand .NET's regular expression syntax for grouping.

When you set EnableClientScript to true on the RegularExpressionValidator, ASP.NET re-creates your regular expression in JavaScript to enable client-side validation on your controls. In this case, the browser's JavaScript engine doesn't support the .NET syntax for named groups, (?<t>...), which is why the expression works in .NET but fails to compile on the client. (Non-capturing groups, (?:...), are supported in JavaScript, so they are not the cause.)

From RegularExpressionValidator Control (General Reference) on MSDN :

On the client, JScript regular expression syntax is used. On the server, Regex syntax is used. Because JScript regular expression syntax is a subset of Regex syntax, it is recommended that you use JScript regular expression syntax in order to yield the same results on both the client and the server.
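A quick way to see whether a given pattern falls inside the subset the browser understands is to try compiling it the same way the validator script does. This is only a sketch: compilesInJavaScript is a hypothetical helper name, not part of ASP.NET. The result for the original pattern depends on the engine, since older engines such as the Firefox 3.5 used in this thread reject (?<t>...), while engines that implement ES2018 named groups accept it.

```javascript
// Hypothetical helper (not part of ASP.NET): reports whether this engine
// can compile a pattern, instead of letting the page throw a script error.
function compilesInJavaScript(pattern) {
    try {
        new RegExp(pattern); // same constructor call the validator script uses
        return { ok: true, error: null };
    } catch (e) {
        return { ok: false, error: e.message };
    }
}

// The simplified pattern uses only syntax common to both engines.
console.log(compilesInJavaScript(">[^<]*"));

// The original pattern: the outcome depends on the engine's named-group support.
console.log(compilesInJavaScript(">(?:(?<t>[^<]*))"));
```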

There are two ways you can correct this:

  1. Disable the client-side script generation and have the regular expression execute on the server side. You can do this by setting EnableClientScript to false.
  2. Modify the regular expression to remove the named group (the non-capturing group is redundant here and can go too). If you need capturing in your regular expression, the (...) syntax should work correctly in both JavaScript and .NET. You would then use ordinal number references to access captured values ($1, $2, etc.). Something like >[^<]* should avoid the syntax error. See Grouping Constructs on MSDN.


I'd like to point out a couple of other issues:

  • Your original regular expression doesn't seem to need capturing at all if all you want to do is check for the existence of a closing angle bracket (>). It could be rewritten as >[^<]*, which is simpler and matches the same text. It won't capture any values in the original string, but since you're using it in an ASP.NET validation control this shouldn't matter.
  • The RegularExpressionValidator passes validation only when the match is successful. In your case, validation will pass if your textbox contains something like >blah. I think you want it to work the other way around.
  • If you modify the regular expression to >[^<]*, the regular expression will still not work how I think you intend it to. The validation control requires the match to cover all the text in the textbox. So if I enter >blah in the textbox, it will match, but <b>blah</b> won't, because the whole string would have to start with a >. I would suggest trying something like .*>.*[^<]* to allow text before the >.
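
The whole-string behaviour described in these points can be checked outside the page. The sketch below mirrors the check in the RegularExpressionValidatorEvaluateIsValid function quoted in the question. isValidForValidator is a hypothetical name, and trimming is simplified to String.prototype.trim, which is an assumption about what ValidatorTrim does.

```javascript
// Mirrors the client-side check quoted in the question: the input is valid
// only when the first match is identical to the entire value.
function isValidForValidator(pattern, value) {
    if (value.trim().length === 0) {
        return true; // empty input always passes, as in the original function
    }
    var matches = new RegExp(pattern).exec(value);
    return matches !== null && value === matches[0];
}

console.log(isValidForValidator(">[^<]*", ">blah"));            // true
console.log(isValidForValidator(">[^<]*", "<b>blah</b>"));      // false
console.log(isValidForValidator(">[^<]*", "this is fine"));     // false
console.log(isValidForValidator(".*>.*[^<]*", "<b>blah</b>"));  // true
console.log(isValidForValidator(".*>.*[^<]*", "this is fine")); // false
```

Note that plain text such as "this is fine" fails with both patterns, while input containing tags can pass, which is the opposite of what the question is after.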
dariom
Good points you have there. +1
o.k.w
Thanks for the clarification. That makes a lot of sense now. I'd still like to get an equivalent regex that achieves the same end result as the original regex, i.e. detect HTML tags/content so that I can flag it as a validation error. Any ideas?
Abhijeet Patel
`[^<>]*` might be a starting point for your `RegularExpressionValidator`. It will try and match strings containing anything except angle brackets. Please note that parsing HTML with regular expressions is generally a bad idea: http://stackoverflow.com/questions/1816255/when-is-it-wise-to-use-regular-expressions-with-html (great links in that question - good to follow them!). In this instance, detecting HTML in input might be OK though...
dariom
Wouldn't this also detect less-than and greater-than symbols in general as matches, such as "price must be >40 and <100"? In the use case I'm dealing with, such inputs are considered valid. Only inputs such as "<script>......</script>" or "<b>I'm bold</b>" and the like are considered invalid.
Abhijeet Patel
How about something like this? ^.*<\w+>.*$
Abhijeet Patel
Yes, my suggested regular expression would block "price < 10", etc. I did say it was just a start :-) What you're aiming for is possible, but tricky because of the way `RegularExpressionValidator` works (you have to invert the logic of the regular expression, i.e. you specify which patterns are allowed, not which are disallowed). Your expression `^.*<\w+>.*$` wouldn't prevent something like "<my tag>" being entered. This question deals with the error in your `RegularExpressionValidator` control. I'd recommend a new question to find an appropriate regular expression pattern to do what you want.
dariom
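One way the inverted, "allowed patterns" logic from the comment above might be expressed is with a negative lookahead, which both JavaScript and .NET support. This is an untested-in-ASP.NET sketch, not a pattern proposed anywhere in the thread: allowedPattern and passesValidation are hypothetical names, and the definition of a "tag" used here (a <, an optional /, a letter, then anything up to the next >) is an assumption.

```javascript
// Assumption: a "tag" is "<", an optional "/", a letter, then anything up to ">".
// The negative lookahead rejects any input that contains such a sequence;
// everything else is matched in full, which is what the validator requires.
var allowedPattern = /^(?![\s\S]*<\/?[a-zA-Z][^>]*>)[\s\S]*$/;

// Same whole-string comparison as the client-side function in the question.
function passesValidation(value) {
    if (value.trim().length === 0) {
        return true;
    }
    var matches = allowedPattern.exec(value);
    return matches !== null && value === matches[0];
}

console.log(passesValidation("price must be >40 and <100")); // true
console.log(passesValidation("this is fine"));               // true
console.log(passesValidation("<b>I'm bold</b>"));            // false
console.log(passesValidation("<script>......</script>"));    // false
console.log(passesValidation("<my tag>"));                   // false
```

The tag heuristic is deliberately loose, so prose like "a <b and c> d" would also be rejected; how strict to make it is a judgment call for the use case.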
Fair enough. Your answer is the best one. Thanks for all your help
Abhijeet Patel