I'm trying to create a regex that blocks every < and > in a string except when they are used in <select>. Can anyone suggest a regex for that? I'll be using it with java.util.regex.Pattern.

I'm trying to write a solution that blocks injection attacks and XSS attempts made through the request and URL. To do that, I'll block special characters and character sequences, with some exceptions. One of the exceptions is that I have to allow < select > (angle brackets with select between them), because that is legitimately passed in the request in some cases. All other combinations of angle brackets have to be blocked, and that is the reason for my question.

+1  A: 
finnw
Thanks for the response. I'm looking to use it as Pattern.compile(regex).matcher(input).find(), so a single regex would suit me better. Can you please advise? Thanks much
arya
My method is better suited to String.replace(x, y). See Jordan Liggitt's answer if it has to be a single regex.
finnw
+4  A: 

This removes < and > characters from a string unless they are part of a <select>, as you mentioned:

String cleaned = someString.replaceAll("<(?!select>)|(?<!\\<select)>", "");
Jordan Liggitt
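A minimal runnable sketch of the replaceAll approach above, assuming the pattern is precompiled with java.util.regex.Pattern as the asker wanted. StripBrackets and strip() are hypothetical names; the \\< escape from the answer is dropped here because < is not a regex metacharacter, so the matching behavior is unchanged.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the replaceAll idea from the answer above.
class StripBrackets {
    // "<" not followed by "select>", or ">" not preceded by "<select"
    private static final Pattern STRAY_BRACKET =
            Pattern.compile("<(?!select>)|(?<!<select)>");

    static String strip(String input) {
        // Strings are immutable: replaceAll returns a new string
        return STRAY_BRACKET.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("<select>"));                  // <select>
        System.out.println(strip("<script>alert(1)</script>")); // scriptalert(1)/script
        System.out.println(strip("a < b > c"));                 // a  b  c
    }
}
```

The match is case-sensitive, so <SELECT> would be stripped; adding Pattern.CASE_INSENSITIVE to compile() changes that.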
Thanks Jordan for the answer. I was looking to compile it with the Pattern class, and your answer gave a good pointer to the solution. Thanks much. Unfortunately, I can't upvote it, as I need 15 reputation myself for that!
arya
+2  A: 
Pattern p = Pattern.compile(
  "(?<!\\<select)>|<(?!\\s*select\\s*>)",
  Pattern.CASE_INSENSITIVE);

This finds any > not preceded by <select and any < not followed by select> (optional whitespace is allowed in the lookahead), matching case-insensitively.

Normally I'd also check for (legal) whitespace around the element ("< select >" is valid), but the lookbehind has issues with that which I'm not really sure how to get around.

cletus
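A sketch comparing cletus's pattern with a bounded-lookbehind variant, both used with Matcher.find() as a reject-if-found check. The variant is an assumption, not part of the answer: Java's lookbehind rejects \\s* (no obvious maximum length) but does accept bounded repetition such as \\s{0,10}, so whitespace up to an arbitrary limit can be tolerated. BracketCheck, ORIGINAL, BOUNDED and hasStrayBracket() are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical comparison of the answer's pattern and a bounded-lookbehind variant.
class BracketCheck {
    // cletus's pattern, with \s written as \\s inside the Java string literal
    static final Pattern ORIGINAL = Pattern.compile(
            "(?<!\\<select)>|<(?!\\s*select\\s*>)",
            Pattern.CASE_INSENSITIVE);

    // Assumed variant: \\s{0,10} gives the lookbehind an obvious maximum length.
    // The limit of 10 whitespace characters is arbitrary.
    static final Pattern BOUNDED = Pattern.compile(
            "(?<!<\\s{0,10}select\\s{0,10})>|<(?!\\s*select\\s*>)",
            Pattern.CASE_INSENSITIVE);

    // find() reports whether at least one disallowed bracket occurs anywhere
    static boolean hasStrayBracket(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = {"<select>", "<SELECT>", "< select >", "<img src=x>"};
        for (String s : inputs) {
            System.out.println(s + " -> original: " + hasStrayBracket(ORIGINAL, s)
                    + ", bounded: " + hasStrayBracket(BOUNDED, s));
        }
    }
}
```

With the original pattern, the > in "< select >" is still reported, which is the whitespace limitation described in the answer; the bounded variant accepts it, but only for runs of up to 10 whitespace characters.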
Thanks for the response, but it throws the following exception: java.util.regex.PatternSyntaxException: Look-behind group does not have an obvious maximum length near index 17
(?<!\<\s*select\s*)>|<(?!\s*select\s*>)
arya
Corrected. I forgot that lookbehinds can't contain unbounded quantifiers such as \s*.
cletus
OK, not a problem. Thanks anyway for the pointer. Now, if I need to write the opposite, i.e. find any < or > that appears without <select>, will the following regex work? "^[<(?!select>)|(?<!\\<select)>]"
arya
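Here is why the pattern in the comment above won't do what's intended. Square brackets create a character class, and inside one Java treats (, ?, !, | and the letters as literal characters, so the lookarounds never take effect. Combined with ^, the pattern only checks whether the first character of the input belongs to that set. A quick sketch to confirm (CharClassDemo is a hypothetical name):

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the pattern string is copied from the comment above.
class CharClassDemo {
    // Inside [...] the characters ( ? ! | ) and the letters are literal set members,
    // so this only asks: "is the first character one of < > ( ) ? ! | s e l c t?"
    static final Pattern ASKED = Pattern.compile("^[<(?!select>)|(?<!\\<select)>]");

    static boolean matches(String input) {
        return ASKED.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("<select>"));  // true: starts with '<'
        System.out.println(matches("safe text")); // true: starts with 's'
        System.out.println(matches("a<b>"));      // false: first char is 'a'
    }
}
```

So "a<b>" passes even though it contains brackets, while harmless text starting with s is flagged. Finding any < or > that is not part of <select> is what the answers' patterns already do when used with find().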
You probably need to edit your question so we can figure out what you're trying to do and why. Replacing &lt; and &gt;, for example, is a strange requirement, because replacing < and > with those entities is the usual defense against embedding tags. &lt;img&gt; won't be rendered by anything that I know of.
cletus
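The usual defense cletus refers to is escaping rather than stripping. < and > are written as the entities &lt; and &gt;, so the browser displays them as text instead of parsing a tag. Below is a minimal hand-rolled sketch (HtmlEscape.escape is a hypothetical helper, not a library API). It also escapes &, so that existing entity-like text is not misread, plus both quote characters for use inside attribute values.

```java
// Hypothetical helper for illustration; not part of any library.
class HtmlEscape {
    // Replace each special character with its HTML entity, one char at a time,
    // so an "&" produced by an earlier replacement is never escaped twice.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("<img src=x onerror=alert(1)>"));
        // &lt;img src=x onerror=alert(1)&gt;
    }
}
```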
Sorry, it's the first time I've posted a question here, and it wasn't clear. That's why my score went to -2 :(. I'm editing the question now to make it clearer. BTW, your solution works as it is (though without handling whitespace). Thanks much for that.
arya
Yeah, well, I think downvoting is a bit harsh, because it is a programming question (and a valid one at that). It just needs some clarification.
cletus
@cletus: the DOTALL flag changes the behavior of the dot metacharacter, allowing it to match line separators as well as every other character. Since there are no dots in that regex, the DOTALL flag is redundant.
Alan Moore
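A small sketch of Alan Moore's point, using only java.util.regex. DOTALL only changes what . matches. It flips the result for a pattern containing a dot but has no effect on the bracket pattern, where line breaks are already covered by \\s. DotAllDemo and its methods are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical demo of what DOTALL does and does not affect.
class DotAllDemo {
    static int flags(boolean dotAll) {
        return dotAll ? Pattern.DOTALL : 0;
    }

    // "a.b" against "a\nb" succeeds only if the dot may match the line feed
    static boolean dotMatchesNewline(boolean dotAll) {
        return Pattern.compile("a.b", flags(dotAll)).matcher("a\nb").matches();
    }

    // cletus's pattern contains no dot, so DOTALL cannot change its result
    static boolean strayBracket(String input, boolean dotAll) {
        return Pattern.compile("(?<!<select)>|<(?!\\s*select\\s*>)",
                Pattern.CASE_INSENSITIVE | flags(dotAll)).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(dotMatchesNewline(false));          // false
        System.out.println(dotMatchesNewline(true));           // true
        System.out.println(strayBracket("<select\n>", false)); // true
        System.out.println(strayBracket("<select\n>", true));  // true
    }
}
```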
@Alan: good point. It mattered in an earlier revision.
cletus
Thanks for your kindness, guys. I've added some details to the question.
arya