I'm trying to write a regex to remove onclick (and onload, onmouseover, etc.) attributes from HTML elements. I want to do this on the server side, before the HTML is sent to the client.
I have content coming from a rich text editor that is displayed on screen in a div, and I want to protect against XSS (cross-site scripting). Obviously I can't HTML-encode it with Server.HtmlEncode(), because the rich text editor stores its content as HTML markup. Instead I'm using a blacklisting approach, looking for certain elements such as <script> and <style>.
Now I'm trying to look for onclick, onmouseover, etc. attributes, and so far I have the following:
returnVal = Regex.Replace(returnVal, @"\<(.*?)(\ on[a-z]+\=\""?.*?\""?)*(.*?)\>",
    "<$1 $3>", RegexOptions.Singleline | RegexOptions.IgnoreCase);
...which isn't working, and none of the variations I've tried work either. Basically, I want it so that...
<p style="font-style: italic" onclick="alert('hacked!!');">Hello World</p>
gets turned into...
<p style="font-style: italic">Hello World</p>
Any ideas? Cheers!
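For reference, the pattern above matches each tag but removes nothing. The lazy (.*?) in group 1 matches the empty string, the optional group 2 then matches zero times right after '<', and the lazy (.*?) in group 3 absorbs the rest of the tag, so the replacement simply re-emits the tag with an extra space after '<'. Below is a minimal C# sketch of the transformation described above that works per tag instead. It finds each opening tag with a quote-aware pattern, then drops whitespace-preceded on* attributes while leaving quoted values untouched. The class and method names are hypothetical. The sketch assumes that attribute values are quoted or contain no whitespace and that attributes are separated by whitespace, so it will not catch forms like <p/onclick=...>. It also covers only on* attributes, not other vectors such as javascript: URLs in href.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: strips on* event-handler attributes from opening tags.
// Regex-only sketch; assumes whitespace-separated attributes and values that
// are double-quoted, single-quoted, or unquoted without whitespace.
public static class EventAttributeStripper
{
    // An opening tag: '<', a letter, then any mix of quoted strings and
    // characters other than quotes or '>', ending at the first '>' that is
    // not inside a quoted value. Closing tags like </p> are not matched.
    private static readonly Regex TagPattern = new Regex(
        @"<[a-z](?:""[^""]*""|'[^']*'|[^'"">])*>",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Inside a tag, match either a quoted value (named group "quoted", kept
    // as-is so text inside values is never altered) or a whitespace-preceded
    // on* attribute with a double-quoted, single-quoted or unquoted value
    // (removed).
    private static readonly Regex AttributePattern = new Regex(
        @"(?<quoted>""[^""]*""|'[^']*')|\s+on[a-z]+\s*=\s*(?:""[^""]*""|'[^']*'|[^\s>]*)",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    public static string StripEventHandlers(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));

        // Rewrite each opening tag; text between tags is left alone.
        return TagPattern.Replace(html, tag =>
            AttributePattern.Replace(tag.Value, m =>
                m.Groups["quoted"].Success ? m.Value : string.Empty));
    }
}
```

With this sketch, passing the <p style="font-style: italic" onclick="alert('hacked!!');">Hello World</p> example to StripEventHandlers returns <p style="font-style: italic">Hello World</p>. Matching quoted values in their own group is what keeps a value such as title="say onclick=hi" from being edited. The original pattern had no way to tell attribute names from text inside values.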