I am building a control in .NET 2.0 to allow users to write HTML into a textarea and then upload it. .NET won't allow them to upload it unless I set ValidateRequest=false on the page. Of course, this opens up potential security threats. My plan is to upload the code and scan it for only the basic tags I would allow (like <B>), possibly in its own zone. Does anyone see a problem with this? Would I be scanning too late in the process?
This really depends on what you are doing with the code once it has been uploaded. As long as you filter it before you store or re-display the data, you should limit your exposure. Ideally you would HTML-encode the content, but that isn't an option if you need to actually render the users' HTML, which I'm guessing is the case.
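For the case where the HTML does not need to be rendered, encoding is a one-liner in most platforms. The sketch below uses Python's standard `html.escape` purely as an illustration (the question is about .NET, so the function name `encode_for_display` and the choice of Python are assumptions made for the example):

```python
import html

def encode_for_display(user_input: str) -> str:
    """Escape HTML metacharacters so the input renders as plain text."""
    # quote=True also escapes " and ', which matters inside attribute values.
    return html.escape(user_input, quote=True)

print(encode_for_display('<script>alert("x")</script>'))
# prints: &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
```

Encoded this way, the browser shows the markup as text instead of interpreting it, which is why it only helps when rendering the user's HTML is not a requirement.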
NOTE: I'm not saying that filtering is a trivial task, but your exposure varies depending on what you do with the data and who sees it. For example, if a user's free-form HTML is only stored and displayed back to that same user, there is less risk of affecting other visitors or larger portions of the site. However, if you are working on, say, a forum system and a user posts malformed or malicious HTML, you could break the entire site for all users.
But keep in mind that if you allow free-form HTML input, you also have to handle unclosed tags and other improper markup.
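As a rough illustration of what handling unclosed tags can involve, here is a minimal sketch built on Python's standard `html.parser.HTMLParser`. The allowed tag set, the class name `BalancingFilter`, and the helper `balance` are all hypothetical; it drops attributes, escapes text, ignores tags outside the allowed set, and closes anything left open. It is not a complete sanitizer.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED = {"b", "i", "u"}  # assumption: the only tags we let through

class BalancingFilter(HTMLParser):
    """Re-emit allowed tags without attributes and keep them balanced."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []  # stack of currently open allowed tags

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED:
            self.out.append(f"<{tag}>")  # attributes are deliberately dropped
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Close inner tags first so the output stays properly nested.
            while self.open_tags:
                top = self.open_tags.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break

    def handle_data(self, data):
        self.out.append(escape(data))  # text is always re-escaped

    def close_all(self):
        # Close whatever the user left open, innermost first.
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")

def balance(fragment: str) -> str:
    parser = BalancingFilter()
    parser.feed(fragment)
    parser.close()
    parser.close_all()
    return "".join(parser.out)

print(balance("<b>bold <i>both"))
```

Because the output is rebuilt from parser events rather than patched in place, a stray closing tag or a missing one cannot leak into the rest of the page.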
Personally, if you must accept HTML input from a user, I would recommend going with some form of rich text editor control, something like FCKEditor, the Telerik RadEditor, TinyMCE, or even the Markdown editor (used here on Stack Overflow). This reduces your exposure to malformed HTML, not to mention giving users a better experience.
I'd take a look at the Sanitize HTML utility written by Jeff Atwood.
In short, be careful.
If you're going to let users upload HTML for display, you've got a lot of potential problems, most of which you probably won't see until it's too late.
Consider using a tool that gives users control over the display of their text without having to allow HTML. For example, many forum websites use square brackets to indicate markup, such as [b]some text[/b]. Better yet, consider using something like Markdown (which is what this site uses, by the way).
Markdown wikipedia: http://en.wikipedia.org/wiki/Markdown
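To make the square-bracket idea concrete, here is a minimal sketch in Python. The tag list and the function name `bbcode_to_html` are assumptions for illustration only. The key point is the order of operations: escape everything first, then convert only the bracket pairs you recognise.

```python
import html
import re

BBCODE_TAGS = ("b", "i", "u")  # hypothetical set of supported bracket tags

def bbcode_to_html(text: str) -> str:
    # Step 1: neutralise any raw HTML the user typed.
    safe = html.escape(text)
    # Step 2: turn matched [tag]...[/tag] pairs into real HTML tags.
    for tag in BBCODE_TAGS:
        pattern = re.compile(rf"\[{tag}\](.*?)\[/{tag}\]", re.IGNORECASE | re.DOTALL)
        safe = pattern.sub(rf"<{tag}>\1</{tag}>", safe)
    return safe

print(bbcode_to_html("[b]some text[/b] <script>"))
# prints: <b>some text</b> &lt;script&gt;
```

Unpaired brackets are left as literal text. Overlapping pairs such as [b][i]x[/b][/i] would still produce mis-nested output here, so a real implementation would need a proper parser.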
I would advise against it unless you only allow a few markup tags with no attributes. So you should have a regex that matches <b> <i> <u>.
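One way to apply such a regex, sketched here in Python under the assumption that only the exact attribute-free forms of those three tags are allowed, is to escape the whole input and then restore just the tags that match. The names `ALLOWED_TAG` and `allow_basic_tags` are hypothetical.

```python
import html
import re

# Matches the escaped form of <b>, </b>, <i>, </i>, <u>, </u> and nothing else.
ALLOWED_TAG = re.compile(r"&lt;(/?)(b|i|u)&gt;", re.IGNORECASE)

def allow_basic_tags(user_html: str) -> str:
    # Escape everything, so any tag carrying attributes stays inert text.
    escaped = html.escape(user_html)
    # Restore only exact, attribute-free allowed tags (normalised to lowercase).
    return ALLOWED_TAG.sub(lambda m: f"<{m.group(1)}{m.group(2).lower()}>", escaped)

print(allow_basic_tags("<B>hi</B> <script>x</script>"))
# prints: <b>hi</b> &lt;script&gt;x&lt;/script&gt;
```

Starting from a fully escaped string means anything the pattern does not recognise fails safe. This sketch does not check that the restored tags are balanced, which would need a separate pass.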
If you're planning to support richer tags and styles, forget it. Either make sure you're an expert in that field, hire one, or use an existing library.
To get an idea of how hard it is, take a look at the XSS cheat sheet by RSnake.
And note, this list is far from complete.