Hi guys.

I developed a web application that lets my users manage some aspects of a website dynamically (yes, a kind of CMS) in a LAMP environment (Debian, Apache, PHP, MySQL).

For example, they create a news item in their private area on my server, and it is then published on their website via a cURL request (or via Ajax).

The news item is created with a WYSIWYG editor (FCKeditor at the moment, probably TinyMCE in the near future).

So I can't disallow HTML tags, but how can I stay safe? Which tags MUST I remove (JavaScript?)? That covers being server-safe, but how can I be 'legally' safe? If a user uses my application to carry out XSS, could I get into legal trouble?

+2  A: 

Rather than allowing HTML, you should use some other markup that can be converted to HTML. Trying to strip rogue HTML out of user input is nearly impossible. For example:

<scr<script>ipt etc="...">

Removing <script> from this will leave:

<script etc="...">
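
To make that concrete, here is a tiny Python sketch of a single-pass strip. The function name strip_once is made up for illustration; it is not from any library.

```python
def strip_once(text, needle="<script>"):
    """Naive blacklist: delete every literal occurrence of `needle`.

    str.replace does not rescan its own output, so pieces on either
    side of a removed match can join up into a new tag.
    """
    return text.replace(needle, "")


payload = '<scr<script>ipt etc="...">'
result = strip_once(payload)
print(result)
```

The removed "<script>" sat in the middle of "<scr" and "ipt", so the output is a script tag again.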
ck
Using a white list rather than a black list would solve this problem.
Gumbo
See the img tag answer at http://stackoverflow.com/questions/701580/how-can-i-allow-my-user-to-insert-html-code-without-risks-not-only-technical-r/701609#701609
ck
XSS is also possible through other markup languages, such as BBCode, so that doesn't really fix anything. A whitelist approach works pretty well.
troelskn
+6  A: 

The general best strategy here is to whitelist specific tags and attributes that you deem safe, and escape/remove everything else. For example, a sensible whitelist might be <p>, <ul>, <ol>, <li>, <strong>, <em>, <pre>, <code>, <blockquote>, <cite>. Alternatively, consider human-friendly markup like Textile or Markdown that can be easily converted into safe HTML.
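
As a rough illustration of the whitelist idea, here is a minimal Python sketch built on the standard library's html.parser. The tag list is the one from this answer; the class name, the sanitize helper, and the choice to drop script/style content and all attributes are my own assumptions. For real use, a maintained sanitizer is a safer bet than a hand-rolled one.

```python
from html import escape
from html.parser import HTMLParser

# Tag list taken from the answer; everything else here is an assumption.
ALLOWED_TAGS = {"p", "ul", "ol", "li", "strong", "em",
                "pre", "code", "blockquote", "cite"}
# Elements whose text content is dropped entirely (assumption).
DROP_CONTENT = {"script", "style"}


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS:
            # Re-emit the bare tag; every attribute is discarded.
            self.out.append(f"<{tag}>")
        # Any other tag is silently removed.

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Text is always escaped, so it can never form a new tag.
            self.out.append(escape(data))


def sanitize(html_text):
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


dirty = '<p onclick="evil()">Hi <strong>there</strong></p><script>alert(1)</script>'
print(sanitize(dirty))
```

Because the only raw "<" characters in the output come from re-emitting whitelisted tag names, anything the parser does not recognise as an allowed tag ends up either removed or escaped.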

John Feminella
Can't you still insert scripts in the allowed tags when using a whitelist?
jeroen
That depends on how you're escaping them. If you're describing something like "<scr<script>ipt ...", I'd first note that "<scr" looks like the beginning of a tag. Since "scr" isn't whitelisted, we can escape it safely. Then we get to the "<script>" and it's also escaped/removed.
John Feminella
I was thinking more about the attributes, but I guess that depends on whether your whitelist has any tags that need them, in which case you would have to allow them. If you allow attributes, you'd have to get rid of the whole range of onclick="" and similar event handlers, but I guess that's pretty obvious :)
jeroen
Oh, absolutely. You have to whitelist attributes separately, though, just like you whitelist each tag. (That's the price you pay for being explicit.)
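
A sketch of what whitelisting attributes per tag could look like in Python. The attribute map, the list of safe URL schemes, and the name filter_attrs are all assumptions for illustration (note that <a> is not in the tag list above); the attrs argument uses the (name, value) pairs that html.parser passes to handle_starttag.

```python
import re
from urllib.parse import urlsplit

# Hypothetical per-tag attribute whitelist; not from the thread.
ALLOWED_ATTRS = {"a": {"href", "title"}, "blockquote": {"cite"}}
URL_ATTRS = {"href", "cite"}
SAFE_SCHEMES = {"http", "https", "mailto", ""}  # "" means a relative URL


def filter_attrs(tag, attrs):
    """Keep only whitelisted attributes whose URLs use a safe scheme."""
    kept = []
    for name, value in attrs:
        name = name.lower()
        if name not in ALLOWED_ATTRS.get(tag, set()):
            continue  # drops onclick="", style="", and anything unlisted
        value = value or ""
        if name in URL_ATTRS:
            # Browsers ignore whitespace/control characters inside the
            # scheme, so remove them before checking it.
            compact = re.sub(r"[\x00-\x20]+", "", value)
            try:
                scheme = urlsplit(compact).scheme.lower()
            except ValueError:
                continue  # unparseable URL: drop the attribute
            if scheme not in SAFE_SCHEMES:
                continue
        kept.append((name, value))
    return kept


print(filter_attrs("a", [("href", "http://example.com/"), ("onclick", "evil()")]))
```

Event handlers never need a special rule here: they are removed simply because they are not on the list.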
John Feminella
+8  A: 

It doesn't really matter what you're looking to remove; someone will always find a way to get around it. As a reference, take a look at this XSS Cheat Sheet.

As an example, how are you ever going to remove this valid XSS attack:

<IMG SRC=&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29>
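
To see why a string-matching blacklist misses this one, here is a short Python sketch. The helper name naive_blacklist_flags is made up; html.unescape is the standard library function and decodes character references roughly the way a browser does, including ones without a trailing semicolon.

```python
import html

payload = ("<IMG SRC=&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74"
           "&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29>")


def naive_blacklist_flags(text):
    """Hypothetical filter that looks for a literal 'javascript:'."""
    return "javascript:" in text.lower()


flagged = naive_blacklist_flags(payload)  # the raw text looks harmless
decoded = html.unescape(payload)          # what the attribute really says
print(flagged)
print(decoded)
```

The filter sees only character references, while the decoded attribute value is a javascript: URL.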

Your best option is to allow only a subset of acceptable tags and remove everything else. This practice is known as whitelisting and is the best method for preventing XSS (besides disallowing HTML altogether).

Also use the cheat sheet in your testing; fire as much as you can at your website and try to find some ways to perform XSS.

Gavin Miller
+1 for the cheat sheet
Frank Krueger
A: 

If removing the tags is too difficult, you could reject the whole HTML input until the user submits something valid. I would reject HTML if it contains any of the following tags:

frameset,frame,iframe,script,object,embed,applet.

Other tags you will want to disallow are head (and its sub-tags), body, and html, because you want to provide them yourself and you do not want the user to manipulate your metadata.
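
A minimal Python sketch of this reject-instead-of-strip idea, using the standard html.parser to collect tag names. The function name is_acceptable is made up, and the head sub-tags listed are my own reading of "sub-tags". Note that a tag check like this does nothing about attribute-based attacks, so it is a first gate at best.

```python
from html.parser import HTMLParser

# Tags named in this answer.
FORBIDDEN = {"frameset", "frame", "iframe", "script", "object", "embed",
             "applet", "head", "body", "html"}
# Assumed meaning of "head sub-tags"; adjust to taste.
FORBIDDEN |= {"title", "meta", "link", "base", "style"}


class TagCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag names.
        self.tags.add(tag)


def is_acceptable(html_text):
    """Return False if the input uses any forbidden tag."""
    collector = TagCollector()
    collector.feed(html_text)
    collector.close()
    return not (collector.tags & FORBIDDEN)


print(is_acceptable("<p>hello</p>"))
print(is_acceptable("<p>x</p><IFRAME src='http://example.com'></IFRAME>"))
```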

But generally speaking, allowing users to provide their own HTML code always poses some security risk.

codymanix
+9  A: 

If you are using PHP, an excellent solution is HTML Purifier. It has many options for filtering out bad stuff and, as a side effect, guarantees well-formed HTML output. I use it to view spam, which can be a hostile environment.

DGM
I decided to go this way, plus some additional steps of my own. I must give my customers total freedom to use HTML tags (because of the WYSIWYG editor), restricting only certain things. I hope that keeping it up to date with the latest security holes won't be too much trouble.
DaNieL
I trust it much more than I trust my own efforts...
DGM
A: 

You might want to consider, rather than allowing HTML at all, implementing some stand-in for HTML like BBCode or Markdown.
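
For what it's worth, here is a toy Python sketch of the stand-in idea: escape everything first, then turn a tiny set of BBCode-style tags into HTML. The function name and the two supported tags are assumptions; a real BBCode or Markdown library covers far more, and anything that emits URLs (links, images) needs its own scheme checks.

```python
import re
from html import escape

# Hypothetical mapping of BBCode tags to HTML tags (no attributes at all).
SIMPLE_TAGS = {"b": "strong", "i": "em"}


def bbcode_to_html(text):
    # Escape first: after this step the user cannot produce a raw "<".
    out = escape(text)
    for bb, tag in SIMPLE_TAGS.items():
        out = re.sub(rf"\[{bb}\](.*?)\[/{bb}\]",
                     rf"<{tag}>\1</{tag}>",
                     out,
                     flags=re.IGNORECASE | re.DOTALL)
    return out


print(bbcode_to_html("[b]hi[/b] <script>alert(1)</script>"))
```

The only HTML in the output is what the converter itself writes, which is the whole appeal of this approach.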

chaos
+2  A: 

For a C# example of the whitelist approach, which Stack Overflow uses, you can look at this page.

çağdaş
+1  A: 

Kohana's security helper is pretty good. From what I remember, it was taken from a different project.

However, I tested it with

<IMG SRC=&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29>

from LFSR Consulting's answer, and it escaped it correctly.

alex