I'm working on a web application that allows users to type short descriptions of items in a catalog. I'm allowing Markdown in my textareas so users can do some HTML formatting.

My text sanitization function strips all tags from any inputted text before inserting it in the database:

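public function sanitizeText($string, $allowedTags = "") {
    // Remove every tag except those listed in $allowedTags.
    $string = strip_tags($string, $allowedTags);

    // Undo magic-quotes escaping first so the value is not escaped twice.
    if (get_magic_quotes_gpc()) {
        return mysql_real_escape_string(stripslashes($string));
    } else {
        return mysql_real_escape_string($string);
    }
}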

Essentially, all I'm storing in the database is Markdown. No other HTML is allowed, not even "basic HTML" (like here at SO).

Will allowing markdown present any security threats? Can markdown be XSSed, even though it has no tags?

+2  A: 

Will allowing markdown present any security threats? Can markdown be XSSed, even though it has no tags?

It's almost impossible to make absolute statements in that regard - who can say what the markdown parser can be tricked into with sufficiently malformed input?

However, the risk is probably very low, since it is a relatively simple syntax. The most obvious angle of attack would be javascript: URLs in links or images - probably not allowed by the parser, but it's something I'd check out.
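One way to run that check could look like the sketch below. The helper name findSuspiciousUrls and the scheme allowlist are assumptions made for illustration; the idea is to feed it the HTML your Markdown parser produces and see whether any link or image URL uses an unexpected scheme. It is meant as a test aid for inspecting parser output, not as a sanitizer.

```php
<?php
// Hypothetical helper: lists href/src values in rendered HTML whose URL scheme
// is not on a small allowlist. The allowlist itself is an assumption.
function findSuspiciousUrls(string $html, array $allowedSchemes = ['http', 'https', 'mailto']): array
{
    $suspicious = [];

    // Collect quoted href/src attribute values. A regex is enough for a quick
    // look at parser output; it is not a robust HTML parser.
    preg_match_all('/\b(?:href|src)\s*=\s*(["\'])(.*?)\1/is', $html, $matches);

    foreach ($matches[2] as $rawValue) {
        // Decode entities so that encoded payloads are compared in decoded form.
        $url = html_entity_decode($rawValue, ENT_QUOTES | ENT_HTML5, 'UTF-8');

        // Drop whitespace and control characters before reading the scheme,
        // since browsers tolerate some of them inside it.
        $normalized = strtolower(preg_replace('/[\x00-\x20]+/', '', $url));

        // URLs without a scheme (relative links) are left alone.
        if (preg_match('/^([a-z][a-z0-9+.\-]*):/', $normalized, $schemeMatch)
            && !in_array($schemeMatch[1], $allowedSchemes, true)) {
            $suspicious[] = $url;
        }
    }

    return $suspicious;
}

// Usage: pass in whatever HTML the Markdown parser generated.
$rendered = '<p><a href="javascript:alert(1)">Click me</a> and <a href="http://example.com/">a normal link</a></p>';
print_r(findSuspiciousUrls($rendered));
```

With this input the helper should report only the javascript: URL. An empty result does not prove the parser is safe; it only means this particular check found nothing.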

Michael Borgwardt
A: 

BBcode provides more safety because you are generating the tags.

<img src="" onload="javascript:alert('haha');"/>

If <img> is in the allowed tags, this goes straight through strip_tags, which does not filter attributes on the tags it keeps ;) Bam!

peufeu
"More safety"? Duh??
bart
Markdown works the same way.
Michael Borgwardt
+7  A: 

I think stripping every HTML tag from the input will get you something pretty secure, unless someone finds a way to inject some really messed-up data into Markdown and have it generate even more messed-up output ^^

Still, here are two things that come to mind:

First: strip_tags is not a miracle function, and it has some flaws.
For instance, it'll strip everything after the '<' in a situation like this one:

$str = "10 apples is <than 12 apples";
var_dump(strip_tags($str));

The output I get is:

string '10 apples is ' (length=13)

Which is not that nice for your users :-(


Second: one day or another, you might want to allow some HTML tags/attributes; or, even today, you might want to be sure that Markdown doesn't generate certain HTML tags/attributes.

You might be interested in something like HTMLPurifier: it lets you specify which tags and attributes should be kept, and filters a string so that only those remain.

It also generates valid HTML code -- which is always nice ;-)
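A minimal sketch of what such an allowlist might look like, applied to the HTML that Markdown generates. It assumes HTMLPurifier is installed through Composer and that the autoloader is already loaded by the application; the helper name purifyRenderedHtml and the particular tag list are illustrative choices, not something taken from this thread.

```php
<?php
// Assumption: HTMLPurifier is available (for example via Composer) and the
// autoloader has already been loaded elsewhere in the application.

// Hypothetical helper: filters Markdown-rendered HTML down to an allowlist.
function purifyRenderedHtml(string $renderedHtml): string
{
    $config = HTMLPurifier_Config::createDefault();

    // Only these elements and attributes survive; the list is illustrative.
    $config->set('HTML.Allowed', 'p,br,em,strong,code,pre,blockquote,ul,ol,li,a[href],img[src|alt]');

    // Links and images may only use these URL schemes.
    $config->set('URI.AllowedSchemes', ['http' => true, 'https' => true, 'mailto' => true]);

    $purifier = new HTMLPurifier($config);

    return $purifier->purify($renderedHtml);
}
```

You would call purifyRenderedHtml() on the parser's output just before displaying it, so that whatever Markdown produces still has to pass the allowlist.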

Pascal MARTIN
@person-b: thanks for the edit; you are of course right ^^
Pascal MARTIN
HTMLPurifier looks awesome.
Andrew
It kinda is, from what I've heard/used ;-)
Pascal MARTIN
+1  A: 

Sanitizing the resulting HTML after rendering the Markdown is going to be safest. If you don't, I think people would be able to execute arbitrary JavaScript in Markdown like so:

[Click me](javascript:alert\('Gotcha!'\);)

PHP Markdown converts this to:

<p><a href="javascript:alert&#40;'Gotcha!'&#41;;">Click me</a></p>

Which does the job. And don't even think about writing your own code to handle these cases. Correct sanitization isn't easy, so just use a good tool and apply it after you render your Markdown into HTML.

casey