I'm using HTML Purifier, a PHP "filter that guards against XSS and ensures standards-compliant output," to sanitize/standardize user-inputted markup.

This is an example of the user-inputted markup:

<font face="'Times New Roman', Times">TEST</font>

which generates:

<span style="font-family:&quot;Times New Roman&quot;, Times;">TEST</span>

I'm a bit confused, because &quot; isn't even the entity for a single quote; it's the one for a double quote. What's the best practice here, since I'm going to be using this user-generated content later?

  • Leave as is
  • Replace all &quot; with \' after HTML Purifier runs
  • Configure HTML Purifier differently
  • Something else?
+2  A: 

Looks okay to me.

I think the conversion from single to double quotes comes from the fact that HTML Purifier takes the entire tag apart and puts it back together according to its own rules, which happen to use double quotes when quoting values inside a style attribute. Since the attribute itself is delimited by double quotes, those inner quotes have to be written as &quot;.
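For what it's worth, here is a minimal sketch of why the &quot; is harmless inside an attribute. It doesn't call HTML Purifier at all; it only uses PHP's built-in htmlspecialchars_decode() to mimic, roughly, what an HTML parser does when it decodes the attribute value before the CSS is read. The variable names are mine, for illustration only.

```php
<?php
// The style attribute value exactly as it appears in the purified markup.
$attributeValue = 'font-family:&quot;Times New Roman&quot;, Times;';

// Roughly what an HTML parser does: entities in an attribute value are
// decoded first, so the CSS parser never sees the literal text "&quot;".
$css = htmlspecialchars_decode($attributeValue, ENT_QUOTES);

echo $css, "\n"; // font-family:"Times New Roman", Times;
```

In CSS, single and double quotes around a font name are interchangeable, so the decoded declaration means the same thing as the original one.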

It also validates fine for me. What doctype are you validating against?

If I'm not overlooking something, I'd say this is fine to use as is.

Pekka
Great - if it looks good to you, then I'll use it! Thank you! Also, I took out the validation comment from my post...it validates fine in XHTML 1.0 Strict, which is the one I needed.
Emile
+1  A: 

The output is XHTML-valid but the entity conversion is wrong. <img src="/test" alt="I'm ok"/> would get converted to <img src="/test" alt="I&quot;m ok">

Something simple like this will suffice:

$allowed_tags = '<font>';
// ENT_COMPAT escapes double quotes only; single quotes are left untouched.
echo htmlspecialchars(strip_tags(rawurldecode($input), $allowed_tags), ENT_COMPAT, 'UTF-8');

but it won't convert the <font> tag to <span>.
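To illustrate which quotes htmlspecialchars() actually touches, here is a small sketch using only PHP built-ins. The sample string and variable names are made up for the example. With ENT_COMPAT the apostrophe is left alone; with ENT_QUOTES it becomes &#039;, and in neither case does it turn into &quot;.

```php
<?php
// Sample alt text containing both an apostrophe and double quotes.
$alt = "I'm \"ok\"";

// ENT_COMPAT: escape double quotes, leave single quotes as they are.
$compat = htmlspecialchars($alt, ENT_COMPAT, 'UTF-8');

// ENT_QUOTES: escape both double and single quotes.
$quotes = htmlspecialchars($alt, ENT_QUOTES, 'UTF-8');

echo $compat, "\n"; // I'm &quot;ok&quot;
echo $quotes, "\n"; // I&#039;m &quot;ok&quot;
```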

stillstanding
The entity conversion is not *wrong* as such: HTML Purifier deconstructs the whole thing and glues it back together with a new syntax. That syntax happens to use `"` instead of single quotes. I don't really see anything wrong with that.
Pekka
@stillstanding - what you said would totally make sense. But I just tried it and I get `<img src="hello.img" alt="I'm here" />`, which means HTML Purifier must treat attributes differently. But +1 for the use case...I didn't think of that and it was definitely worth testing. That solution should be good for someone who gets `alt="I&quot;m ok"`.
Emile
@Pekka I think stillstanding was saying that `"` would have been inappropriate in his example, since a single quote would have been desired in the alt attribute.
Emile
@Emile Yup, that would have been wrong to convert.
Pekka