I am using the HTML Purifier library to sanitize my incoming parameters, but it is not filtering null bytes (e.g. %00). Am I missing something, or does the library not support this? Will I need to use a regex? Thanks for any answers.

Edit:

I am using HTML Purifier with these config options:

$config = HTMLPurifier_Config::createDefault();
$config->set('Core', 'Encoding', "UTF-8");
$config->set('Cache', 'SerializerPath', "/webdirs/htmlpurify");

For the test string

';</script><%00script>alert(845122)</script>

I get the output

';<%00script>alert(845122)
+1  A: 

As shown by HTMLPurifier/EncoderTest.php and HTMLPurifierTest.php, HTML Purifier does clean out null bytes:

    $this->assertPurification("Null byte\0", "Null byte");

and

    $this->assertCleanUTF8("null byte: \0", 'null byte: ');

Maybe you should post some code?

Edit: Your edit is slightly misleading; the actual output code is:

';&lt;%00script&gt;alert(845122)

which is just a string of plain text and perfectly safe. Percent-signs do not have special meaning in HTML.

If you would like to place a string in a URL, use urlencode().
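
As a minimal sketch of the urlencode() suggestion, assuming the purified text is meant to go into a query string (the `example.com` URL and the `q` parameter are made up for illustration):

```php
<?php
// The purified text from above: plain text, not markup.
$purified = "';&lt;%00script&gt;alert(845122)";

// urlencode() percent-encodes reserved characters, including the
// literal "%" itself, so the text "%00" is sent as "%2500".
$encoded = urlencode($purified);

// Hypothetical target URL, used only for illustration.
$url = 'https://example.com/search?q=' . $encoded;

echo $url, "\n";
```

Decoding the parameter once on the receiving side gives back the original text, with `%00` still being three ordinary characters rather than a null byte.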

Edward Z. Yang
@Ambush Commander - thanks for the reply. I have added some code, but not sure if it is enough. Let me know if you need any other details.
pinaki
The problem is that I cannot use HTML special chars as output from HTML Purifier, so I run html_entity_decode on top of it. Now this value causes the issue. Is there any way to tell HTML Purifier to remove the script tag even when there is a %00 in between?
pinaki
Uh, come again? Why can't you use HTML special chars as output from HTML Purifier? (Running html_entity_decode is the WRONG, WRONG way to do things, and assuredly leads to security vulnerabilities.)
Edward Z. Yang
I can't use HTML special chars as output because the application I am working on has a session value that gets encoded, which results in me being logged off (when the value is encoded). I had a feeling that html_entity_decode is wrong, but can you give me a concrete example of where it would cause a problem? Thanks again for the time and explanation.
pinaki
&lt;script&gt; (harmless text) becomes <script> (a live tag) once html_entity_decode is run on it.
Edward Z. Yang
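
To make the html_entity_decode concern concrete, here is a small sketch that uses only PHP built-ins. The `$safe` string is an assumed example of already-escaped text, not actual HTML Purifier output:

```php
<?php
// Assumed example of escaped text: the angle brackets are entities,
// so a browser displays this instead of executing it.
$safe = "';&lt;script&gt;alert(845122)&lt;/script&gt;";

// Decoding the entities turns the harmless text back into live markup.
$decoded = html_entity_decode($safe, ENT_QUOTES, 'UTF-8');

echo $decoded, "\n"; // ';<script>alert(845122)</script>

// Escaping again at output time restores the safe form.
$escaped = htmlspecialchars($decoded, ENT_NOQUOTES, 'UTF-8');

echo $escaped, "\n";
```

If the decoded value is ever echoed into a page without being escaped again, the browser sees a real script element.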
+1  A: 

It looks like HTML Purifier is filtering this string correctly, IF it appears within JavaScript code.

In JavaScript, you want to filter out any occurrences of a closing tag, such as </script>, even when it appears within a JavaScript string literal. Otherwise, injecting </script> into a string value can bypass some careless filters and break out of the JavaScript string into arbitrary HTML. HTML Purifier seems to have correctly filtered this by removing that "tag".
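
One way to guard against that breakout, sketched here with PHP's built-in json_encode() rather than anything HTML Purifier provides, is to escape the value instead of filtering it. The variable names and the surrounding script element are assumptions for illustration:

```php
<?php
// Untrusted value that tries to close the script element early.
$value = "';</script><script>alert(845122)</script>";

// The JSON_HEX_* flags emit <, >, &, ' and " as \uXXXX escapes, so the
// resulting literal contains no characters the HTML parser reacts to.
$flags = JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT;
$js = json_encode($value, $flags);

// Hypothetical inline script that receives the value as a string literal.
echo "<script>var input = {$js};</script>\n";
```

The JavaScript engine decodes the escapes back to the original characters, so the script still sees the full string while the HTML parser never sees a closing tag inside it.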

There is no harm in having <%00script> in a literal string within JavaScript, IF that is indeed the context in which it appears.

Note also that %00 is not actually a null byte in PHP, in HTML, or in a JavaScript script. It is a percent sign followed by two zeroes. However, in a URL %00 might indeed be interpreted as a null byte, and therefore %00 should be filtered out of URLs.
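
The difference can be checked with a short sketch using PHP built-ins. The `file%00.txt` value is a made-up example of a URL component:

```php
<?php
// In PHP source, '%00' is just three characters.
$literal = '%00';
echo strlen($literal), "\n"; // 3

// Only URL decoding turns it into a single null byte.
$decoded = urldecode($literal);
echo strlen($decoded), "\n"; // 1

// Hypothetical URL component: decode first, then strip null bytes.
$component = 'file%00.txt';
$clean = str_replace("\0", '', urldecode($component));
echo $clean, "\n"; // file.txt
```

Stripping after decoding matters here, because the raw string contains no null byte to remove until the percent-encoding has been interpreted.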

thomasrutter
@thomasrutter - thanks for the reply. I understand that %00 only needs to be filtered in URLs. Can you please add an example or link on how to use HTML Purifier on the JavaScript side? I have only been using it on the PHP side.
pinaki