tags:

views:

223

answers:

2

I use an XSL transform to convert an XML file to HTML in .NET. I transform the node values in the XML into HTML tag contents and attributes.

I compose the XML using .NET DOM manipulation, setting the InnerText property of the nodes to arbitrary and possibly malicious text. Right now, maliciously crafted input strings make my HTML unsafe, in the sense that some JavaScript might come from the user and find its way into a link's href attribute in the output HTML, for example.

The question is simple: what sanitizing, if any, do I have to do on my text before assigning it to the InnerText property? I thought that assigning to InnerText instead of InnerXml would do all the needed sanitization, but that seems not to be the case.

Does my transform need any special characteristics to make this work safely? Are there any .NET-specific caveats I should be aware of?

Thanks!

A: 

You should sanitize your XML before transforming it with XSLT. You probably will need something like:

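// Requires: using System; using System.Web; using System.Xml;
XmlDocument xml = new XmlDocument();

// HTML-encode the untrusted text before storing it in the node.
string encoded = HttpUtility.HtmlEncode("<script>alert('hi')</script>");
XmlElement node = xml.CreateElement("code");
node.InnerText = encoded;  // XmlDocument escapes the text a second time

Console.WriteLine(encoded);
Console.WriteLine(node.OuterXml);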

With this, you'll get

&lt;script&gt;alert('hi')&lt;/script&gt;

When you add this text into your node, you'll get

<code>&amp;lt;script&amp;gt;alert('hi')&amp;lt;/script&amp;gt;</code>

Now, if you run your XSLT, this encoded HTML will not cause any problems in your output.

Rubens Farias
Wouldn't you get the same thing if you simply set the node value, instead of its inner text? `HtmlEncode()` seems redundant to me.
Tomalak
@Tomalak, the main idea is to encode the text before adding it (so it ends up encoded twice); to add it, you can use .InnerText (as the OP said) or .Value.
Rubens Farias
Hum... Why is there a need to encode twice, once in HtmlEncode and once in setting InnerText?
David Reis
Because you crafted some XML and it, inside XSLT, became HTML; encoding it ONCE (another encoding is done by XmlDocument) will do the trick
Rubens Farias
You basically just repeated the answer without adding any information. I suspect the key is the part "it, inside XSLT, became HTML". Does this mean that my malicious text will be properly encoded in the XML file, but will be "decoded" when it's used as the contents of an HTML tag in the output, so I have to HtmlEncode it before writing it to the InnerText property?
David Reis
Yes, that's the important part. By the way, can you please post your bad XML and XSLT code? Maybe I'm just talking nonsense, because I'd bet on a misplaced disable-output-escaping.
Rubens Farias
A: 

It turns out that the problem came from the XSL itself, which used disable-output-escaping. Without that, the transform itself does all the necessary encoding.
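To illustrate the default behaviour, here is a minimal sketch, not the original code from the question. The stylesheet, the item/title element names and the EscapingDemo class are all hypothetical. It builds a document via InnerText, runs it through XslCompiledTransform without disable-output-escaping, and returns the resulting HTML, in which the markup characters from the input should come out escaped.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class EscapingDemo
{
    // Hypothetical stylesheet: copies /item/title into a <p> element.
    // Note that value-of is used WITHOUT disable-output-escaping.
    const string Xslt =
@"<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='html'/>
  <xsl:template match='/item'>
    <p><xsl:value-of select='title'/></p>
  </xsl:template>
</xsl:stylesheet>";

    public static string Render(string untrusted)
    {
        // Build <item><title>...</title></item> with DOM calls.
        var doc = new XmlDocument();
        XmlElement item = doc.CreateElement("item");
        XmlElement title = doc.CreateElement("title");
        title.InnerText = untrusted; // stored as text, not parsed as markup
        item.AppendChild(title);
        doc.AppendChild(item);

        // Load the stylesheet from the string above.
        var transform = new XslCompiledTransform();
        using (var reader = XmlReader.Create(new StringReader(Xslt)))
        {
            transform.Load(reader);
        }

        // Run the transform and capture the HTML as a string.
        using (var output = new StringWriter())
        {
            transform.Transform(doc, null, output);
            return output.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(Render("<script>alert('hi')</script>"));
    }
}
```

Adding disable-output-escaping="yes" to the value-of in this sketch would write the title text out raw, which is the situation described above.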

If you must use disable-output-escaping, you have to apply the appropriate encoding function for each context: HtmlEncode for tag contents, HtmlAttributeEncode for attribute values, and UrlEncode for URL-valued attributes (e.g. href).
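The per-context encoding could look roughly like the sketch below, assuming the HttpUtility class from System.Web is available. The Encoders class, its method names and the sample input are hypothetical and only there for illustration. UrlEncode is shown on a single value that would be placed inside a URL, since applying it to a complete URL would also encode its separators.

```csharp
using System;
using System.Web;

// Hypothetical helper: one method per output context.
public static class Encoders
{
    // Text placed between tags, e.g. <p>HERE</p>.
    public static string ForContent(string text)
    {
        return HttpUtility.HtmlEncode(text);
    }

    // Text placed inside a quoted attribute, e.g. title="HERE".
    public static string ForAttribute(string text)
    {
        return HttpUtility.HtmlAttributeEncode(text);
    }

    // A value placed inside a URL, e.g. href="search?q=HERE".
    public static string ForUrlValue(string text)
    {
        return HttpUtility.UrlEncode(text);
    }
}

public static class Program
{
    public static void Main()
    {
        string untrusted = "<b>\"a&b\"</b>";
        Console.WriteLine(Encoders.ForContent(untrusted));
        Console.WriteLine(Encoders.ForAttribute(untrusted));
        Console.WriteLine(Encoders.ForUrlValue(untrusted));
    }
}
```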

David Reis