Hi all,

I'm using ASP.NET 4.0 with MVC 2. I'm receiving user content that may or may not already be HTML-encoded. I've read http://weblogs.asp.net/scottgu/archive/2010/04/06/new-lt-gt-syntax-for-html-encoding-output-in-asp-net-4-and-asp-net-mvc-2.aspx, which was interesting, but what I need is a way to ensure the content is encoded without double-encoding it. I don't have control over the input process.

E.g.

User Input:

&amp; &lt; < > &gt;  

Output if encoded:

&amp;amp; &amp;lt; &lt; &gt; &amp;gt;  

This won't display correctly

Output if not encoded:

&amp; &lt; < > &gt;

This won't validate correctly

A: 

If it were me, I'd replace only the < and > characters, leaving everything else intact.
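
A minimal sketch of that idea, written in Python purely for illustration (the function name `escape_angle_brackets` is made up for this example, and the same two replacements can be done with plain string replacement in any language):

```python
def escape_angle_brackets(text: str) -> str:
    """Escape only < and >, leaving & and any existing entities untouched."""
    return text.replace("<", "&lt;").replace(">", "&gt;")


sample = "&amp; &lt; < > &gt;"
print(escape_angle_brackets(sample))  # &amp; &lt; &lt; &gt; &gt;
```

Note that a bare & in the input is left as-is, so content such as "a & b" would still come out with an unescaped ampersand.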

zildjohn01
+2  A: 

You could make a first pass decoding user input, and then re-encode the result. This way, if some values of the input are already encoded, they will get decoded, and you'll be able to encode everything after.

&amp; &lt; < > &gt;  

-> decode the input and you get:

& < < > >

-> re-encode everything and you get:

&amp; &lt; &lt; &gt; &gt;
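
As a rough sketch of the two passes, here is the same thing in Python using the standard `html` module (the function name `normalize_encoding` is hypothetical). In ASP.NET the corresponding calls would presumably be `HttpUtility.HtmlDecode` followed by `HttpUtility.HtmlEncode`, but I haven't tested that here:

```python
import html


def normalize_encoding(text: str) -> str:
    """Decode any existing entities, then encode the whole string once."""
    decoded = html.unescape(text)             # "&amp; &lt; < > &gt;" -> "& < < > >"
    return html.escape(decoded, quote=False)  # encode &, < and > exactly once


sample = "&amp; &lt; < > &gt;"
print(normalize_encoding(sample))  # &amp; &lt; &lt; &gt; &gt;
```

Running the function on its own output gives the same string again, so content that passes through it more than once is not double encoded.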
Shimrod
This would fail on plain text content mentioning an HTML entity, as it would be decoded when this wasn't the intention.
Richard
You are right. I guess it depends on the kind of input he'll get.
Shimrod
A: 

I don't think you will find a solution that works automatically for both encoded and unencoded content. The only reliable way I can see is to specify explicitly whether the content has been encoded or not. Otherwise, you will run into problems in certain situations, e.g.

Some plain text mentioning &gt; being the syntax for >

And

<p>Some HTML mentioning that &amp;amp; is the syntax for &gt;</p>

You can try detecting whether encoded content or HTML content is present, but my examples above show that such detection will not always be reliable.
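
To illustrate what I mean by specifying it explicitly, here is a small Python sketch. The `already_encoded` flag and the function name `to_safe_html` are assumptions made for this example; the flag would have to come from whatever knows how the content was produced:

```python
import html


def to_safe_html(text: str, already_encoded: bool) -> str:
    """Encode text only when the caller says it has not been encoded yet."""
    if already_encoded:
        return text  # trust the caller; encoding again would double encode
    return html.escape(text, quote=False)


plain = "Some plain text mentioning &gt; being the syntax for >"
print(to_safe_html(plain, already_encoded=False))
# Some plain text mentioning &amp;gt; being the syntax for &gt;
```

With the flag set correctly, the literal &gt; in the plain text survives as visible text, whereas a decode-then-encode pass would turn it into a plain > character on display.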

Richard