views:

2691

answers:

4

I am interacting with a web server using a desktop client program in C# and .NET 3.5. I am using Fiddler to see what traffic the web browser sends, and emulating that. Sadly, this server is old and a bit confused about the notions of charsets and UTF-8. Mostly it uses Latin-1.

When I enter data into the web browser containing "special" characters such as "Ω π ℵ ∞ ♣ ♥ ♈ ♉ ♊ ♋ ♌ ♍ ♎ ♏ ♐ ♑ ♒ ♓", Fiddler shows me that they are transmitted from browser to server as follows: "&#9800; &#9801; &#9802; &#9803; &#9804; &#9805; &#9806; &#9807; &#9808; &#9809; &#9810; &#9811; "

But in my client, HttpUtility.HtmlEncode does not convert these characters; it leaves them as is. What do I need to call to convert "♈" to &#9800; and so on?

+4  A: 

Rick Strahl just posted a blog post, Html and Uri String Encoding without System.Web, where he has some custom code that encodes the upper range of characters, too.

// Requires: using System.Globalization; using System.Text;
/// <summary>
/// HTML-encodes a string and returns the encoded string.
/// </summary>
/// <param name="text">The text string to encode. </param>
/// <returns>The HTML-encoded text.</returns>
public static string HtmlEncode(string text)
{
    if (text == null)
        return null;

    StringBuilder sb = new StringBuilder(text.Length);

    int len = text.Length;
    for (int i = 0; i < len; i++)
    {
        switch (text[i])
        {

            case '<':
                sb.Append("&lt;");
                break;
            case '>':
                sb.Append("&gt;");
                break;
            case '"':
                sb.Append("&quot;");
                break;
            case '&':
                sb.Append("&amp;");
                break;
            default:
                if (text[i] > 159)
                {
                    // decimal numeric entity
                    sb.Append("&#");
                    sb.Append(((int)text[i]).ToString(CultureInfo.InvariantCulture));
                    sb.Append(";");
                }
                else
                    sb.Append(text[i]);
                break;
        }
    }
    return sb.ToString();
}
bdukes
What reasons are there to HTML encode without System.Web?
AnthonyWJones
Why 159 for the cut-off?
Anthony
+6  A: 

The return type of HtmlEncode is a string, which is Unicode and hence has no need to encode these characters.

If the encoding of your output stream is not compatible with these characters, then use HtmlEncode like this:

 HttpUtility.HtmlEncode(outgoingString, Response.Output);

HtmlEncode will then escape the characters appropriately.

AnthonyWJones
Interesting, but how would you tie that up with Scott H's posting technique from http://www.hanselman.com/blog/PermaLink.aspx?guid=43e49ec8-1fa7-44c1-8177-42cd4fead8db
Anthony
@Anthony: They don't tie up at all (did you post the right link?). HtmlEncode has nothing to do with form POST emulation. Or were you thinking of URLEncode? That's a different thing.
AnthonyWJones
@AnthonyWJones: Yes, it's the right link for the post technique. I have to encode this way before I post the form.
Anthony
@Anthony: OK. So the answer to your first question is as stated: they don't tie up. The encoding you need to post with is URL encoding, which is unrelated to HTML encoding.
AnthonyWJones
Thanks, but I'm pretty sure that for this particular server, I need HTML encoding before posting.
Anthony
@Anthony: It may well be that, for some strange reason, the server is expecting HTML content in the entity body. However, standard HTML forms send data using the same encoding that you would find in a URL query string.
AnthonyWJones
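The two encodings discussed in the comments above are not mutually exclusive: when a form's charset cannot represent a character, browsers commonly substitute a numeric character reference, and the resulting text is then URL-encoded like any other form value. A minimal sketch of that combination, assuming a hypothetical field name `comment` and hypothetical helper names; whether this particular server expects exactly this body is an assumption, not something confirmed in the thread:

```csharp
using System;
using System.Globalization;
using System.Text;

public static class FormPostSketch
{
    // Replaces every char above ASCII with a decimal numeric character reference.
    public static string ToNumericEntities(string value)
    {
        StringBuilder sb = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            if (c > 127)
                sb.Append("&#").Append(((int)c).ToString(CultureInfo.InvariantCulture)).Append(';');
            else
                sb.Append(c);
        }
        return sb.ToString();
    }

    // Builds one name=value pair for an application/x-www-form-urlencoded body.
    // Uri.EscapeDataString percent-encodes '&', '#' and ';' so the reference
    // survives as data instead of being read as a field separator.
    public static string BuildField(string name, string value)
    {
        return Uri.EscapeDataString(name) + "=" + Uri.EscapeDataString(ToNumericEntities(value));
    }

    public static void Main()
    {
        // U+2648 is the Aries sign from the question.
        Console.WriteLine(BuildField("comment", "\u2648")); // comment=%26%239800%3B
    }
}
```

Comparing the output against the request body Fiddler shows for the browser is the quickest way to check whether the server really wants both layers.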
+2  A: 

It seems horribly inefficient, but the only way I can think to do that is to look through each character:

// Requires: using System.Text; using System.Web;
public static string MyHtmlEncode(string value)
{
   // call the normal HtmlEncode first
   char[] chars = HttpUtility.HtmlEncode(value).ToCharArray();
   StringBuilder encodedValue = new StringBuilder();
   foreach(char c in chars)
   {
      if ((int)c > 127) // above normal ASCII
         encodedValue.Append("&#" + (int)c + ";");
      else
         encodedValue.Append(c);
   }
   return encodedValue.ToString();
}
Rick
This works. I haven't tested the others yet.
Anthony
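One caveat with per-`char` loops like the ones above: a C# `char` is a UTF-16 code unit, so a character outside the Basic Multilingual Plane arrives as two surrogate chars and would be emitted as two separate references. A hedged sketch of a variant that combines surrogate pairs with `char.ConvertToUtf32`; the class and method names are made up for illustration, and it escapes only the four markup characters handled in the first answer rather than calling `HttpUtility.HtmlEncode`:

```csharp
using System;
using System.Globalization;
using System.Text;

public static class EntityEncoderSketch
{
    public static string Encode(string value)
    {
        if (value == null)
            return null;

        StringBuilder sb = new StringBuilder(value.Length);
        for (int i = 0; i < value.Length; i++)
        {
            char c = value[i];
            switch (c)
            {
                case '<': sb.Append("&lt;"); break;
                case '>': sb.Append("&gt;"); break;
                case '"': sb.Append("&quot;"); break;
                case '&': sb.Append("&amp;"); break;
                default:
                    if (c <= 127)
                    {
                        // Plain ASCII passes through unchanged.
                        sb.Append(c);
                        break;
                    }
                    int codePoint = c;
                    if (char.IsSurrogatePair(value, i))
                    {
                        // Merge high + low surrogate into one code point
                        // and skip the low surrogate on the next iteration.
                        codePoint = char.ConvertToUtf32(value, i);
                        i++;
                    }
                    // An unpaired surrogate falls through and is written
                    // as its raw code unit value.
                    sb.Append("&#")
                      .Append(codePoint.ToString(CultureInfo.InvariantCulture))
                      .Append(';');
                    break;
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Encode("\u2648 <b>"));   // &#9800; &lt;b&gt;
        Console.WriteLine(Encode("\uD83D\uDE00")); // &#128512;
    }
}
```

For the characters in the question, which all sit inside the Basic Multilingual Plane, this produces the same references as the simpler loop.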
A: 

It seems like HtmlEncode is just for encoding strings that are put into HTML documents, where only characters such as / < > & cause problems. For URLs, just replace HtmlEncode with UrlEncode.

Matt