I do an HTTP GET request for a page using the following URL in Safari:
mysite.com/page.aspx?param=v%e5r
The page contains a form which posts back to itself. The HTML form tag looks like this when output by asp.net:
<form method="post" action="page.aspx?param=v%u00e5r" id="aspnetForm" >

When Safari POSTs the form back, it somehow converts this URL to:
page.aspx?param=v%25u00e5r
That is, it URL-encodes the already URL-encoded string. The parameter is now double-encoded, and the output generated from it is garbled (v&#229;r). I can work around this in some places by URL-decoding the parameter before printing it.

Firefox and even IE8 handle this fine. Is this a bug in WebKit or am I doing something wrong?

To summarise:

  • GET mysite.com/page.aspx?param=v%e5r
    HTML: <form method="post" action="page.aspx?param=v%u00e5r" id="aspnetForm" >
  • POST mysite.com/page.aspx?param=v%25u00e5r
    HTML: <form method="post" action="page.aspx?param=v%25u00e5r" id="aspnetForm" >
  • +2  A: 
    mysite.com/page.aspx?param=v%e5r
    

    Whilst you can use encodings other than UTF-8 in the query part of a URL, it's inadvisable and will generally confuse a variety of scripts that assume UTF-8.

    You really want to be producing forms in pages marked as being UTF-8, then accepting UTF-8 in your application and encoding the string vår (assuming that's what you mean) as param=v%C3%A5r.
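To make the difference between the two encodings concrete, here is a minimal, language-neutral sketch in Python (not ASP.NET code) that percent-encodes the same word both ways. The variable names are only illustrative.

```python
from urllib.parse import quote, unquote

word = "vår"

# UTF-8: å is encoded as the two bytes C3 A5.
utf8_form = quote(word, encoding="utf-8")

# Windows code page 1252: å is the single byte E5.
cp1252_form = quote(word, encoding="cp1252")

print(utf8_form)    # v%C3%A5r
print(cp1252_form)  # v%E5r

# Decoding only round-trips when the receiver assumes the same charset.
print(unquote(utf8_form, encoding="utf-8") == word)      # True
print(unquote(cp1252_form, encoding="cp1252") == word)   # True

# A receiver that assumes UTF-8 cannot decode the lone E5 byte and
# substitutes the replacement character U+FFFD.
print(unquote(cp1252_form, encoding="utf-8") == "v\ufffdr")  # True
```

The last line shows why a single-byte `%E5` confuses tools that assume UTF-8: on its own it is not a valid UTF-8 sequence.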

    page.aspx?param=v%u00e5r
    

    Oh dear! That's very much wrong. %uXXXX is a JavaScript-escape()-style sequence only; it is wholly invalid to put in a URL. Safari is presumably trying to fix up the mistake by encoding the % that isn't followed by a two-digit hex sequence with a %25.
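The fix-up described above can be sketched as follows. This is only a guess at the kind of rule a browser might apply, not Safari's actual implementation, and `escape_stray_percents` is a hypothetical helper name.

```python
import re

# A "%" that is NOT followed by two hex digits is not a valid percent-escape.
_STRAY_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")

def escape_stray_percents(url: str) -> str:
    """Replace every invalid '%' with its own escape, '%25'."""
    return _STRAY_PERCENT.sub("%25", url)

# The invalid %uXXXX sequence gets its "%" escaped:
print(escape_stray_percents("page.aspx?param=v%u00e5r"))  # page.aspx?param=v%25u00e5r

# A valid two-digit escape is left alone:
print(escape_stray_percents("page.aspx?param=v%e5r"))     # page.aspx?param=v%e5r
```

Under that assumption, the double encoding seen in the POST URL is a consequence of the invalid `%u` sequence rather than of the character itself.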

    Is ASP.NET generating this? If so, that's highly disappointing. How are you creating the <form> tag? If you're encoding the parameter manually, maybe you need to pass an Encoding argument to HttpUtility.UrlEncode, i.e. Encoding.UTF8, or, if you really must have v%e5r, Encoding.GetEncoding(1252) (Windows code page 1252, Western European).

    bobince
    Thanks for the answer! I should have mentioned that our web application uses ISO-8859-1 encoding. As far as I know, %u00e5 is the ISO-encoded string. The encoding is done using HttpContext.Server.UrlEncode(x). See http://stackoverflow.com/questions/3180691/broken-encoding-after-postback for an explanation of how the problem was fixed for browsers other than Safari.
    Polymorphix
    This forum post seems to clarify things a little (even though it's from 2004): http://www.velocityreviews.com/forums/t74587-query-string-encoding-decoding.html . Its main points are: 1: ASP.NET has a globalisation layer that interprets the request based primarily on the Accept-Charset header and secondarily on the IIS globalisation setting. I see that Safari does not send the Accept-Charset header; Firefox and Chrome do. 2: ASP.NET uses a non-standard way of encoding 16-bit Unicode characters, i.e. the `%uXXXX` format.
    Polymorphix
    I tried stripping the Accept-Charset header from a Firefox request (using TamperData); no encoding errors. I also tried setting request and response encoding to ISO-8859-1 in IIS, which should have fixed the problem for Safari if the error was wrong encoding. However the problem seems to be that Safari does not understand the non standard ASP.Net URL encoding (`%uXXXX`). Thus it interprets the `%` as something to be encoded and we end up with `%25uXXXX`.
    Polymorphix
    That's not just non-standard, it's absolutely invalid to put anywhere in a URL. The statement in that thread that the `%uXXXX` format is ‘standardised by ISO-80646’ is an outright lie. Safari is right to try to fix what is an invalid URL, the same as it would if you said e.g. `param=100%`. The only correct serialisation of å in ISO-8859-1 (or, more likely, Windows code page 1252) is `%E5`.
    bobince
    Judging by that thread, I am guessing the issue is that you're fetching a self-link from `Request.QueryString`. In ASP.NET this property is not the original query string, but one that ASP.NET has munged. Because IIS/ASP.NET, like many tools, wants only UTF-8 URLs, it won't accept `%E5` on its own, as that's not a valid UTF-8 sequence, so it resorts to the incorrect `%u` encoding. If you use the `RawUrl` property instead, you would get the original `%E5` encoding. Better would be to create the new query string yourself, e.g. `"page.aspx?param=" + HttpUtility.UrlEncode(param, Encoding.GetEncoding(1252))`.
    bobince
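As a language-neutral illustration of rebuilding the query string with a valid encoding, the following Python sketch rewrites `%uXXXX` sequences as ordinary percent-escapes in a chosen charset. `fix_percent_u` is a hypothetical helper, not an ASP.NET API, and it assumes each `%uXXXX` stands for a single character outside the surrogate range.

```python
import re
from urllib.parse import quote

# Matches the invalid JavaScript-escape()-style sequence %uXXXX.
_PERCENT_U = re.compile(r"%u([0-9A-Fa-f]{4})")

def fix_percent_u(query: str, encoding: str = "cp1252") -> str:
    """Rewrite each %uXXXX as valid percent-escapes in `encoding`.

    Raises UnicodeEncodeError if a character does not exist in `encoding`.
    """
    def to_percent_escape(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        return quote(char, safe="", encoding=encoding)

    return _PERCENT_U.sub(to_percent_escape, query)

print(fix_percent_u("param=v%u00e5r"))                    # param=v%E5r
print(fix_percent_u("param=v%u00e5r", encoding="utf-8"))  # param=v%C3%A5r
```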
    In the longer term, consider moving your application to UTF-8, which would avoid this problem and many other little catches where tools expect to be using UTF-8, as well as allowing you to support any language. There's little reason today to use any other encoding.
    bobince
    I get a different kind of garbled string. I tried setting the response encoding to "windows-1252". The links look the same as before, but the pages interpret them differently. Thanks for the feedback though! I'm thinking UTF-8 is the only solution. That's just not feasible right now...
    Polymorphix
    Yes, I'm aware of the re-encoding done when getting Request.QueryString, and of the existence of Request.Url.OriginalString at least, which seems to be similar or equal to RawUrl. Maybe I just need to start parsing manually...
    Polymorphix
    Actually creating the new query string ourselves is what we are doing with the UrlBuilder mentioned in the above referenced StackOverflow thread :) But I don't see any reason to use windows codepage 1252? We're using Context.Server.UrlEncode which should encode the url in what the browser expects for this page (ISO-8859-1).
    Polymorphix
    Every time you say “ISO-8859-1” on the web, browsers actually use Windows code page 1252 (Western European) instead, for sad historical reasons. They are similar encodings: only the bytes in the range 0x80–0x9F are different, mapped to extra characters instead of control codes.
    bobince
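The relationship between the two encodings can be checked with a short Python sketch (illustrative only). Note that Python's `cp1252` codec leaves five bytes in that range undefined, so the comparison below decodes with `errors="replace"`.

```python
# Bytes that decode to different characters in ISO-8859-1 and Windows-1252.
differing = [
    b for b in range(256)
    if bytes([b]).decode("latin-1") != bytes([b]).decode("cp1252", errors="replace")
]

print(hex(min(differing)), hex(max(differing)), len(differing))  # 0x80 0x9f 32

# Byte 0x80 is a control code in ISO-8859-1 but the euro sign in Windows-1252.
print(repr(b"\x80".decode("latin-1")))  # '\x80'
print(b"\x80".decode("cp1252"))         # €

# å lies outside that range, so both encodings give the same byte, E5.
print("å".encode("latin-1") == "å".encode("cp1252") == b"\xe5")  # True
```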
    Whilst I haven't tested it yet, I don't trust `Server.UrlEncode` at all. The doc says nothing about guessing the correct encoding; it is just supposed to forward to `HttpUtility.UrlEncode`, which won't. And unlike that method there is no variant that accepts an explicit `Encoding` object.
    bobince