I have a question about encoding special/extended UTF-8 characters in URLs in JavaScript. The same question applies to many characters like the Registered R-circle, but my example uses an umlaut:

ü = %C3%BC in UTF-8 (four rows from bottom of http://www.utf8-chartable.de/)

If the URL contains an umlaut that is already percent-encoded as UTF-8 (ü = %C3%BC) and I run it through encodeURIComponent, the % signs are encoded as well, so that part of the string becomes "%25C3%25BC" and my system processes it correctly. This is good.

var url = "http://foo.com/bar.html?%C3%BC";
url = encodeURIComponent(url);
// url is now represented as "http%3A%2F%2Ffoo.com%2Fbar.html%3F%25C3%25BC"

However, the bad: if the string contains the unencoded character (the actual umlaut) before encoding, then after encoding that part looks like "%C3%BC" and fails because, I believe, the % signs should be encoded too:

var url = "http://foo.com/bar.html?ü";
url = encodeURIComponent(url);
// url is now represented as "http%3A%2F%2Ffoo.com%2Fbar.html%3F%C3%BC"

I think it fails because it is less thoroughly encoded than the rest of the url.
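To isolate the difference, here is a minimal sketch of the two cases using only the character itself. It assumes (and I have not confirmed this) that my system expects the doubly encoded form; the variable names are just for illustration.

```javascript
// Case 1: input is already percent-encoded, so the "%" signs get encoded again.
var preEncodedOnce = encodeURIComponent("%C3%BC"); // "%25C3%25BC"

// Case 2: input is the raw character, so it is only percent-encoded once.
var rawOnce = encodeURIComponent("ü"); // "%C3%BC"

// Assumption: if the doubly encoded form is what the system wants,
// encoding the raw character a second time produces the same result as case 1.
var rawTwice = encodeURIComponent(rawOnce); // "%25C3%25BC"

console.log(preEncodedOnce, rawOnce, rawTwice);
```

So the two inputs only end up identical if the raw character goes through one more round of encoding than the pre-encoded one, which would mean I need to know which form I have before encoding.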

So, beyond general advice or answers to questions I don't know to ask, what I think I want to know is how to get the raw umlaut (and all other special characters) to fully encode. Or is that assumption what is incorrect?

Thanks for your help! Nate