views:

168

answers:

2

Something I still don't understand when perfoming an http-get request to the server is what the advantage is in using JS fucntion encodeURIcomponent to encode each component of the http-get.

Doing some tests I saw the server (using PHP) gets the values of the http-get request properly also if I don't use encodeURIcomponent!!! Obviuosly I still need to encode at client level the special character & ? = / : otherwise an http-get value like this "peace&love=virtue" would be considered as new key value pair of the http-get request instead of a one single value. But why does encodeURIcompenent encodes also many other charcaters like 'è' for example wich is translated into %C3%A8 that must be decoded on a PHP server using the utf8_decode function.

By using encodeURIcomponent all values of the http-get request are utf8 encoded, therefor when getting them in PHP I have to call each time the utf8_decode function on each $_GET value which is quite annoying.

Why can't we just encode only the & ? = / : charcaters???

see also: http://stackoverflow.com/questions/2607946/js-encodeuricomponent-result-different-from-the-one-created-by-form It shows that encodeURIComponent does not even encode properly because a simple browser FORM GET encodes charactrs like '€', in different way. So I still wonder what does this encodeURIComponent is for?

+6  A: 

That is because

A Uniform Resource Identifier (URI) is defined in [RFC3986] as a sequence of characters chosen from a limited subset of the repertoire of US-ASCII [ASCII] characters.

So officially unicode is not supported; see the RFC for details. All modern browsers support it though, and that is why you get your results just fine.. but for the odd case where some browser or system that does not support it you encode it and make sure it works fine across all standard compliant browsers..

Gaby
Marco Demajo
It would work in the majority of browsers.. but keep in mind that urls are also posted inside forums/blogs/etc.. and if that forum/blog/etc is not in unidoce the url inside (supposedly linking to your site) could get messed up ..
Gaby
+2  A: 

This is a character encoding issue (again). As Gaby stated, URIs are a sequence of ASCII characters (thus only bytes of the range 0–127). So any other character, that is not in ASCII, needs to be encoded with the Percent-Encoding.

And since UTF-8 is the new “universal character encoding”, nowadays user agents interpret the URI to be UTF-8 encoded. But these UTF-8 encoded words are themselves also encoded with the Percent-Encoding since URIs cannot contain any other characters except those in ASCII.

That means, when you enter http://en.wikipedia.org/wiki/€ into your browser’s address field, your browser looks up the UTF-8 code for (0xE282AC) and applies the Percent-Encoding on it (%E2%82%AC). So http://en.wikipedia.org/wiki/€ will actually result in http://en.wikipedia.org/wiki/%E2%82%AC.

To show you that this is true, just enter http://en.wikipedia.org/wiki/%E2%82%AC into your address field and your browser will probably turn that into http://en.wikipedia.org/wiki/€. That is because nowadays user agents interpret the URI to be UTF-8 encoded.

Now back to your initial question, why you should apply the Percent-Encoding explicitly: Imagine you have a web page where you want to link to the Wikipedia article on the Euro sign. If you just write the URI with a plain :

<a href="http://en.wikipedia.org/wiki/€"&gt;Euro sign</a>

Your browser will use the character encoding of the document for the character. That means, if your document’s encoding is Windows-1252 (as in your other question), the will be encoded as 0x80 and the URI would be http://en.wikipedia.org/wiki/%80 (this actually works because Wikipedia is that clever to guess as Windows-1252 is the most popular character encoding with a printable character on 0x80).

But if your document’s encoding is ISO 8859-15, the will be encoded as 0xA4 that represents the currency sign ¤ in ISO 8859-1 (Wikipedia will chose ISO 8859-1 because 0xA4 is an invalid byte sequence in UTF-8 and HTTP specifies ISO 8859-1 as default character encoding).

So I recommend to always use the Percent-Encoding to avoid mistakes. Don’t let the user agents guess what you mean.

Gumbo