If I want to create a URL using a variable, I have two choices for encoding the string: urlencode and rawurlencode.

What exactly are the differences and which is preferred?

A: 

urlencode: This differs from the RFC 1738 encoding (see rawurlencode()) in that, for historical reasons, spaces are encoded as plus (+) signs.

Remus Rusanu
+7  A: 

rawurlencode follows RFC 1738 (see http://us2.php.net/manual/en/function.rawurlencode.php)

urlencode encodes spaces as plus signs, whereas rawurlencode encodes them as %20 (see http://us2.php.net/manual/en/function.urlencode.php)

You may also want to see the discussion at http://bytes.com/groups/php/5624-urlencode-vs-rawurlencode.

EDIT: Upon further reading, it seems that plus signs, and even subsequent question marks, are permissible in query strings according to RFC 2396.

It seems it will depend on your purpose. If interoperability is important, then rawurlencode is the way to go. If you expect more complexity in the query string, then go with urlencode.
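
To make the "depends on your purpose" point concrete, here is a minimal sketch. The file name and the parameter array are hypothetical and chosen only for illustration. It encodes a path segment with rawurlencode and builds a query string both ways with http_build_query; the PHP_QUERY_RFC3986 flag assumes PHP 5.4 or later.

```php
<?php
// Hypothetical values, not taken from the question.
$segment = 'my file.txt';
$params  = ['q' => 'a b&c', 'page' => 2];

// Path segment: "+" is a literal plus sign in a path, so use %20 for spaces.
$path = '/files/' . rawurlencode($segment);

// Query string, form style (the default): spaces become "+", as with urlencode().
$formQuery = http_build_query($params, '', '&');

// Query string, RFC 3986 style: spaces become "%20", as with rawurlencode().
$rawQuery = http_build_query($params, '', '&', PHP_QUERY_RFC3986);

echo $path, "\n";       // /files/my%20file.txt
echo $formQuery, "\n";  // q=a+b%26c&page=2
echo $rawQuery, "\n";   // q=a%20b%26c&page=2
```

The separator is passed explicitly so the result does not depend on the arg_separator.output ini setting.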

Jonathan Fingland
So which is preferred?
Gary Willoughby
rawurlencode. Go with the standard in this case. urlencode is only kept for legacy use.
Jonathan Fingland
Great, thanks. That's what I thought; I just wanted a second opinion before I start updating lots of code.
Gary Willoughby
It also seems I was incorrect in my initial analysis that urlencode was the legacy option. See my edit for more info.
Jonathan Fingland
I think it's rawurlencode that does not encode spaces as plus signs but as %20.
BigName
A: 
echo rawurlencode('http://www.google.com/index.html?id=asd asd');

yields

http%3A%2F%2Fwww.google.com%2Findex.html%3Fid%3Dasd%20asd

while

echo urlencode('http://www.google.com/index.html?id=asd asd');

yields

http%3A%2F%2Fwww.google.com%2Findex.html%3Fid%3Dasd+asd

The only difference is the encoded space: asd%20asd versus asd+asd.

urlencode differs from RFC 1738 by encoding spaces as + instead of %20.
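
As a small addition, here is a sketch of the characters on which the two functions disagree and of how the matching decoders treat a plus sign. It assumes PHP 5.3 or later, where rawurlencode leaves the tilde unencoded; the variable names are only for illustration.

```php
<?php
// Characters on which the two encoders disagree (PHP 5.3 and later).
$space = [urlencode(' '), rawurlencode(' ')];  // "+"   vs "%20"
$tilde = [urlencode('~'), rawurlencode('~')];  // "%7E" vs "~"

// Decoding: urldecode() turns "+" back into a space, rawurldecode() leaves it alone.
$decoded    = urldecode('asd+asd');     // "asd asd"
$rawDecoded = rawurldecode('asd+asd');  // "asd+asd"

// Both decoders understand %20.
$both = [urldecode('asd%20asd'), rawurldecode('asd%20asd')];  // "asd asd" twice

echo implode(' ', $space), "\n";
echo implode(' ', $tilde), "\n";
echo $decoded, ' | ', $rawDecoded, "\n";
```

So a string produced by urlencode should be read back with urldecode, and one produced by rawurlencode with rawurldecode.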

jitter
A: 

The difference is in the return values, i.e:

urlencode():

Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits and spaces encoded as plus (+) signs. It is encoded the same way that the posted data from a WWW form is encoded, that is the same way as in application/x-www-form-urlencoded media type. This differs from the RFC 1738 encoding (see rawurlencode()) in that for historical reasons, spaces are encoded as plus (+) signs.

rawurlencode():

Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits. This is the encoding described in RFC 1738 for protecting literal characters from being interpreted as special URL delimiters, and for protecting URLs from being mangled by transmission media with character conversions (like some email systems).

The two are very similar, but the latter (rawurlencode) replaces spaces with a '%' followed by two hex digits (%20). That makes it suitable for encoding passwords and similar values, where a '+' would not be safe, e.g.:

echo '<a href="ftp://user:', rawurlencode('foo @+%/'),
     '@ftp.example.com/x.txt">';
//Outputs <a href="ftp://user:foo%20%40%2B%25%2F@ftp.example.com/x.txt">
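
A rough sketch of why the '+' form is a problem here, reusing the sample password from the snippet above. It assumes the consumer of the URL only percent-decodes the password, which rawurldecode stands in for; real FTP clients may behave differently.

```php
<?php
$password = 'foo @+%/';  // same sample password as in the snippet above

$raw  = rawurlencode($password);  // foo%20%40%2B%25%2F
$form = urlencode($password);     // foo+%40%2B%25%2F

// A consumer that only percent-decodes recovers the password from the raw
// form, but not from the form-encoded one: the "+" stays a literal plus.
$fromRaw  = rawurldecode($raw);   // "foo @+%/"
$fromForm = rawurldecode($form);  // "foo+@+%/"

var_dump($fromRaw === $password);   // bool(true)
var_dump($fromForm === $password);  // bool(false)
```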
karim79