curl downloads http://mysite.com/Lunacy%20Disc%202%20of%202%20(U)(Saturn).zip

but not

http://mysite.com/Lunacy Disc 2 of 2 (U)(Saturn).zip

Why is this the case?

Do I need to convert it to the first format?

Using the URL generated via urlencode($url) fails.

A: 

You need to URL-encode the address so that the spaces (in your example; other characters require it too) survive transmission across the internet. The encoding ensures that the various communication protocols don't terminate or otherwise mangle the string while they're handling it.

DaveE
+1  A: 

To convert a URL to the "first format", you can use the PHP function urlencode.


Now, for the "why", the answer can probably be found in the RFC 1738 - Uniform Resource Locators (URL).

Quoting some paragraphs:

Octets must be encoded if they have no corresponding graphic
character within the US-ASCII coded character set, if the use of the
corresponding character is unsafe, or if the corresponding character
is reserved for some other interpretation within the particular URL
scheme.

No corresponding graphic US-ASCII:

URLs are written only with the graphic printable characters of the
US-ASCII coded character set. The octets 80-FF hexadecimal are not
used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent
control characters; these must be encoded.

A space is encoded as %20, i.e. octet 20 hexadecimal. That is outside the range 00-1F, so it would not have to be encoded for that reason. But, a bit later:

Unsafe:

   Characters can be unsafe for a number of reasons.  The space
   character is unsafe because significant spaces may disappear and
   insignificant spaces may be introduced when URLs are transcribed or
   typeset or subjected to the treatment of word-processing programs.

And here, you know why the space character has to be escaped/encoded too ;-)
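
As a rough illustration of the passages quoted above, here is a minimal PHP sketch that reports why a single byte would have to be encoded. The function name rfc1738_reason is made up for this example, and it covers only the rules quoted here (RFC 1738 lists further unsafe and reserved characters that are not handled).

```php
<?php
// Hypothetical helper: returns the reason a byte must be encoded according
// to the RFC 1738 passages quoted above, or null if none of them applies.
function rfc1738_reason(string $byte): ?string
{
    $code = ord($byte);
    if ($code >= 0x80) {
        return 'not US-ASCII';       // octets 80-FF are not used in US-ASCII
    }
    if ($code <= 0x1F || $code === 0x7F) {
        return 'control character';  // octets 00-1F and 7F
    }
    if ($byte === ' ') {
        return 'unsafe';             // the space character (octet 20)
    }
    return null;
}

// Print the hex code of a few sample bytes next to the reason found.
foreach ([' ', "\t", 'a', "\xE9"] as $b) {
    printf("%%%02X => %s\n", ord($b), rfc1738_reason($b) ?? 'no reason in the quoted passages');
}
```

Note that the space falls through the first two checks and is caught only by the "unsafe" rule, which is the point made above.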

Pascal MARTIN
curl fails with urlencoded string
kemp
A: 

http://mysite.com/Lunacy Disc 2 of 2 (U)(Saturn).zip

That is not a valid URL. Accessing URLs like this may work in your browser because most modern browsers will automatically encode the URL for you if required. The curl library evidently does not do this automatically.

Inspire
+1  A: 

urlencode() does indeed fail with curl. If your problem is just with spaces, you can substitute them manually:

$url = str_replace(' ', '%20', $url);
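
For context, a minimal sketch of how that substitution might sit in front of a curl call. The helper names escape_spaces and fetch are invented for this example, it assumes the PHP curl extension is installed, and it only handles spaces, so other characters that need encoding would still be passed through untouched.

```php
<?php
// Hypothetical helper: replaces literal spaces only, as in the one-liner above.
function escape_spaces(string $url): string
{
    return str_replace(' ', '%20', $url);
}

// Hypothetical helper: downloads a URL with the curl extension.
// Returns the response body, or null if the transfer failed.
function fetch(string $url): ?string
{
    $ch = curl_init(escape_spaces($url));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FAILONERROR, true);    // treat HTTP status >= 400 as a failure
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// The URL from the question, with its spaces escaped before curl sees it.
echo escape_spaces('http://mysite.com/Lunacy Disc 2 of 2 (U)(Saturn).zip'), "\n";
```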
kemp
+2  A: 

Two problems:

  1. urlencode will also encode the slashes on you. It's meant to encode query strings for use in URLs, not full URLs.
  2. urlencode encodes spaces as +. You need rawurlencode if you want spaces as %20.
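
To make the two points above concrete, here is a small sketch comparing the functions on the file name from the question. The helper build_download_url is hypothetical; it assumes you know the base URL and the file name separately, so that only the file name is encoded and the slashes of the URL are left alone.

```php
<?php
$base = 'http://mysite.com';
$name = 'Lunacy Disc 2 of 2 (U)(Saturn).zip';

// urlencode() on the whole URL also encodes ":" and "/", and turns spaces into "+".
echo urlencode($base . '/' . $name), "\n";

// rawurlencode() on the file name alone turns spaces into "%20".
echo rawurlencode($name), "\n";

// Hypothetical helper: encode only the last path segment, keep the rest as is.
function build_download_url(string $base, string $fileName): string
{
    return rtrim($base, '/') . '/' . rawurlencode($fileName);
}

echo build_download_url($base, $name), "\n";
```

rawurlencode also encodes the parentheses as %28 and %29, which differs from the working URL in the question but refers to the same resource.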
R. Bemrose
A: 

Why? Because some characters have special meanings, such as # (HTML anchor).

So all characters except alphanumeric ones are encoded, regardless of whether they need to be or not.

JCasso