We've run into an odd disagreement at work, and since I may be wrong about this, I'm asking here.

Our software writes a directory to an Apache server, and in doing so it replaces each underscore in the directory name with %5F.

For instance, if the directory name in our software is the string "andy_test", the directory the software creates on the Apache server is named "andy%5Ftest". Unfortunately, to reach that directory through a URL on the server, you have to request "andy%255Ftest".

Somehow this seems wrong to me. Once again, the progression is:

  1. andy_test <- (as a string in the software)
  2. andy%5Ftest <- (listed as a directory on the server)
  3. andy%255Ftest <- (must be used when calling the same directory as a URL on the server from a web browser.)

I'm assuming that "%5F" is the encoding for an underscore and that "%25" is the encoding for "%".
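
That reading can be checked with Python's standard urllib.parse module (Python is chosen here only for illustration; the original software's language isn't stated). A percent-escape is "%" followed by the character's byte value in hex, so "_" (0x5F) becomes %5F and "%" (0x25) becomes %25. Decoding the browser form once gives the on-disk name, and decoding it a second time gives the original string:

```python
from urllib.parse import unquote

# A percent-escape is "%" followed by the character's byte value in hex.
print(f"%{ord('_'):02X}")  # %5F
print(f"%{ord('%'):02X}")  # %25

# Peel the layers off the browser form one decode at a time.
browser_form = "andy%255Ftest"
on_disk = unquote(browser_form)  # "%25" -> "%", giving "andy%5Ftest"
original = unquote(on_disk)      # "%5F" -> "_", giving "andy_test"
print(on_disk, original)         # andy%5Ftest andy_test
```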

It seems to me that the directory should be named plain andy_test on the server, and that "andy%5Ftest" would only appear if you were using an encoded URI to reach that directory on the Apache server.

I asked the backend developers about it, and they said they were just "encoding anything that was not a letter or a number."
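
The backend's rule as quoted can be sketched as follows. encode_non_alnum is a hypothetical name, not the backend's actual code. Applying it when the directory is named, and then letting a standard URL encoder run again when the link is built, reproduces the %255F from the question, because the second pass escapes the "%" introduced by the first:

```python
from urllib.parse import quote

def encode_non_alnum(name: str) -> str:
    """Hypothetical stand-in for the backend's rule: percent-encode every
    byte of the UTF-8 name that is not an ASCII letter or digit."""
    out = []
    for byte in name.encode("utf-8"):
        ch = chr(byte)
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append(f"%{byte:02X}")
    return "".join(out)

dir_name = encode_non_alnum("andy_test")  # first pass: when the directory is created
url_path = quote(dir_name)                # second pass: when the URL is built
print(dir_name)  # andy%5Ftest
print(url_path)  # andy%255Ftest
```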

So I'm a bit confused. Can you tell me who is right, and point me to some information explaining why?

A:

There is double encoding happening in what you are showing. Two steps should be enough:

andy_test is both the string in the software and the actual name of the directory or script in the filesystem (the resource the web server accesses).

andy%5Ftest is andy_test URL-encoded. This is the string the browser should use (encoding isn't really needed for an underscore, but it may be for other characters).

andy%255Ftest is just andy_test URL-encoded twice, which makes no sense; there should be no need for it. Just decide WHERE you will do the encoding. If you do it both at the code level and at the web server level, this is what happens, and the result is broken links unless you also decode twice, which is neither needed nor sane.
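
To see why double encoding breaks links, here is a small sketch using Python's urllib.parse: one quote is undone by one unquote, but a value quoted twice still contains an escape after a single decode. The name "andy test" is a hypothetical example, chosen because a space, unlike an underscore, really must be encoded:

```python
from urllib.parse import quote, unquote

name = "andy test"          # hypothetical name containing a space

once = quote(name)          # "andy%20test"
twice = quote(quote(name))  # "andy%2520test"

# A server that decodes once recovers the name only from the singly encoded form.
print(unquote(once) == name)             # True
print(unquote(twice) == name)            # False: still "andy%20test"
print(unquote(unquote(twice)) == name)   # True, but only by decoding twice
```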

Vinko Vrsalovic
I didn't write the backend for the software; I'm just trying to convince the backend developers that something has been done wrong.
leeand00
@leeand00: It should be obvious that doing the same thing twice is wrong. The aim should be to determine where the best place for the encoding is and do it only there (not twice).
Vinko Vrsalovic
A:

You should not encode the directory names when you create them (as you suggested). Encoding should happen only at the last stage, when the URL is handed out to the browser. That's why you are ending up with 'double' encoding: %25 is the encoded %, and 5F is the leftover from the first encoding of the underscore.
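
A minimal sketch of "encode only at the last stage", assuming Python: the directory is created with its plain name, and percent-encoding happens in one place, when the link is generated. directory_link is a hypothetical helper, and a temporary directory stands in for the Apache document root:

```python
import html
import tempfile
from pathlib import Path
from urllib.parse import quote

def directory_link(name: str) -> str:
    """Hypothetical helper: the one place where a raw directory name is
    percent-encoded, at the moment the link is handed to the browser."""
    return f'<a href="{quote(name)}/">{html.escape(name)}</a>'

# A temporary directory stands in for the web server's document root.
with tempfile.TemporaryDirectory() as docroot:
    created = Path(docroot) / "andy_test"  # plain name, no encoding on disk
    created.mkdir()
    print(created.name)                  # andy_test
    print(directory_link(created.name))  # <a href="andy_test/">andy_test</a>
```

Because the name on disk is never encoded, the web server's single decode of the request path maps straight back to it.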

BTW, you don't need to encode the underscore anyway (according to RFC 1738, I think):

2.2. URL Character Encoding Issues

...

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.
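
The quoted rule can be turned into a quick check. This is a sketch: must_encode is a hypothetical helper, and it deliberately ignores reserved characters (such as ";/?:@=&"), which may appear unencoded only when used for their reserved purpose:

```python
import string

# Characters RFC 1738 (section 2.2) allows unencoded in a URL:
# alphanumerics plus the special characters $-_.+!*'(),
RFC1738_SAFE = set(string.ascii_letters + string.digits + "$-_.+!*'(),")

def must_encode(ch: str) -> bool:
    """Hypothetical helper: True if ch is neither alphanumeric nor one of the
    RFC 1738 special characters, i.e. it must be encoded unless it is a
    reserved character used for its reserved purpose."""
    return ch not in RFC1738_SAFE

print(must_encode("_"))  # False: the underscore may stay as-is
print(must_encode("%"))  # True: a literal percent sign must become %25
```

For comparison, Python's urllib.parse.quote also leaves "_" untouched: per its documentation, letters, digits and the characters "_.-~" are never quoted.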

Maxwell Troy Milton King
Thanks for the RFC reference as well!
leeand00