Hi Folks,

I'm trying to implement URI encoding of filenames in my URLs, but I'm running into strange problems with uri_escape and uri_escape_utf8: they appear to behave inconsistently.

Using the perl command line:

richard@magic-box:$ perl
use URI::Escape;
print uri_escape_utf8("\"quotes\"_in_a_filename.pdf");
%22quotes%22_in_a_filename.pdf

Perfect, just what I want. Then in my code:

print STDERR uri_escape_utf8("\"quotes\"_in_a_filename.pdf");
print STDERR uri_escape("\"quotes\"_in_a_filename.pdf");

This results in my application log file getting the following lines:

"quotes"_in_a_filename.pdf
"quotes"_in_a_filename.pdf

Oddly, the same bit of code in the application works perfectly well with filenames with spaces, and (for example) correctly outputs:

my%20pdf%20with%20spaces.pdf

I am somewhat baffled, and don't know where to look next for solutions. Any help gratefully appreciated.

Cheers,

R

+5  A: 

The default set of unsafe characters changed to those in RFC 3986 in version 1.53 of the URI distribution (see the Changes file). Unfortunately, it seems the list of default characters hasn't been updated in the documentation yet. The old set was:

^A-Za-z0-9\-_.!~*'()

and it now is

^A-Za-z0-9\-\._~"

which excludes the " from the unsafe characters, so it is no longer escaped by default. I assume your application is using a different Perl interpreter, or at least a different library location for the URI::Escape module, than your command line does. There is a discussion about your exact issue in URI's bugtracker.
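One way to check that assumption is to log which interpreter and which copy of the module each environment actually loads. This is only a diagnostic sketch: run it once from the command line and once from inside the application, then compare the output.

```perl
use strict;
use warnings;
use URI::Escape;

# $^X is the path of the running perl binary.
printf STDERR "perl binary:  %s\n", $^X;

# The version of URI::Escape that got loaded, and the file it came from.
printf STDERR "URI::Escape:  %s\n", $URI::Escape::VERSION;
printf STDERR "loaded from:  %s\n", $INC{'URI/Escape.pm'};
```

If the two environments report different versions or paths, that would explain the differing output.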

Edit: If you want full consistency, I'd advise you to declare your own escape function that passes the unsafe character pattern in explicitly.
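A minimal sketch of such a wrapper, assuming you want the old default set quoted above. The name escape_filename is just a placeholder, not part of URI::Escape; the optional second argument to uri_escape_utf8 is the unsafe character pattern.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape_utf8);

# The pre-1.53 default quoted above: everything NOT in this set is escaped.
my $UNSAFE = q{^A-Za-z0-9\-_.!~*'()};

# Hypothetical helper: always pass the pattern explicitly so the result
# does not depend on the installed module's default.
sub escape_filename {
    my ($name) = @_;
    return uri_escape_utf8($name, $UNSAFE);
}

print STDERR escape_filename(q{"quotes"_in_a_filename.pdf}), "\n";
```

With the pattern passed in explicitly, the double quotes should come out as %22 regardless of which version of the module is installed.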

phaylon
Thanks for this, much appreciated! I had toyed with the idea that URI::Escape would be different in the two different environments I was testing in, but wasn't sure how that would be possible, or why it would differ in this way. I'll create my own escape sequence. Thanks again, R
Richard J