How do I URI escape Japanese characters in Perl?

+9  A: 

The URI::Escape module can handle Japanese characters, just like any other unsafe or special character in URIs.

Generally, when you're looking for some functionality in Perl, especially some that would seem as common as escaping things in URIs, you should consult http://search.cpan.org first. URI::Escape would probably have been at the very top of the results when searching for any of the keywords you used in your question.

rafl
+6  A: 

How do I URI escape Japanese characters in Perl?

You need to mention what encoding the Japanese characters are in.

If your source is UTF-8 and you are using Perl's built-in Unicode support, you can use this:

 use utf8;
 use URI::Escape qw/uri_escape_utf8/;
 my $escaped = uri_escape_utf8 ("チャオ");

If your Japanese characters are in a legacy encoding such as EUC-JP or Shift-JIS, you need to specify what kind of URI escaping you require. The standard calls like

 my $escaped = uri_escape ("ハロー");

will give you something which is URI encoded, but it isn't necessarily meaningful to the other end. For example, if you are making a URL for WWWJDIC, the URI to look up 渮 is this for UTF-8:

http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwjdic.cgi?1MMJ%E6%B8%AE

But for EUC-JP the same page looks like this:

http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwjdic.cgi?1MKJ%DE%D1

Perl will do either one for you but you need to be specific about what your starting point is.
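
As a rough sketch of the two cases above, assuming the starting point is a decoded Perl character string (here a literal under `use utf8`) and that the core Encode module's 'euc-jp' encoding is what the receiving end expects, something like this should produce both forms. The variable names are only illustrative:

```perl
use strict;
use warnings;
use utf8;                                   # this source file is saved as UTF-8
use Encode qw(encode);
use URI::Escape qw(uri_escape uri_escape_utf8);

my $kanji = "渮";                           # a Perl character string, not bytes

# UTF-8 target: uri_escape_utf8 encodes the string to UTF-8, then escapes the bytes.
my $utf8_escaped = uri_escape_utf8($kanji);

# EUC-JP target: convert to EUC-JP bytes explicitly, then escape those bytes.
my $eucjp_escaped = uri_escape(encode('euc-jp', $kanji));

print "UTF-8:  $utf8_escaped\n";
print "EUC-JP: $eucjp_escaped\n";
```

The point is that uri_escape only ever sees bytes, so the choice of encoding is made in the encode step, before any escaping happens.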

Kinopiko
RFC 3986 specifically recommends encoding to UTF-8 in URIs; for new software there is no need to confuse the issue with the legacy encodings EUC-JP and Shift-JIS.
daxim
@daxim: if the new software has to interact with the old software, there is such a need.
Kinopiko