I'm trying to figure out how to create personalized URLs for double-byte languages.

For example, this URL from Amazon Japan has Japanese characters within the URL itself (specifically, in the path rather than the query string):

http://www.amazon.co.jp/風の谷のナウシカ-DVD-宮崎駿/dp/B00005R5J3/ref=sr_1_3?ie=UTF8&s=dvd&qid=1269891925&sr=8-3

What I would like to do is have:

http://www.mysite.com/風の谷のナウシカ

or even

http://www.mysite.com/index.php?name=風の谷のナウシカ

be able to properly decode the $_GET['name'] string.

I think I have tried all of the urldecode and utf8_decode possibilities, but I just get gibberish in response.

This all works fine with a form and $_POST, but I need these URLs to be emailable...

EDIT: Here is the code I'm using:

<p>Original: <?= $_GET['str']; ?>

<br>Decode: <?= urldecode($_GET['str']); ?>

<br>Decode querystring: <?= urldecode($_SERVER['QUERY_STRING']); ?>

<p>

<?php
   // Dump every server variable for debugging.
   // each() was removed in PHP 8, so iterate with foreach instead.
   foreach ($_SERVER as $var => $value) {
      echo "$var => $value <br />";
   }
?>
A: 

Have you tried reading the GET value directly, for example from $_SERVER['QUERY_STRING'] or an equivalent? I'm pretty sure that the urldecode() function still has some issues, even though it's supposed to work with UTF-8 since version 5.0.
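As a rough sketch of what reading the raw query string could look like: the helper name parse_raw_query is made up for illustration, and the 'name' key is taken from the question's example URL. parse_str() percent-decodes the values itself, so no extra urldecode() call should be needed afterwards.

```php
<?php
// Hypothetical helper (not from the original post): turn a raw query
// string into an associative array of decoded parameters.
function parse_raw_query(string $raw): array
{
    $params = [];
    // parse_str() percent-decodes keys and values while parsing.
    parse_str($raw, $params);
    return $params;
}

// QUERY_STRING is only set when the script runs behind a web server.
$raw = isset($_SERVER['QUERY_STRING']) ? $_SERVER['QUERY_STRING'] : '';
$params = parse_raw_query($raw);

// Assumed parameter name 'name', as in index.php?name=...
$name = isset($params['name']) ? $params['name'] : '';
```

The decoded value is still just a sequence of UTF-8 bytes; whether it shows up correctly depends on the charset the page declares to the browser.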

This page over at php.net has some useful comments, some specifically for Japanese cases.

Pestilence
+2  A: 

Got it!

I needed to make sure the header was reporting:

header ('Content-type: text/html; charset=utf-8');

Once I did that, the characters were interpreted properly.
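Here is a minimal sketch of how the pieces might fit together, assuming the parameter is called name as in the question. The function names render_name and build_link are hypothetical. PHP has already percent-decoded the values in $_GET, so the sketch does not call urldecode() on them again.

```php
<?php
// Hypothetical helper: fetch the 'name' parameter and escape it for HTML.
// Values in $_GET arrive already percent-decoded, so no urldecode() here.
function render_name(array $query): string
{
    $name = isset($query['name']) ? $query['name'] : '';
    // Escape as UTF-8; invalid byte sequences are replaced, not echoed raw.
    return htmlspecialchars($name, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Hypothetical helper: build an emailable link by percent-encoding
// the UTF-8 bytes of the title.
function build_link(string $base, string $name): string
{
    return $base . '?name=' . rawurlencode($name);
}

// Only send the header and output when served over HTTP.
if (PHP_SAPI !== 'cli') {
    // Declare the charset before any output is sent.
    header('Content-Type: text/html; charset=utf-8');
    echo '<p>Name: ' . render_name($_GET) . '</p>';
}
```

For example, build_link('http://www.mysite.com/index.php', '風の谷のナウシカ') produces a plain ASCII URL that should survive being pasted into an email, and the page decodes it back to the original characters when the link is opened.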

I also found this, which is a very good resource:

http://www.phpwact.org/php/i18n/utf-8

Jeffrey Berthiaume