Here's a tough one:

I'm working on a website where most of the content is in Japanese.

But consider this URL:

http://www.stackoverflow.com/質問/日本語のURLはどうする

Most URL parsers, including Stack Overflow's, don't know where to delimit URLs that contain Japanese. Not good.

I haven't checked other browsers yet, but Google Chrome displays URL-decoded Japanese URLs in Japanese, and when you copy the URL, it URL-encodes it:

http://www.stackoverflow.com/%E8%B3%AA%E5%95%8F/%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%81%AEURL%E3%81%AF%E3%81%A9%E3%81%86%E3%81%99%E3%82%8B
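For reference, here is a minimal sketch of that encoding step using Python's standard `urllib.parse` module. It assumes the path is encoded as UTF-8 (the default for `quote`); the variable names are only illustrative.

```python
from urllib.parse import quote, unquote

# The human-readable path from the example URL above.
path = "/質問/日本語のURLはどうする"

# quote() percent-encodes non-ASCII characters as UTF-8 bytes.
# "/" is left alone by default (safe="/"), and ASCII letters such as "URL" pass through.
encoded = quote(path)
print("http://www.stackoverflow.com" + encoded)

# unquote() reverses the encoding, which is roughly what the browser does for display.
decoded = unquote(encoded)
print(decoded)
```

The encoded form is pure ASCII, so it survives copy and paste into tools that cannot delimit Japanese text.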

That's the right thing to do, but it's not exactly the most friendly. So, I thought, I'd have the content providers just type in short English slugs, like this:

http://www.stackoverflow.com/questions/what-to-do-with-japanese-urls

Awesome. But there are two problems with this:

  1. Most of the people providing the content speak a modicum of English --- but they speak more Engrish than English. So they're likely to write something like "what-do-to-with-japenese-ulrs".
  2. Google Japan might not be as interested in the English URL as it might be in the Japanese one.

Any thoughts on the best course of action? :D

A: 

Unfortunately, HTTP doesn't have native Unicode support. All Unicode characters in the URL path and query parameters need to be encoded. You can use Unicode encoding instead of % encoding if you want, but that still won't make the URL any more human-readable.
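As a rough illustration of the query-parameter side, this sketch uses Python's standard `urllib.parse` and assumes UTF-8 percent-encoding. The parameter name `q` and its value are made up for the example.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical query parameter holding Japanese text.
params = {"q": "日本語"}

# urlencode() percent-encodes each value as UTF-8 by default.
query = urlencode(params)
print(query)

# On the receiving side, parse_qs() decodes the values back to Unicode strings.
received = parse_qs(query)
print(received["q"][0])
```

The wire format stays ASCII either way; only the display layer (browser address bar, server-side decoding) shows the original Japanese.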

Mike
A: 

Using romaji transliteration?

http://www.stackoverflow.com/questionu/nihonnoURLwadousuru :D

Rekin
Rei Miyasaka
Scratch that, Google still prioritizes actual Japanese over romaji.
Rei Miyasaka
+1  A: 

The big Japanese websites seem to be somewhat divided on this question. Japanese Wikipedia and amazon.co.jp use encoded Japanese, for instance. But English slugs seem to be the most common. Take a look at http://fujifilm.jp/ for an example.

I think your concern about search engines is pretty valid, though, and in your position that might be enough to tip the balance in favor of using encoded Japanese for me. It would be a bit of a pain, though: lots of opportunity for error there.

Also, as Mike points out, what Stack Overflow is doing is pretty much the right thing, as the URL needs to be encoded.

T Duncan Smith
We decided to go with Japanese URLs. It's a shame that the URLs will end up looking garbled for people who want to share our links via IMs and stuff, but I think the trade-off will be worth it considering search engine traffic. Thanks for the input!
Rei Miyasaka