Can URL or FTP server addresses contain Japanese characters?

How about an FTP username and password?

A: 

Hostnames may contain non-ASCII Unicode characters using IDN, which encodes each label as Punycode. So:

例え.テスト
xn--r8jz45g.xn--zckzah

are the same site.
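As a rough sketch of that mapping, Python's built-in `idna` codec can convert between the two forms. It implements the older IDNA 2003 rules, so treat this as an illustration rather than a complete IDN implementation; the helper names are made up for the example.

```python
def to_ascii_hostname(hostname: str) -> str:
    """Convert a Unicode hostname to its ASCII (Punycode) form, label by label."""
    return hostname.encode("idna").decode("ascii")


def to_unicode_hostname(hostname: str) -> str:
    """Convert an ASCII (Punycode) hostname back to its Unicode form."""
    return hostname.encode("ascii").decode("idna")


if __name__ == "__main__":
    ascii_name = to_ascii_hostname("例え.テスト")
    print(ascii_name)
    print(to_unicode_hostname(ascii_name))
```

Plain ASCII hostnames pass through unchanged, so it is safe to apply the conversion to every hostname.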

Other parts of a URL are encoded using UTF-8 and normal URL-encoding. So:

http://例え.テスト/メインページ
http://xn--r8jz45g.xn--zckzah/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8

are the same address expressed as an IRI and a URI.
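A minimal sketch of the IRI-to-URI conversion, assuming Python's standard `urllib.parse` and the built-in `idna` codec. `iri_to_uri` is a hypothetical helper, not a library function, and it deliberately ignores any username:password part.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Punycode the hostname and percent-encode the rest as UTF-8."""
    parts = urlsplit(iri)
    if parts.hostname is None:
        raise ValueError("IRI has no hostname")

    # Hostname: IDN (Punycode), not percent-encoding.
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"

    # Everything else: UTF-8 bytes, percent-encoded.
    # '%' is kept safe so already-encoded input is not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, host, path, query, fragment))


if __name__ == "__main__":
    print(iri_to_uri("http://例え.テスト/メインページ"))
```

Browsers do this conversion for you when you type an IRI into the address bar; you only need it yourself when handing the address to something that insists on plain ASCII.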

If you included a username:password in the URL that would also be encoded:

ftp://テスト:テスト@ftp.example.com/
ftp://%E3%83%86%E3%82%B9%E3%83%88:%E3%83%86%E3%82%B9%E3%83%88@ftp.example.com/

However, whether this would actually work is another matter. The FTP RFC (RFC 959) doesn't say anything about encodings (a later RFC, RFC 2640, specifies UTF-8 for filenames, but this doesn't apply to passwords).
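For the URL side of this, here is a small sketch of building such an address with `urllib.parse.quote`. `ftp_url` is a hypothetical helper; it only produces the string and says nothing about whether the server will accept the login.

```python
from urllib.parse import quote


def ftp_url(user: str, password: str, host: str, path: str = "/") -> str:
    """Build an ftp:// URL with UTF-8 percent-encoded credentials."""
    # safe="" so that ':' and '@' inside the credentials are encoded too,
    # otherwise they would be mistaken for the URL's own delimiters.
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"ftp://{userinfo}@{host}{quote(path, safe='/')}"


if __name__ == "__main__":
    print(ftp_url("テスト", "テスト", "ftp.example.com"))
```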

FTP servers are typically byte-based, so to make a password match you'd have to send it in the same encoding the server expects, which will typically be the system's default encoding. On modern Linux and OS X servers that'll be UTF-8; on Windows it'll be a locale-specific code page, which traditionally is not UTF-8. (On a Japanese Windows install it'll be code page 932, which is similar to Shift-JIS.)
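To see why the match fails, compare the bytes the same password turns into under each encoding. This is only an illustration using Python's `utf-8` and `cp932` codecs; `password_bytes` is a made-up helper name.

```python
def password_bytes(password: str, encoding: str) -> bytes:
    """Return the raw bytes a client would send for this password."""
    return password.encode(encoding)


if __name__ == "__main__":
    password = "テスト"
    as_utf8 = password_bytes(password, "utf-8")
    as_cp932 = password_bytes(password, "cp932")

    print(len(as_utf8), as_utf8.hex())
    print(len(as_cp932), as_cp932.hex())
    # A byte-based server comparing these would see two different passwords.
    print(as_utf8 == as_cp932)
```

A client that lets you choose the wire encoding (Python's `ftplib.FTP` has had an `encoding` argument since 3.9, for instance) can work around this, but only if you know what the server expects.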

So, yeah, it could be done, but it's highly unreliable and best avoided. Then again, nasty insecure old FTP itself is best avoided these days.

bobince