According to RFC 1738, Uniform Resource Locators (URL), only US-ASCII characters are supported; all other characters must be encoded.
2.2. URL Character Encoding Issues
URLs are sequences of characters, i.e., letters, digits, and special
characters. A URL may be represented
in a variety of ways: e.g., ink on
paper, or a sequence of octets in a
coded character set. The
interpretation of a URL depends only
on the identity of the characters
used.
In most URL schemes, the sequences of characters in different parts of a
URL are used to represent sequences of
octets used in Internet protocols. For
example, in the ftp scheme, the host
name, directory name and file names
are such sequences of octets,
represented by parts of the URL.
Within those parts, an octet may be
represented by the character which
has that octet as its code within the
US-ASCII [20] coded character set.
In addition, octets may be encoded by a character triplet consisting of
the character "%" followed by the two
hexadecimal digits (from
"0123456789ABCDEF") which form the
hexadecimal value of the octet. (The
characters "abcdef" may also be used
in hexadecimal encodings.)
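
To make the triplet rule concrete, here is a minimal sketch in Python. The helper names encode_octet and decode_triplet are made up for illustration and are not taken from the RFC or from any library; the sketch only assumes that an octet is an integer from 0 to 255.

```python
import string


def encode_octet(octet: int) -> str:
    """Return the "%XX" triplet for one octet (hypothetical helper)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("an octet must be in the range 0-255")
    # Two uppercase hexadecimal digits, zero-padded.
    return "%{:02X}".format(octet)


def decode_triplet(triplet: str) -> int:
    """Return the octet for a "%XX" triplet (hypothetical helper).

    Both "ABCDEF" and "abcdef" are accepted as hexadecimal digits.
    """
    if len(triplet) != 3 or triplet[0] != "%":
        raise ValueError("expected '%' followed by two hexadecimal digits")
    digits = triplet[1:]
    if any(c not in string.hexdigits for c in digits):
        raise ValueError("expected two hexadecimal digits after '%'")
    return int(digits, 16)
```

For example, encode_octet(0x20) gives "%20", and decode_triplet("%7e") and decode_triplet("%7E") both give the same octet, 0x7E.
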
Octets must be encoded if they have no corresponding graphic
character within the US-ASCII coded
character set, if the use of the
corresponding character is unsafe, or
if the corresponding character is
reserved for some other interpretation
within the particular URL scheme.
No corresponding graphic US-ASCII:
URLs are written only with the graphic printable characters of the
US-ASCII coded character set. The
octets 80-FF hexadecimal are not used
in US-ASCII, and the octets 00-1F and
7F hexadecimal represent control
characters; these must be encoded.
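
As a rough illustration of this last rule only, the sketch below encodes exactly the octets named above (00-1F, 7F and 80-FF hexadecimal) and leaves every other octet as its US-ASCII character. The function names are hypothetical, and the sketch deliberately ignores the unsafe and reserved characters, which the RFC handles separately, so it is not a complete URL encoder.

```python
def lacks_graphic_ascii(octet: int) -> bool:
    """True for control octets (00-1F, 7F) and octets outside US-ASCII (80-FF)."""
    return octet <= 0x1F or octet == 0x7F or octet >= 0x80


def encode_non_graphic(data: bytes) -> str:
    """Percent-encode only the octets that have no graphic US-ASCII character.

    Unsafe and reserved characters are intentionally left untouched here.
    """
    return "".join(
        "%{:02X}".format(b) if lacks_graphic_ascii(b) else chr(b)
        for b in data
    )


print(encode_non_graphic(b"caf\xe9"))  # the octet E9 is outside US-ASCII
print(encode_non_graphic(b"a\tb"))     # the tab (09) is a control character
```

In practice, the standard library function urllib.parse.quote covers this case along with the unsafe and reserved characters.
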