
I would like to include the URI http://beispiel.de/schnäppchen in a link in an XHTML document that is encoded in UTF-8.

Should I percent-encode the URL and write the following?

<a href="http://beispiel.de/schn%C3%A4ppchen">foobar</a>

After all, "ä" is a legal character in UTF-8 and should therefore be legal in XML/XHTML, shouldn't it?

A:

Legal in (X)HTML, but not legal in a URL as defined by RFC 2396.

Note that non-ASCII characters in a URL are converted using %-encoding, not into SGML entities (which start with an &, such as &auml;).

David Dorward
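A minimal Python sketch of both points in the answer, using only the standard library: an XML parser accepts the raw "ä" inside an attribute value, while `urllib.parse.quote` (which encodes to UTF-8 by default) produces the percent-encoded path from the question. The choice of Python and the variable names are illustrative assumptions, not part of the original thread.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

raw_uri = "http://beispiel.de/schnäppchen"

# A raw "ä" is well-formed XML inside an attribute value,
# so the parser accepts it and returns the string unchanged.
link = ET.fromstring(f'<a href="{raw_uri}">foobar</a>')
print(link.get("href"))

# For the URI itself, percent-encode the path: quote() encodes the
# string as UTF-8 and escapes every byte outside its safe set
# ("/" is kept by default).
encoded_path = quote("/schnäppchen")
encoded_uri = "http://beispiel.de" + encoded_path
print(encoded_uri)  # http://beispiel.de/schn%C3%A4ppchen
```

Only the path is passed to `quote()` here; quoting the whole URI with the default safe set would also escape the ":" after the scheme.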
That also prevents problems if the user agent converts unescaped umlauts in URLs to ISO 8859-1 before sending the request.
Boldewyn
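To illustrate why leaving the umlaut unescaped is fragile, this Python sketch encodes the same word under the two encodings mentioned: UTF-8 turns "ä" into two bytes, ISO 8859-1 into one, so a server that expects UTF-8 sees a different, undecodable path if the user agent picks Latin-1. Which encoding a particular browser actually chooses is not something this sketch demonstrates.

```python
from urllib.parse import quote, unquote

word = "schnäppchen"

# The same character yields different percent-escapes per encoding.
utf8_form = quote(word, encoding="utf-8")      # two bytes: %C3%A4
latin1_form = quote(word, encoding="latin-1")  # one byte:  %E4

print(utf8_form)    # schn%C3%A4ppchen
print(latin1_form)  # schn%E4ppchen

# Decoding the Latin-1 form as UTF-8 fails for the lone 0xE4 byte,
# which unquote() replaces with U+FFFD by default (errors="replace").
garbled = unquote(latin1_form, encoding="utf-8")

# Only decoding with the matching encoding recovers the original word.
recovered = unquote(latin1_form, encoding="latin-1")
```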
To be clear, you can't have anything other than a limited subset of ASCII in URLs. If you want to construct a URL from a string with non-ASCII characters, common practice is to encode the Unicode string in UTF-8, then %-encode the bytes that aren't in the allowable ASCII range.
MtnViewMark
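The two-step practice described in this comment can be sketched by hand to make each step explicit. The safe set used here (RFC 3986's unreserved characters plus "/" so path separators survive) is an assumption for this sketch, since the exact set depends on which URI component is being encoded, and `percent_encode_path` is a hypothetical helper name. In real code a library function such as `urllib.parse.quote` is the safer choice.

```python
import string

# Assumed safe set: RFC 3986 unreserved characters, plus "/" so that
# path separators are kept as-is.
SAFE_BYTES = frozenset(
    (string.ascii_letters + string.digits + "-._~/").encode("ascii")
)


def percent_encode_path(text: str) -> str:
    """Hypothetical helper: UTF-8-encode text, then %-escape unsafe bytes."""
    out = []
    for byte in text.encode("utf-8"):   # step 1: Unicode string -> UTF-8 bytes
        if byte in SAFE_BYTES:
            out.append(chr(byte))        # allowed ASCII byte: keep it
        else:
            out.append(f"%{byte:02X}")   # step 2: escape as %XX (uppercase hex)
    return "".join(out)


print(percent_encode_path("/schnäppchen"))  # /schn%C3%A4ppchen
```

With this safe set the helper agrees with `urllib.parse.quote` and its default `safe="/"`, which keeps the same unreserved characters.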