I have some URLs that contain special characters. For example:
http://www.example.com/bléèàû.html
If you type this URL into a browser, my web server serves the correct page (it can handle special characters).
I have looked at the sitemaps spec and it's not clear whether a sitemaps file can contain special characters. From what I understand of the protocol, if the URL works, the server serves the correct page, and the XML file is UTF-8 encoded, then it's fine.
For example, this would be a valid sitemaps entry:
<url>
  <loc>http://www.example.com/bléèàû.html</loc>
  <changefreq>weekly</changefreq>
</url>
Can anyone confirm this?
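In case it helps, this is roughly what I mean: a minimal sketch (Python standard library only) where the raw, unencoded URL goes straight into <loc> and the file is saved as UTF-8 so the bytes match the XML declaration. The file name and changefreq are just examples.

    # Minimal sketch: write a sitemap containing the raw UTF-8 URL.
    from xml.sax.saxutils import escape

    url = "http://www.example.com/bléèàû.html"

    sitemap = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        "  <url>\n"
        f"    <loc>{escape(url)}</loc>\n"  # entity-escape &, <, > so the XML stays valid
        "    <changefreq>weekly</changefreq>\n"
        "  </url>\n"
        "</urlset>\n"
    )

    # Save as UTF-8 so the declared encoding and the actual bytes agree.
    with open("sitemap.xml", "w", encoding="utf-8") as f:
        f.write(sitemap)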
[Update] The reason I'm reluctant to encode the special characters is that I don't want to introduce duplicate URLs for the same content. For example,
http://www.example.com/bl%C3%A9%C3%A8%C3%A0%C3%BB.html
and
http://www.example.com/bléèàû.html
would serve the same page. I presume Google would pick up both URLs, one through its normal indexing and the other through the sitemap. Unfortunately, Google has a tendency to downgrade the PageRank of sites that have duplicate URLs pointing to the same page.
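As a sanity check on my assumption that these two forms point at the same resource: percent-encoding the UTF-8 bytes of the raw URL gives exactly the escaped form above, and decoding the escaped form gives back the raw one (a quick Python sketch, standard library only).

    # Quick check that the raw and percent-encoded URLs are equivalent.
    from urllib.parse import quote, unquote

    raw = "http://www.example.com/bléèàû.html"
    encoded = "http://www.example.com/bl%C3%A9%C3%A8%C3%A0%C3%BB.html"

    # Keep ':' and '/' unescaped; everything non-ASCII becomes UTF-8 percent-escapes.
    print(quote(raw, safe=":/") == encoded)   # True
    print(unquote(encoded) == raw)            # True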