Hi,

I'm having real trouble understanding the specification and guidelines on how to properly escape and encode a URL for submission in a sitemap.

In the sitemaps.org examples (under entity escaping), they give this example URL:

http://www.example.com/ümlat.php&q=name

Which when UTF-8 encoded ends up as (according to them):

http://www.example.com/%C3%BCmlat.php&q=name

However, when I run the same URL through PHP's rawurlencode(), I end up with:

http%3A%2F%2Fwww.example.com%2F%C3%BCmlat.php%26q%3Dname

I've sort of worked around this by using this snippet found on PHP.net, which decodes the reserved characters again after rawurlencode():

$entities = array('%21', '%2A', '%27', '%28', '%29', '%3B', '%3A', '%40', 
    '%26', '%3D', '%2B', '%24', '%2C', '%2F', '%3F', '%23', '%5B', '%5D');

$replacements = array('!', '*', "'", "(", ")", ";", ":", "@", "&", "=", "+",
    "$", ",", "/", "?", "#", "[", "]");

$string = str_replace($entities, $replacements, rawurlencode($string));

But according to someone I spoke to (Kohana BDFM), this interpretation is wrong. Honestly, I'm so confused that I no longer know what's right.

What's the correct way to encode a URL for use in the sitemap?

Relevant: RFC 3986
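For reference, here is a minimal sketch of the two-step escaping that the sitemaps.org example appears to describe: percent-encode only the bytes that are not printable ASCII, then entity-escape the result for XML. The function name sitemap_escape_url is hypothetical (it is not from sitemaps.org or PHP.net). The sketch assumes the input is already a well-formed absolute URL in UTF-8, and it deliberately leaves reserved delimiters such as : / ? & = alone.

```php
<?php
// Hypothetical helper, not part of PHP or the sitemap protocol.
// Assumes $url is a UTF-8 string and this file is saved as UTF-8.
function sitemap_escape_url(string $url): string
{
    // Step 1: percent-encode each byte outside printable ASCII (0x21-0x7E).
    // Without the /u modifier the pattern matches single bytes, so the
    // two UTF-8 bytes of "ü" become %C3 and %BC. Existing %XX escapes and
    // reserved delimiters (: / ? & = ...) are left untouched.
    $encoded = preg_replace_callback(
        '/[^\x21-\x7E]/',
        function (array $m): string {
            return rawurlencode($m[0]);
        },
        $url
    );

    // Step 2: entity-escape &, <, >, " and ' so the value is safe in XML.
    return htmlspecialchars($encoded, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

echo sitemap_escape_url('http://www.example.com/ümlat.php&q=name'), "\n";
// prints: http://www.example.com/%C3%BCmlat.php&amp;q=name
```

Unlike calling rawurlencode() on the whole string and then undoing part of it with str_replace(), this never touches the scheme, slashes, or query delimiters in the first place. It does not try to validate the URL or encode reserved characters that appear as data inside a path segment or query value; those would need to be encoded per component before the URL is assembled.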

+1  A: 
Artefacto
Perfect, thank you for the detailed explanation.
The Pixel Developer