I am experimenting with filter_input and filter_var, and I am currently trying to sanitize URLs with FILTER_SANITIZE_URL. The test program reads a URL from a GET variable (e.g. foo.com/bar.php?a=http://www.domain.se). It works fine as long as I don't use Swedish domain names. For example, foo.com/bar.php?a=http://www.äta.se gets sanitized so that a = http://www.ta.se, which obviously isn't the same.

Is there a simple solution for this?

A: 

Domain names with non-ASCII characters (like the ä in your case) are technically not transmitted in that form; they are Punycode-encoded. FILTER_SANITIZE_URL strips every character outside its allowed ASCII set, which is why the ä disappears. The calling program should encode its URLs accordingly.

See:
http://en.wikipedia.org/wiki/Internationalized_domain_name
http://en.wikipedia.org/wiki/Punycode

Example:
http://www.äta.se becomes http://www.xn--ta-uia.se
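
If the caller cannot be changed, one option might be to Punycode-encode the host on the receiving side before sanitizing. The following is only a minimal sketch of that idea, not a complete solution. It assumes the intl extension is loaded (it provides idn_to_ascii()) and that the input is UTF-8. The helper name sanitize_idn_url is hypothetical, and URLs containing userinfo (user:pass@host) are deliberately not handled.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Assumes the intl extension is available and the input is UTF-8.
function sanitize_idn_url($url)
{
    // Split into scheme, host, and the rest (port, path, query, fragment).
    // Simplification: URLs with userinfo (user:pass@host) are not handled.
    if (!preg_match('~^([a-z][a-z0-9+.\-]*://)([^/?#:]+)(.*)$~isu', $url, $m)) {
        return false;
    }

    // Convert the host to its ASCII (Punycode) form.
    $asciiHost = idn_to_ascii($m[2], IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    if ($asciiHost === false) {
        return false;
    }

    // The host is now plain ASCII, so the filter no longer strips it.
    return filter_var($m[1] . $asciiHost . $m[3], FILTER_SANITIZE_URL);
}

// Prints http://www.xn--ta-uia.se
echo sanitize_idn_url('http://www.äta.se'), "\n";
```

Only the host is converted here. Non-ASCII characters in the path or query would still be removed by the filter and would need percent-encoding (for example with rawurlencode) by the caller.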

dbemerlin