Hi, I'm building a site to check, register, etc. domains, and I have to make it IDN-compliant. Right now I have something like this:

echo $domain;       
$domain = idn_to_ascii($domain);
echo $domain;
$domain = idn_to_utf8($domain);
echo $domain;

and I'm getting this:

testing123ásd123 xn--testing123sd123-wjb testing123ĂĄsd123

As you can see, the decoded string isn't the same as the original. I also tried using a class from http://phlymail.com/en/downloads/idna/download/ to do it, and I'm getting the same results.

I have also tried this:

$charset="UTF-8";
echo $domain;       
$domain = idn_to_ascii($domain, $charset);
echo $domain;
$domain = idn_to_utf8($domain);
echo $domain;

and I got exactly the same result (except that the encoded string is slightly different).

Any ideas?

EDIT: Problem solved (see http://stackoverflow.com/questions/3132430/problem-in-converting-string-to-puny-code-in-php-using-phlylabss-punycode-stri). The original string was in ISO-8859-2 and the decoded one in UTF-8. Now I need to find out how to convert it back to ISO-8859-2, but Google can help me with that. Any mods? What should I do with the question: close it, delete it, or leave it this way?
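A minimal sketch of the fix described in the edit, assuming the PHP intl extension (which provides idn_to_ascii() and idn_to_utf8()) and iconv are available, and that the input arrives as ISO-8859-2 bytes. The helper name idnRoundTrip() is hypothetical. Note that with intl the second argument of idn_to_ascii() is an integer of option flags, not a charset name, so the conversion to UTF-8 has to happen before the call.

```php
<?php

// Hypothetical helper: Punycode-encode a domain that arrives in ISO-8859-2,
// decode it again, and return the ASCII form plus the decoded form
// converted back to ISO-8859-2.
function idnRoundTrip(string $latin2Domain): array
{
    // The intl IDN functions expect UTF-8, so convert the input first.
    $utf8 = iconv('ISO-8859-2', 'UTF-8', $latin2Domain);

    // UTS #46 processing with default options.
    $ascii = idn_to_ascii($utf8, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    $back  = idn_to_utf8($ascii, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);

    // Convert the decoded UTF-8 string back to the page's charset.
    return [$ascii, iconv('UTF-8', 'ISO-8859-2', $back)];
}

// "\xE1" is the single ISO-8859-2 byte for á.
[$ascii, $decoded] = idnRoundTrip("testing123\xE1sd123");
echo $ascii, "\n";
var_dump($decoded === "testing123\xE1sd123");
```

With this input the first line printed should be xn--testing123sd123-wjb, the same Punycode shown in the question, and the comparison should report that the decoded string matches the original bytes.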

A: 

As you already pointed out, ĂĄ is the UTF-8 representation of the á character displayed in a non-UTF-8 document: UTF-8 encodes á as the two bytes 0xC3 0xA1, and ISO-8859-2 reads those bytes as Ă and Ą.
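A small sketch that reproduces the garbling, assuming the iconv extension is available: it takes the UTF-8 bytes of á and reinterprets them as ISO-8859-2 text.

```php
<?php

// UTF-8 encodes á (U+00E1) as the two bytes 0xC3 0xA1.
$utf8Bytes = "\u{E1}";
echo bin2hex($utf8Bytes), "\n"; // c3a1

// Read those same bytes as if they were ISO-8859-2:
// 0xC3 is Ă (U+0102) and 0xA1 is Ą (U+0104) in that charset.
$misread = iconv('ISO-8859-2', 'UTF-8', $utf8Bytes);
echo $misread, "\n"; // ĂĄ
```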

You can use iconv() to convert between charsets. However, be aware that non-Unicode charsets cannot represent the full set of international characters, so characters missing from the target charset have to be written some other way, such as HTML entities. E.g.:

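<?php

// Decode the ASCII (Punycode) form back into a UTF-8 string
$domain = idn_to_utf8($domain);
// Print it with HTML entities (e.g. á becomes &aacute;) so the browser
// shows the right character even on a page that is not served as UTF-8
echo htmlentities($domain, ENT_COMPAT, 'UTF-8');

?>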
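One caveat: htmlentities() only converts characters that have a named HTML entity, so other characters are left as raw UTF-8 bytes. A minimal alternative sketch, assuming the mbstring extension is enabled, turns every non-ASCII code point into a numeric character reference, which HTML interprets as a Unicode code point whatever ASCII-compatible charset (such as ISO-8859-2) the page uses. The function name toAsciiEntities() is hypothetical.

```php
<?php

// Hypothetical helper: replace every non-ASCII character in a UTF-8 string
// with a decimal numeric character reference such as &#225;.
function toAsciiEntities(string $utf8): string
{
    // Escape HTML special characters first so they stay literal text.
    $escaped = htmlspecialchars($utf8, ENT_COMPAT, 'UTF-8');

    // Conversion map: code points U+0080..U+10FFFF, offset 0, full mask.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    return mb_encode_numericentity($escaped, $map, 'UTF-8');
}

echo toAsciiEntities("testing123\u{E1}sd123"), "\n"; // testing123&#225;sd123
```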

In any case, it'd probably be easier to just use UTF-8 for the whole project.
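A minimal sketch of the UTF-8-everywhere approach, assuming the pages are generated by PHP and that mbstring and iconv are available. Declaring UTF-8 in the response header also makes browsers submit form data as UTF-8 by default. The helper ensureUtf8() is hypothetical, and its fallback is only a heuristic: it treats any input that is not valid UTF-8 as legacy ISO-8859-2.

```php
<?php

// Serve the page as UTF-8 (must be called before any output).
header('Content-Type: text/html; charset=UTF-8');

// Hypothetical helper: make sure a string is UTF-8 before passing it to
// idn_to_ascii(), assuming any invalid UTF-8 is ISO-8859-2 (a heuristic).
function ensureUtf8(string $input): string
{
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input;
    }
    return iconv('ISO-8859-2', 'UTF-8', $input);
}

// "\xE1" on its own is not valid UTF-8, so it is converted from ISO-8859-2.
echo ensureUtf8("testing123\xE1sd123"), "\n"; // testing123ásd123
```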

Álvaro G. Vicario