Related questions:

  1. http://stackoverflow.com/questions/2653739/how-to-replace-characters-in-a-java-string
  2. http://stackoverflow.com/questions/2393887/how-to-replace-special-characters-with-their-equivalent-such-as-a-for-a

As in the questions above, I'm looking for a reliable, robust way to reduce any Unicode character to its nearest ASCII equivalent using PHP. I really want to avoid rolling my own look up table.

For example (stolen from 1st referenced question): Gračišće becomes Gracisce

A: 

My solution is to create two strings: the first holds the unwanted letters and the second holds the letters that will replace them, in the same order.

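$from = 'čšć';
$to   = 'csc';
$text = 'Gračišće';

// str_split() splits by byte and would break multibyte UTF-8 characters; split per character instead.
$fromChars = preg_split('//u', $from, -1, PREG_SPLIT_NO_EMPTY);
$toChars   = preg_split('//u', $to, -1, PREG_SPLIT_NO_EMPTY);
$result    = str_replace($fromChars, $toChars, $text); // Gracisce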
hsz
"I really want to avoid rolling my own look up table."
Dolph
+1  A: 

Try this:

function normal_chars($string)
{
    $string = htmlentities($string, ENT_QUOTES, 'UTF-8');
    $string = preg_replace('~&([a-z]{1,2})(acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i', '$1', $string);
    // Replace every non-alphanumeric character with a space, then collapse runs of spaces.
    $string = preg_replace(array('~[^0-9a-z]~i', '~ +~'), ' ', $string);
    return trim($string);
}

Examples:

echo normal_chars('Álix----_Ãxel!?!?'); // Alix Axel
echo normal_chars('áéíóúÁÉÍÓÚ'); // aeiouAEIOU
echo normal_chars('üÿÄËÏÖÜŸåÅ'); // uyAEIOUYaA

Based on the selected answer in this thread: http://stackoverflow.com/questions/2103797/url-friendly-username-in-php

John Conde
+1, but this only works for a subset of cases. For example, "Škoda" becomes "Scaron koda".
Dolph
+7  A: 

The iconv extension can do this, specifically the iconv() function:

$str = iconv('Windows-1252', 'ASCII//TRANSLIT//IGNORE', "Gracišce");
echo $str;
//outputs "Gracisce"

The main hassle with iconv is that you have to watch your encodings: the first argument must name the encoding the input string is actually in. It's still definitely the right tool for the job (I used 'Windows-1252' for the example due to limitations of the text editor I was working with ;). The feature of iconv that you definitely want to use is the //TRANSLIT flag, which tells iconv to transliterate any characters that don't have an ASCII match into the closest approximation.
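
For input that is already UTF-8 (which is usually the case for the question's "Gračišće"), the same call might look like the sketch below. The to_ascii() name and the fallback behaviour are assumptions made for illustration, not part of the answer above. What //TRANSLIT actually produces depends on the iconv implementation PHP was built against and, with glibc, on the current locale, so no particular output is promised here.

```php
<?php
// Hypothetical helper; the name to_ascii() is illustrative only.
function to_ascii($text)
{
    // Assumes $text is valid UTF-8. The first argument must name the
    // encoding the string is really in, otherwise the result is garbage.
    // @ hides the notice iconv() raises when it cannot convert a character.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $text);

    if ($ascii === false) {
        // Some iconv implementations fail outright instead of skipping
        // unmappable characters; fall back to the raw input in that case.
        $ascii = $text;
    }

    // Drop any remaining bytes outside the ASCII range so the caller
    // always gets plain ASCII back.
    return preg_replace('/[^\x00-\x7F]/', '', $ascii);
}

echo to_ascii('Gračišće'), "\n";
```

If the transliterated characters come back as "?", the locale is a likely cause; calling setlocale(LC_CTYPE, ...) with a UTF-8 locale that is installed on the system before iconv() is worth trying.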

zombat
Transliteration is now my word of the day.
Dolph