ansaurus

Question

PHP Transliteration

Answer 1

A:

The problem with your query is that it is a very hard thing to do. Not all glyphs in most languages have a-z equivalents. All glyphs do have phonetic equivalents, but these are sounds (words), not letters. If you are only dealing with Latin-based languages, things are a little easier, but you still have issues with things like I-mutation.

Your best solution would be to come up with a crude list of phonetic sounds -> a-z equivalents. It won't be perfect, but without more information on your exact requirements it is hard to develop a solution.

Jamie Lewis 2009-08-16 15:34:53

I am mostly dealing with European languages; a rough solution would be fine. I once found a big list in the source of another script, but have completely lost it.

esryl 2009-08-16 15:36:01

Answer 2

+1 A:

The strtr() manual page lists a few possibilities in its user comments, such as:

function normalize($string) {
    // Rough ASCII equivalents for accented Latin characters.
    // The source file must be saved as UTF-8 so the keys match UTF-8 input.
    $table = array(
        'Š'=>'S', 'š'=>'s', 'Đ'=>'Dj', 'đ'=>'dj', 'Ž'=>'Z', 'ž'=>'z', 'Č'=>'C', 'č'=>'c', 'Ć'=>'C', 'ć'=>'c',
        'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
        'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O',
        'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss',
        'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e',
        'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o',
        'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'þ'=>'b',
        'ÿ'=>'y', 'Ŕ'=>'R', 'ŕ'=>'r',
    );

    // With an array argument, strtr() tries the longest keys first and
    // never re-replaces text it has already substituted.
    return strtr($string, $table);
}

Sebastian P. 2009-08-16 15:36:01

Thank you, that is a good start, but it is still missing characters I need, such as ü. I think I need to go through each alphabet one at a time.

esryl 2009-08-16 15:40:25
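Going through each alphabet by hand is easier if you can see which characters a table still misses. The sketch below applies a table with strtr() and lists every non-ASCII character that survives. It assumes UTF-8 input; uncovered_characters() and the example table are hypothetical names used only for illustration.

```php
<?php
// Sketch: apply a replacement table with strtr() and report which non-ASCII
// characters survived, i.e. which ones the table does not cover yet.
// Assumes UTF-8 input; the /u modifier makes the regex match code points.
function uncovered_characters(string $text, array $table): array
{
    $converted = strtr($text, $table);

    // Collect every remaining character outside the ASCII range.
    preg_match_all('/[^\x00-\x7F]/u', $converted, $matches);

    // Report each missing character once, in order of first appearance.
    return array_values(array_unique($matches[0]));
}

// Hypothetical table for illustration: it has 'Ü' but no entry for 'ü'.
$table = ['é' => 'e', 'Ü' => 'U'];

// Lists 'ü' as the only uncovered character.
print_r(uncovered_characters('Über café, München und Zürich', $table));
```

Running your sample text for each language through a helper like this tells you which entries to add next.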

The problem with the above solution is that the replacements are not true equivalents: æ is closer to the phonetic 'ay' or the English 'ae'. It really depends on the original poster's needs, i.e. whether he wants an English phonetic equivalent or a 'vanity' translation.

Jamie Lewis 2009-08-16 15:42:55

Hi Jamie, yeah, I noticed the ae not being done correctly in that, so I am currently compiling my own list on a per-language basis. At least with my newfound understanding of multibyte (mb) strings I can do simple replacements. I was hoping someone somewhere would have this monster list of letters and equivalents already completed.

esryl 2009-08-16 15:49:16

Take a look at my answer; it solves this problem quite nicely: 'æ' comes out as 'ae'.

Alix Axel 2009-08-17 17:10:17

For Turkish: 'ü'=>'u', 'Ü'=>'U', 'ğ'=>'g', 'Ğ'=>'G', 'ş'=>'s', 'Ş'=>'S'

nerkn 2010-10-27 11:19:22

Some additions (dotless ı and dotted İ): 'ı'=>'i', 'İ'=>'I'

nerkn 2010-10-27 11:30:11
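Since the right ASCII form of a letter depends on the language (German usually writes 'ü' as 'ue', while the Turkish list above maps it to plain 'u'), one hedged way to organise per-language lists is a generic fallback table plus per-language overrides. The function name transliterate_for() and both tables are hypothetical and deliberately incomplete; the Turkish entries are taken from the comments above.

```php
<?php
// Sketch only: a generic fallback table plus per-language overrides.
// Both tables are hypothetical, incomplete examples.
const BASE_TABLE = [
    'ä' => 'a', 'Ä' => 'A', 'ö' => 'o', 'Ö' => 'O', 'ü' => 'u', 'Ü' => 'U',
    'ç' => 'c', 'Ç' => 'C',
];

const LANGUAGE_TABLES = [
    // German usually writes umlauts as a digraph and ß as "ss".
    'de' => [
        'ä' => 'ae', 'Ä' => 'Ae', 'ö' => 'oe', 'Ö' => 'Oe',
        'ü' => 'ue', 'Ü' => 'Ue', 'ß' => 'ss',
    ],
    // Turkish letters from the comments (the umlauts fall back to BASE_TABLE).
    'tr' => [
        'ğ' => 'g', 'Ğ' => 'G', 'ş' => 's', 'Ş' => 'S', 'ı' => 'i', 'İ' => 'I',
    ],
];

function transliterate_for(string $text, string $language): string
{
    // The + operator keeps the left-hand value for duplicate keys, so
    // language-specific entries override the generic ones.
    $table = (LANGUAGE_TABLES[$language] ?? []) + BASE_TABLE;

    return strtr($text, $table);
}

echo transliterate_for('Müller', 'de'), PHP_EOL; // Mueller
echo transliterate_for('Müller', 'tr'), PHP_EOL; // Muller
```

An unknown language code simply falls back to the generic table, so new languages can be added one at a time.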

Answer 3

+5 A:

You can use iconv(), which has a special transliteration mode:

When the string "//TRANSLIT" is appended to tocode, transliteration is activated. This means that when a character cannot be represented in the target character set, it can be approximated through one or several characters that look similar to the original character.

-- http://www.gnu.org/software/libiconv/documentation/libiconv/iconv_open.3.html

See here for a complete example that matches your use case.

troelskn 2009-08-16 16:05:50

I had just stumbled across iconv as my research continued; thank you very much for linking me to the complete example.

esryl 2009-08-16 16:15:27
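A minimal sketch of the iconv approach, assuming the iconv extension is loaded and the input is UTF-8. to_ascii() is a hypothetical helper name. The exact replacements depend on which iconv library PHP was built against (and, with glibc, on the current locale), so the sketch only promises plain ASCII output, or null when the conversion fails.

```php
<?php
// Sketch: transliterate UTF-8 text to ASCII using iconv's //TRANSLIT suffix.
// Whether 'é' becomes 'e', "'e" or '?' depends on the iconv implementation
// and the LC_CTYPE locale, so callers should not rely on exact output.
function to_ascii(string $text): ?string
{
    // iconv() returns false on failure, e.g. when the implementation cannot
    // transliterate a character or the input is not valid UTF-8.
    // The @ suppresses the notice/warning iconv emits in that case.
    $result = @iconv('UTF-8', 'ASCII//TRANSLIT', $text);

    return $result === false ? null : $result;
}

echo to_ascii('Crème brûlée'), PHP_EOL;
```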

Answer 4

A:

If you don't have access to iconv, and you don't want to use long lookup tables like the one Sebastian P. suggested, you can take advantage of the HTML entity representation of each character like this:

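// Replace "&eacute;" with "e", "&aelig;" with "ae", etc., then decode any
// remaining entities (such as &amp;) back into characters.
$string = html_entity_decode(
    preg_replace('~&([a-z]{1,2})(?:acute|caron|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i', '$1', htmlentities($string, ENT_QUOTES, 'UTF-8')),
    ENT_QUOTES, 'UTF-8'
);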

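As far as I'm aware, it doesn't work for Chinese, Japanese and other non-Latin scripts, and it only handles characters that have a named HTML entity (Turkish ş and ğ, for example, are left as they are), but it works fine for most accented Latin letters.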

Alix Axel 2009-08-16 22:24:19
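Because the entity trick only covers characters that have a named HTML 4.01 entity, letters such as Turkish ş or ğ come through unchanged. One hedged way to cover them is to run a small strtr() table first and then apply the entity trick to everything else. unaccent_with_table() and the example table are hypothetical; the sketch assumes UTF-8 input and PHP's default HTML 4.01 entity table.

```php
<?php
// Sketch: a small strtr() table for letters without a named HTML 4.01
// entity, followed by the HTML-entity trick for everything else.
function unaccent_with_table(string $text, array $extra = []): string
{
    // Stage 1: explicit replacements, e.g. Turkish letters without entities.
    $text = strtr($text, $extra);

    // Stage 2: "é" -> "&eacute;" -> "e", "æ" -> "&aelig;" -> "ae", ...
    $encoded = htmlentities($text, ENT_QUOTES, 'UTF-8');
    $stripped = preg_replace(
        '~&([a-z]{1,2})(?:acute|caron|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i',
        '$1',
        $encoded
    );

    // Stage 3: decode entities the pattern left alone, such as &amp;.
    return html_entity_decode($stripped, ENT_QUOTES, 'UTF-8');
}

$turkish = ['ğ' => 'g', 'Ğ' => 'G', 'ş' => 's', 'Ş' => 'S', 'ı' => 'i', 'İ' => 'I'];
echo unaccent_with_table('Crème brûlée & Şişli', $turkish), PHP_EOL; // Creme brulee & Sisli
```

The final html_entity_decode() step matters: without it, an ordinary "&" in the input would come back as "&amp;".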

Answer 5

+2 A:

If you are using iconv, make sure your locale (LC_CTYPE) is set to a UTF-8 locale before you try the transliteration; otherwise some characters will not be transliterated correctly (with glibc they typically come out as '?'):

setlocale(LC_CTYPE, 'en_US.UTF8');

Shane O'Grady 2009-08-17 10:27:37
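Because setlocale() changes process-wide state, a hedged variant is to switch LC_CTYPE only around the iconv call and restore the previous setting afterwards. The candidate locale names are assumptions (which ones are installed depends on the operating system), and translit_with_locale() is a hypothetical helper name.

```php
<?php
// Sketch: run iconv transliteration under a UTF-8 LC_CTYPE locale, then
// restore whatever locale was active before so the change does not leak
// into the rest of the script. The locale names are assumptions.
function translit_with_locale(string $text, array $locales = ['en_US.UTF-8', 'en_US.UTF8']): ?string
{
    // Passing "0" only queries the current setting without changing it.
    $previous = setlocale(LC_CTYPE, '0');

    // With an array, setlocale() tries each candidate until one succeeds;
    // it returns false if none of them is installed.
    setlocale(LC_CTYPE, $locales);

    $result = @iconv('UTF-8', 'ASCII//TRANSLIT', $text);

    // Put the original locale back.
    if ($previous !== false) {
        setlocale(LC_CTYPE, $previous);
    }

    return $result === false ? null : $result;
}

echo translit_with_locale('Crème brûlée'), PHP_EOL;
```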
