What I want to do is remove all accents and umlauts from a string, turning "lärm" into "larm" or "andré" into "andre". I tried to utf8_decode the string and then use strtr on it, but since my source file is saved as UTF-8, I can't enter the ISO-8859-15 characters for all the umlauts: the editor inserts the UTF-8 characters instead.

An obvious solution would be to keep the character map in an include that is saved as an ISO-8859-15 file, but surely there is a better way than requiring yet another include?

echo strtr(utf8_decode($input), 
           'ŠŒŽšœžŸ¥µÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýÿ',
           'SOZsozYYuAAAAAAACEEEEIIIIDNOOOOOOUUUUYsaaaaaaaceeeeiiiionoooooouuuuyy');

UPDATE: Maybe I was a bit inaccurate about what I am trying to do: I do not actually want to remove the umlauts, but to replace them with their closest "one character ASCII" equivalent.

A: 

Okay, I found an obvious solution myself, but it's not the best in terms of performance...

echo strtr(utf8_decode($input), 
           utf8_decode('ŠŒŽšœžŸ¥µÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýÿ'),
           'SOZsozYYuAAAAAAACEEEEIIIIDNOOOOOOUUUUYsaaaaaaaceeeeiiiionoooooouuuuyy');
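
One alternative that sidesteps the encoding round trip: strtr also accepts an array of replacement pairs, and because it matches byte sequences, UTF-8 keys can be typed directly into a UTF-8 source file. The following is only a minimal sketch, assuming both the source file and the input are UTF-8; the helper name `fold_accents` and its abbreviated map are hypothetical and would need to be extended to cover the full character list above.

```php
<?php
// Hypothetical helper: maps UTF-8 characters straight to ASCII,
// without calling utf8_decode() first.
function fold_accents(string $input): string
{
    // Only a handful of pairs are shown here; extend the map as needed.
    $map = [
        'ä' => 'a', 'ö' => 'o', 'ü' => 'u', 'ß' => 's',
        'Ä' => 'A', 'Ö' => 'O', 'Ü' => 'U',
        'é' => 'e', 'è' => 'e', 'ê' => 'e',
        'Š' => 'S', 'š' => 's', 'Œ' => 'O', 'œ' => 'o',
    ];

    // With an array argument, strtr() replaces each key with its value.
    return strtr($input, $map);
}

echo fold_accents('lärm'), "\n";  // larm
echo fold_accents('andré'), "\n"; // andre
```

Characters that are missing from the map pass through unchanged, so the result is only pure ASCII if the map covers everything that can occur in the input.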
BlaM
+7  A: 
iconv("utf-8","ascii//TRANSLIT",$input);
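
A rough sketch of how that call might be wrapped, in case it helps. The function name `translit_to_ascii` and the locale name `en_US.UTF-8` are assumptions, not part of the answer. What //TRANSLIT produces depends on the iconv implementation and the active locale, so no particular output is promised here.

```php
<?php
// Hypothetical wrapper around the iconv() call above.
function translit_to_ascii(string $input): ?string
{
    // Assumption: the 'en_US.UTF-8' locale is installed. setlocale() returns
    // false if it is not, and the transliteration may then be less useful.
    setlocale(LC_CTYPE, 'en_US.UTF-8');

    // The @ suppresses the notice iconv() raises when it cannot convert the input.
    $result = @iconv('UTF-8', 'ASCII//TRANSLIT', $input);

    // iconv() returns false on failure; report that as null.
    return $result === false ? null : $result;
}

var_dump(translit_to_ascii('lärm'));
```

Checking for null matters because some iconv implementations do not support //TRANSLIT at all.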

Extended example

Vinko Vrsalovic
I had to add "setlocale(LC_ALL, 'en_US');" (sadly, no locales for Germany seem to be available on my machine :( ), but then it works. Great! :)
BlaM
A: 

Keep in mind that the string you produce will not necessarily have the same meaning as the original string, as discussed in this similar question. It's a serviceable approach for cleaning file names, but probably not something you'd want to do if you are planning to display your new string as text.

Dave DuPlantis
Thanks for the hint. However, the resulting string will only be used as a simplified fallback for search if the "binary search" fails. Even more simplifications will be applied after this one, to allow illiterates to still find what they are looking for :)
BlaM
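
The fallback described in that last comment could look roughly like the sketch below. It assumes that "binary search" means an exact, byte-for-byte match; the function name `find_with_fallback` and the tiny folding map are hypothetical and purely illustrative.

```php
<?php
// Hypothetical sketch: try an exact match first, then compare accent-folded forms.
function find_with_fallback(string $needle, array $haystack): ?string
{
    // Step 1: exact, byte-for-byte match.
    if (in_array($needle, $haystack, true)) {
        return $needle;
    }

    // Step 2: compare simplified forms (tiny illustrative map only).
    $fold = fn(string $s): string =>
        strtr($s, ['ä' => 'a', 'ö' => 'o', 'ü' => 'u', 'é' => 'e']);

    $folded = $fold($needle);
    foreach ($haystack as $candidate) {
        if ($fold($candidate) === $folded) {
            return $candidate; // return the original, unsimplified entry
        }
    }

    return null; // nothing matched, even after folding
}

echo find_with_fallback('andre', ['lärm', 'andré']), "\n"; // andré
```

Because the simplified form is used only for comparison, the text that gets displayed is still the original string.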