I need to compare strings and match names to one another even if they are not spelled the same way. For example, DÉSIRÉ-Smith should match Desireesmith as well as Desiree or Desi'ree Smith.

So I had the following approach, which worked perfectly on the command line using PHP-CLI:

    <?php
    class Alike {
      static function convertAlike($string) {
        // in case the first and last name or two first names are mixed up
        $parts = preg_split('/[\s\-\.\_]/', $string, -1, PREG_SPLIT_NO_EMPTY);
        sort($parts);
        $string = implode($parts);

        $string = iconv('UTF-8', 'ASCII//TRANSLIT', $string); // transliterate
        $string = strtolower($string); // lowercase
        $string = preg_replace('/[^a-z]/','',$string); // remove everything but a-z
        $string = preg_replace('{(.)\1+}','$1',$string); // remove duplicate chars
        return $string;
      }
      static function compareAlike($string1,$string2) {
        return (strcmp(Alike::convertAlike($string1),Alike::convertAlike($string2)) === 0) ? true : false;
      }
    }
    echo Alike::convertAlike("DÉSIRÉ-Smith").PHP_EOL; // desiresmith
    echo Alike::convertAlike("Desireesmith").PHP_EOL; // desiresmith
    echo Alike::convertAlike("Desi'ree Smith").PHP_EOL; // desiresmith
    echo Alike::convertAlike("René Röyßeå likes special characters ½ € in ©").PHP_EOL; // reneroysealikespecialcharacterseurinc

    var_dump(Alike::compareAlike("DÉSIRÉ-Smith","Desireesmith")); // true
    var_dump(Alike::compareAlike("Desireesmith","Desi'ree Smith")); // true
    var_dump(Alike::compareAlike("summer","winter")); // false
    ?>

However, on my website (Apache/2.2.14 (Ubuntu) running PHP 5.3.2-1ubuntu4.2 as a module) I always get just question marks. The funny thing is that the error must occur in this line:

    $string = iconv('UTF-8', 'ASCII//TRANSLIT', $string); // transliterate

because afterwards I can still see every character that did not need transliteration, but those that should have been replaced by ASCII characters become question marks.

I tried every possible combination of input/output string encoding and iconv internal, input and output encoding settings, as well as locale settings. I even did chmod -R 777 /usr/lib/gconv and moved it to my working dir.

However, I also found this comment on a PHP bug report: http://bugs.php.net/bug.php?id=44096

[2010-06-07 21:22 UTC] icovt at yahoo dot com
mod_php iconv() is not working properly if your apache is chrooted and you do not 
have the content of /usr/lib/gconv/ folder into your relative chroot path (i.e. 
/your/chroot/path/usr/lib/gconv/). 
You can simply do: 
cp /usr/lib/gconv/* /your/chroot/path/usr/lib/gconv/
... and re-try.

This was a fix for me, hope this could save time for somebody else.

P.S. Btw, initially iconv() called from command line (using php cli) was OK.

I tried that: my www-data user's home is /var/www/, so I ended up with the folder /var/www/usr/lib/gconv/ as well as /var/www/myproject/usr/lib/gconv/.

FYI: I had encoding detection and transcoding functions to ensure the correct encodings are passed, but removed them for the sake of clarity. They are not needed anyway: if you input UTF-8 strings, everything should be fine.

Any ideas?

A: 

I figured out that the locale wasn't set up correctly. My attempts to set it had failed because the locales available on the system were actually named differently than in the manpage examples (specifically in how the encoding is spelled). A simple locale -a revealed that ;O)

    setlocale(LC_ALL, "en_US.utf8");

This actually did the job! The function now works perfectly.
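Since the exact spelling of the locale name differs between systems, a more defensive variant could try several spellings and check the return value. This is only a sketch: setUtf8Locale() and its candidate list are hypothetical, and it assumes that LC_CTYPE is the category the transliteration depends on (swap in LC_ALL, as above, if in doubt).

```php
<?php
// Hypothetical helper: try several spellings of a UTF-8 locale name.
// The candidate list is an assumption; check `locale -a` on your machine.
function setUtf8Locale(array $candidates = array('en_US.utf8', 'en_US.UTF-8', 'C.UTF-8'))
{
    // setlocale() accepts an array and returns the name of the first
    // locale it could set, or false if none of them is available.
    return setlocale(LC_CTYPE, $candidates);
}

$locale = setUtf8Locale();
if ($locale === false) {
    // Nothing matched, so iconv would still run under the old locale.
    echo "No candidate locale is installed", PHP_EOL;
} else {
    echo "LC_CTYPE is now ", $locale, PHP_EOL;
    echo iconv('UTF-8', 'ASCII//TRANSLIT', "DÉSIRÉ"), PHP_EOL;
}
```

Checking for false matters here because a failed setlocale() call is silent, which is exactly how the wrongly spelled locale names went unnoticed.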

Now it's clear why it worked from the console as well: there the locale was imported from the current user's shell settings ;) It actually just needs any locale set up. It doesn't really matter which one, as we convert to ASCII, where everybody is equal, only some are more equal than others :)
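To see the difference between the console and the web server, something like the following could be run once through the CLI and once through Apache. It is a rough diagnostic sketch; describeLocale() is a made-up name. On the PHP 5.3 setup described here the two outputs should differ, while on PHP 8.0 and later the startup locale is, as far as I know, no longer inherited from the environment, so both would likely report C.

```php
<?php
// Hypothetical diagnostic: report the SAPI and the locale currently in effect.
function describeLocale()
{
    // Passing "0" leaves the locale untouched and only returns the current setting.
    return sprintf('%s: LC_CTYPE=%s', php_sapi_name(), setlocale(LC_CTYPE, '0'));
}

echo describeLocale(), PHP_EOL;
```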

Joe Hopfgartner