ansaurus

Question

How do I achieve something like MySQL's latin1_general_ci collation in PHP?

Answer 1

A:

Maybe this:

setlocale(LC_COLLATE, 'fr_FR.Latin1', 'fr.Latin1', 'fr_FR.Latin-1', 'fr.Latin-1');

Then compare with strcoll(); strcmp() and strcasecmp() are not localized.

chaos 2009-06-19 01:30:07

Is that French? Won't there be characters in, say, German that won't be accounted for in that collation? Or is FR doing something special? I did find an "Indo-European" locale and I am currently testing whether it produces the desired result and accounts for the special characters that I'm after.

Jonathan Collins 2009-06-19 01:39:13

It is French, but I'm trying to use the .Latin1 / .Latin-1 modifier to force that charset. What it takes for that to actually be accepted is the mysterious part.

chaos 2009-06-19 01:42:12

I just tried this and oddly enough it worked. Setting a locale other than the default 'C' enables strcoll() to sort all of the accented characters, even ones that aren't in that particular language. For example, setting fr_FR makes strcoll() aware of the German ß character. Odd! Thanks for your help.

Jonathan Collins 2009-06-19 18:19:34
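A minimal PHP sketch of the first answer's approach, as the comments confirm it: set LC_COLLATE to a non-'C' locale and sort with strcoll(). The locale names are the ones from the answer and may not exist on a given system. collate_ci() is a hypothetical helper, and lowercasing with strtolower() before comparing is an added assumption meant to approximate the case-insensitivity of a _ci collation. The accented test strings are written as ISO-8859-1 byte escapes so they match a Latin-1 locale regardless of how the file is saved.

```php
<?php
// Sketch: sort strings with the current LC_COLLATE locale via strcoll().
// Assumption: one of these Latin-1 locales is installed; the names are
// platform-specific, and setlocale() returns false if none is accepted.
$locale = setlocale(LC_COLLATE, 'fr_FR.Latin1', 'fr.Latin1', 'fr_FR.Latin-1', 'fr.Latin-1');
if ($locale === false) {
    // The previous collation (normally 'C', i.e. byte order) stays in effect.
    error_log('No Latin-1 locale accepted; strcoll() will order by byte value.');
}

// Hypothetical helper: lowercase first (reliably only for ASCII letters),
// then let the locale's collation rules decide the order.
function collate_ci($a, $b)
{
    return strcoll(strtolower($a), strtolower($b));
}

// ISO-8859-1 byte escapes: "\xE8" is è and "\xE9" is é.
$words = array("z\xE8bre", 'Zebra', "\xE9cole", 'Ecole', 'apple');
usort($words, 'collate_ci');
print_r($words);
```

Because strtolower() is not a full case fold, an uppercase accented letter such as É may still compare differently from é; the locale's collation rules decide the final order in that case.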

Answer 2

A:

You can also try the iconv functions to help normalize the strings. That handles cases like turning an accented e (é) into a plain e. See this related question about sorting UTF-8 strings, too.

Richard Levasseur 2009-06-19 02:46:26

How exactly can I use iconv? I tried this: iconv('ISO-8859-1', 'ASCII//TRANSLIT', 'Déjérine-Klumpke'), but it turned the accented e characters into question marks.

Jonathan Collins 2009-06-19 17:34:28

I figured that out. For some reason, the transliteration only works if you set a locale other than the default 'C' locale.

Jonathan Collins 2009-06-19 18:15:07

Note that it still can't transliterate characters that aren't in that locale. For example, I tried en_US and it still turned the accented e above into a question mark. I believe the correct solution is still to set a locale other than 'C' and then use strcoll(), since it seems able to collate all of the special characters regardless of the chosen locale.

Jonathan Collins 2009-06-19 18:18:33
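A sketch of the iconv transliteration discussed in these comments, assuming UTF-8 input and a UTF-8 locale available as 'en_US.UTF-8' or 'en_US.utf8' (both names are assumptions about the host). With glibc, //TRANSLIT consults LC_CTYPE, which is consistent with the observation that the default 'C' locale produces question marks. The first argument to iconv() must name the string's real encoding. to_ascii() is a hypothetical helper.

```php
<?php
// Sketch: strip accents by transliterating to ASCII with iconv().
// Assumptions: input is UTF-8, and a UTF-8 locale exists under one of
// these names. With glibc, //TRANSLIT depends on LC_CTYPE.
if (setlocale(LC_CTYPE, 'en_US.UTF-8', 'en_US.utf8') === false) {
    error_log('No UTF-8 locale accepted; expect "?" in the output.');
}

// Hypothetical helper. $from must name the string's real encoding;
// declaring 'ISO-8859-1' for UTF-8 bytes makes iconv misread them.
function to_ascii($s, $from = 'UTF-8')
{
    $out = iconv($from, 'ASCII//TRANSLIT', $s);
    // iconv() returns false on failure; fall back to an empty string.
    return $out === false ? '' : $out;
}

echo to_ascii('Déjérine-Klumpke'), "\n";
```

Transliteration tables differ between iconv implementations, so treat the exact spelling of the output as platform-dependent.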

Have you tried converting the strings to UTF-8 and setting a UTF-8 locale? In Python, I managed to do what you want using http://docs.python.org/library/unicodedata.html. I -thought- I had seen a PHP library to do the normalization/decomposition, but I can't find it now.

Richard Levasseur 2009-06-20 04:51:50
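The normalization/decomposition idea maps onto the Normalizer class in PHP's intl extension (PHP 5.3+). A sketch, assuming intl and mbstring are installed and the strings are UTF-8: decompose with NFD, delete combining marks, then lowercase. fold_key() is a hypothetical helper. Because it discards accents as well as case, it approximates a case-insensitive collation rather than reproducing MySQL's latin1_general_ci rules exactly.

```php
<?php
// Sketch of decompose-then-strip, the PHP counterpart of Python's
// unicodedata.normalize('NFD', ...). Assumes the intl and mbstring extensions.

// Hypothetical helper: build a comparison key without accents or case.
function fold_key($s)
{
    // NFD splits "é" into "e" followed by a combining acute accent (U+0301).
    $decomposed = Normalizer::normalize($s, Normalizer::FORM_D);
    // Drop nonspacing combining marks (Unicode category Mn).
    $stripped = preg_replace('/\p{Mn}+/u', '', $decomposed);
    // Lowercase with multibyte awareness.
    return mb_strtolower($stripped, 'UTF-8');
}

$names = array('Zoë', 'Déjérine-Klumpke', 'dejerine', 'Ärzte');
usort($names, function ($a, $b) {
    // Compare the folded keys byte by byte.
    return strcmp(fold_key($a), fold_key($b));
});
print_r($names);
```

Characters that have no decomposition, such as ß, pass through unchanged, so they need separate handling if they should match "ss".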
