I am writing a string comparison function to sort medical terms that often contain accented characters from many different European languages, and I need to achieve a collation similar to MySQL's latin1_general_ci.
First, I'm doing some basic munging on the strings to remove spaces, quotes, hyphens, parentheses, etc. The problem comes when I pass the strings on to strcoll() using the default locale, because it is not smart enough to consider, for example, an accented e as lexicographically equivalent to a normal e.
I'm wary of using a locale like German or French because it probably will not include all of the special characters I need to consider. Is there a locale that will give me something similar to the latin1_general_ci collation? Or is there maybe another solution?
My naive solution would be to create a large associative array that maps accented letters to their regular letter equivalents and then use it with str_replace(), but that sounds slow, tedious, and error-prone. I would rather use something built into the language if possible.
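To make the naive approach concrete, here is a minimal sketch of what I have in mind. It assumes UTF-8 strings (my real data may be latin1, in which case the keys would need to be in that encoding), and ACCENT_MAP, fold_accents() and compare_terms() are just hypothetical names with a deliberately incomplete map:

```php
<?php
// Hypothetical, incomplete map: a real one would need an entry for
// every accented letter that can appear in the data.
const ACCENT_MAP = [
    'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e',
    'á' => 'a', 'à' => 'a', 'â' => 'a', 'ä' => 'a',
    'ó' => 'o', 'ö' => 'o', 'ú' => 'u', 'ü' => 'u',
    'ñ' => 'n', 'ç' => 'c',
    'É' => 'E', 'È' => 'E', 'Ä' => 'A', 'Ö' => 'O', 'Ü' => 'U',
];

// Replace each accented letter with its unaccented equivalent.
function fold_accents(string $s): string
{
    return str_replace(array_keys(ACCENT_MAP), array_values(ACCENT_MAP), $s);
}

// Case-insensitive comparison of the folded strings, usable with usort().
function compare_terms(string $a, string $b): int
{
    return strcasecmp(fold_accents($a), fold_accents($b));
}

$terms = ['menopause', 'Ménière', 'Meningitis'];
usort($terms, 'compare_terms');
// $terms is now ['Ménière', 'Meningitis', 'menopause']
```

This works for the handful of letters in the map, but every missing character silently sorts in the wrong place, which is exactly the error-prone part I would like to avoid.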
Also on that note, does strcmp() or strcasecmp() respect the collation of the current locale, or is it just strcoll() that does this?