ansaurus

Question

Answer 1

A:

utf8_bin is a binary collation: it compares strings by the binary value of each character, so it is strict about both case and accents.

utf8_general_ci is more lenient and normalizes Umlauts and accents to their "basic" version:

Ä => A

Ü => U etc.

I have never worked with Arabic, so I don't know whether this also applies to Arabic accents, but I would expect so.

You should be able to temporarily use the lenient collation like this:

WHERE field1 like '%[arabic text]%' COLLATE utf8_general_ci;
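To make the idea of folding accented characters to their "basic" version concrete outside MySQL, here is a minimal Python sketch. It is not how utf8_general_ci is implemented; it only illustrates the same kind of folding by decomposing the text (NFD) and dropping combining marks, which covers both umlauts and Arabic diacritics. The helper name strip_marks is hypothetical.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Return text with combining marks (accents, Arabic diacritics) removed."""
    # NFD splits precomposed characters such as "Ä" into "A" + combining diaeresis.
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns a non-zero class for combining marks; keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Latin example from the answer above.
print(strip_marks("Ä Ü é"))

# Arabic word written with diacritics (damma, fatha, shadda) as escapes.
with_marks = "\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F"
print(strip_marks(with_marks))
```

Note that this approach also folds letters that decompose into a base letter plus a mark, for example alef with hamza above becomes plain alef, which may or may not be what you want.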

Pekka 2010-07-07 09:58:27

Answer 2

A:

I finally resolved the issue. I am posting it here for future reference, to help the community.

After reading an article, I learned that there are two encodings (client and server) in use with MySQL. My server encoding (table charset/collation) was utf8, but the client’s encoding was latin1 at the time of the INSERT.

So the data stored in the DB was mixed up, which is why REPLACE() was not working as expected.
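The effect of such a client/server encoding mismatch can be reproduced in Python. This is only a sketch of the mechanism, not the fix from the blog post: it assumes the client sent UTF-8 bytes over a connection declared as latin1, and it uses Python's latin-1 codec as an approximation of MySQL's latin1. The function names mangle and repair are hypothetical.

```python
# Arabic word with diacritics, written as escapes; FATHA is the mark we want to remove.
original = "\u0645\u064E\u0631\u0652\u062D\u064E\u0628\u064B\u0627"
FATHA = "\u064E"

def mangle(text: str) -> str:
    """Simulate storing UTF-8 bytes that the server interprets as latin1."""
    return text.encode("utf-8").decode("latin-1")

def repair(text: str) -> str:
    """Undo the mis-decoding by reversing the two steps."""
    return text.encode("latin-1").decode("utf-8")

stored = mangle(original)

# The real diacritic is no longer present as a character, so a
# REPLACE()-style search for it finds nothing.
print(FATHA in stored)

# After repairing the encoding, the replacement works again.
fixed = repair(stored)
print(fixed == original)
print(FATHA in fixed.replace(FATHA, ""))
```

Each Arabic character in the stored value has turned into two unrelated Latin-range characters, which is why a search for the genuine diacritic cannot match until the encoding is repaired.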

FIX: I've posted it on my blog here ... mysql-issue-with-latin1-multi-byte

Azghanvi 2010-07-07 19:52:24

Removing accents chars from arabic text
