ansaurus

Question

What are the diffrences between utf8_general_ci and utf8_unicode_ci?

Answer 1

+7 A:

utf8_general_ci is a very simple collation. What it does - it just - removes all accents - then converts to upper case

utf8_unicode_ci supports so called expansions and ligatures, for example: German letter ß (U+00DF LETTER SHARP S) is sorted near "ss" Letter Œ (U+0152 LATIN CAPITAL LIGATURE OE) is sorted near "OE".

utf8_general_ci does not support expansions/ligatures, it sorts all these letters as single characters, and sometimes in a wrong order.

utf8_unicode_ci is generally more accurate for all scripts. For example, on Cyrillic block: utf8_unicode_ci is fine for all these languages: Russian, Bulgarian, Belarusian, Macedonian, Serbian, and Ukrainian. While utf8_general_ci is fine only for Russian and Bulgarian subset of Cyrillic. Extra letters used in Belarusian, Macedonian, Serbian, and Ukrainian are sorted not well.

The disadvantage of utf8_unicode_ci is that it is a little bit slower than utf8_general_ci.

So when you need better sorting order - use utf8_unicode_ci, and when you utterly interested in performance - use utf8_general_ci.

Timotei Dolean 2009-06-24 04:53:23

Props for the list of languages

wizard 2009-06-24 05:15:33

Answer 2

+5 A:

From Unicode Character Sets in the MySQL documentation:

For any Unicode character set, operations performed using the _general_ci collation are faster than those for the _unicode_ci collation. For example, comparisons for the utf8_general_ci collation are faster, but slightly less correct, than comparisons for utf8_unicode_ci. The reason for this is that utf8_unicode_ci supports mappings such as expansions; that is, when one character compares as equal to combinations of other characters. For example, in German and some other languages “ß” is equal to “ss”. utf8_unicode_ci also supports contractions and ignorable characters. utf8_general_ci is a legacy collation that does not support expansions, contractions, or ignorable characters. It can make only one-to-one comparisons between characters.

Gumbo 2009-06-24 04:55:51

ansaurus

tags:

views:

answers:

What are the diffrences between utf8_general_ci and utf8_unicode_ci?

related questions