ansaurus

Question

How to query MySQL for exact length and exact UTF-8 characters

Answer 1

A:

You have to use the proper collation.
I don't know about Latvian, but here is an example for German, to give you an idea: http://dev.mysql.com/doc/refman/5.0/en/charset-collation-effect.html

You can try some of the Baltic collations.

Col. Shrapnel 2010-04-23 22:58:47

Answer 2

A:

What MySQL query would return my exact two words ('tēja','vējš')?

SELECT * FROM words WHERE value LIKE '_ēj_' COLLATE utf8_bin;

The utf8_bin collation is not just diacritic-sensitive, but also case-sensitive. If you want to match only the letter with the diacritic and you don't care about upper/lower case, you would have to find a utf8_..._ci collation that doesn't treat e and ē as the same letter.
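To see what the query above does, here is a minimal sketch. The sample rows and the column type are made up for illustration; only the table name `words` and the column `value` come from the query. It also assumes the connection character set is utf8, since `COLLATE utf8_bin` on the pattern literal is only valid if the literal is utf8.

```sql
-- Assumption: the connection uses utf8, so the string literals below are utf8.
SET NAMES utf8;

-- Hypothetical table definition; the real column type is not given in the question.
CREATE TABLE words (
  value VARCHAR(32)
) CHARACTER SET utf8;

-- Hypothetical sample rows: the two wanted words plus some near misses.
INSERT INTO words (value) VALUES
  ('tēja'), ('vējš'), ('teja'), ('TĒJA'), ('tējas');

-- Each _ in a LIKE pattern matches exactly one character (not one byte),
-- so '_ēj_' only matches four-character values with 'ēj' in the middle.
-- utf8_bin compares by code point, so e/ē and upper/lower case all differ.
SELECT value FROM words WHERE value LIKE '_ēj_' COLLATE utf8_bin;
```

With these rows the query should return only 'tēja' and 'vējš': 'teja' fails on the diacritic, 'TĒJA' on case, and 'tējas' on length.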

I can't immediately see one (there are plenty that don't collate ē at all, which would be okay if you only need case-sensitive matching on the non-diacritical letters). Interestingly, the Latvian collation treats the macron letters as the same as the plain letters, which is not what you want here (although it does know that š is different from s).

Anyway, whatever collation you end up with, you will want to put your tables in that collation rather than manually specifying it in a query, so that comparisons can be properly indexed.
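As a rough sketch of setting the collation on the table itself, assuming the table is called `words` as in the query above and that utf8_bin is the collation you settle on:

```sql
-- Convert every character column in the table to utf8 with the binary collation.
-- Assumption: utf8_bin is the chosen collation; substitute whichever one you pick.
ALTER TABLE words CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;

-- The column now carries the collation, so no COLLATE clause is needed in the query.
SELECT value FROM words WHERE value LIKE '_ēj_';
```

Note that a pattern starting with a wildcard, like this one, generally can't use an index range scan anyway; the indexing benefit shows up for equality comparisons and for patterns with a fixed prefix.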

bobince 2010-04-23 23:23:29

Thank you, I did exactly as you said and changed the table to: CHARACTER SET utf8 COLLATE utf8_bin. I expect to also use some Cyrillic symbols, so I'll stick with UTF-8.

oskarae 2010-04-23 23:46:09
