Hi, guys!

I'm working on a project that exports data from text files to a MySQL database. The text files contain both Latin and Cyrillic characters. Here is the bug:

select * from cues where data="ГЭС";
+------+------+
| id   | data |
+------+------+
| 1872 | АЭС  |
| 4671 | ГЭС  |
+------+------+

Why do I also get "АЭС"? I get the same result with "ВЭС" or "БЭС" in the query, but not with "ФЭС". None of these three values are in the table, yet "ВЭС" and "БЭС" return the same rows as "ГЭС", while "ФЭС" returns nothing.

My only guess is that the problem is with the encoding.

A: 

There are two things to consider: Collation and encoding.

Encoding determines how byte streams are interpreted as text characters, that is, how byte sequences map to code points. I prefer to use UTF-8 for everything, but some legacy systems or external components might force you to convert to and from other encodings here and there.
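To check which bytes are actually stored, rather than how the client happens to render them, you can dump the column in hex. This is only a sketch against the `cues` table from the question; the expected values in the comment assume the column is declared as utf8, which the question does not confirm.

```sql
-- HEX() shows the raw stored bytes; LENGTH() counts bytes,
-- CHAR_LENGTH() counts characters in the column's character set.
SELECT id,
       data,
       HEX(data)         AS raw_bytes,
       LENGTH(data)      AS byte_count,
       CHAR_LENGTH(data) AS char_count
FROM cues
WHERE id IN (1872, 4671);

-- Assumption: if 'ГЭС' is stored as clean UTF-8 in a utf8 column,
-- raw_bytes is D093D0ADD0A1, byte_count is 6 and char_count is 3.
```

If `char_count` comes back as 6 for those same bytes, the column is probably declared with a single-byte character set and is treating each UTF-8 byte as its own character.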

The collation sets the rules for comparison and sorting. Each table has a default collation, but you can also override it per query if you need to. Depending on the collation, a given pair of characters may be considered equal or not equal; for example, a case-insensitive collation will regard 'a' and 'A' as equal, while a case-sensitive one won't.
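As a rough illustration of the per-query override, the statements below first show which collation each column uses and then force a binary collation for one comparison. The choice of `utf8_bin` is an assumption: it only works if both the column and the connection use the utf8 character set, and MySQL rejects the query otherwise.

```sql
-- List the columns of the table; the Collation column of the output
-- shows which comparison rules each text column uses.
SHOW FULL COLUMNS FROM cues;

-- Override the collation for this one comparison. A binary collation
-- compares code point by code point, so it is also case-sensitive.
SELECT * FROM cues WHERE data = 'ГЭС' COLLATE utf8_bin;
```

If the second query returns only the "ГЭС" row while the original query returns both, the table's collation is the likely culprit; if it still returns both rows, look at the encoding instead.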

So to solve your problem, you need to set the correct collation for the table, and make sure you're using the correct encodings for the table and the connection.
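A minimal sketch of those steps in MySQL, assuming you settle on utf8 with the `utf8_unicode_ci` collation (both are my choice for illustration, not something the question specifies; on newer servers `utf8mb4` is the usual pick):

```sql
-- Inspect the character sets and collations the server and this connection use.
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';

-- Tell the server that this connection sends and expects UTF-8.
SET NAMES utf8;

-- Convert the table's text columns, and its default, to utf8
-- with a Unicode-aware collation.
ALTER TABLE cues CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
```

Back up the table first. `CONVERT TO` transcodes existing values based on the character set the column is currently declared with, so rows that were inserted through a mismatched connection may need to be re-imported rather than converted.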

tdammers