Why is my query taking twice as long when I change the field to utf8?

I originally had my field set as latin1_swedish_ci, which I changed to utf8_general_ci (both field and table), and then found my query went from ~1.8 seconds to ~3.3 seconds. I have an index on the field and have even recreated the index (dropped it, then added it again). The field is used in an ORDER BY clause.

Any ideas if there might be a problem or is this normal?

I'm running MySQL 5.0.

+4 A:

latin1_swedish_ci is a collation of latin1, a one-octet-per-character encoding. Once you know the collation (sort) order, comparing characters and whole strings is relatively trivial.

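utf8_general_ci is a collation of utf8, which needs between one and three octets per character in MySQL 5.0 (its utf8 covers only the Basic Multilingual Plane; UTF-8 in general can use up to four). Decoding the octet data in this encoding is harder, so it takes longer.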
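To make the width difference concrete, here is a small Python sketch (not MySQL itself) that counts the bytes each character occupies in the two encodings. The helper name `byte_widths` and the sample word are my own illustrations, and Python's `latin-1` codec is only assumed to be close enough to MySQL's latin1 for this purpose.

```python
def byte_widths(text, encoding):
    """Return the number of bytes each character occupies in `encoding`."""
    return [len(ch.encode(encoding)) for ch in text]

# "smörgås", written with escapes so the accented letters are single code points.
sample = "sm\u00f6rg\u00e5s"

latin1_widths = byte_widths(sample, "latin-1")  # every character is one byte
utf8_widths = byte_widths(sample, "utf-8")      # accented letters need two bytes

latin1_total = sum(latin1_widths)
utf8_total = sum(utf8_widths)

# The euro sign (U+20AC) is an example of a three-byte UTF-8 character.
euro_width = len("\u20ac".encode("utf-8"))

print("latin-1:", latin1_widths, "total", latin1_total)
print("utf-8:  ", utf8_widths, "total", utf8_total)
print("euro sign in utf-8:", euro_width, "bytes")
```

With a fixed-width encoding, the n-th character is always the n-th byte. With a variable-width one, the bytes have to be decoded in sequence before characters can be compared, which is the extra work described above.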

Alnitak 2009-01-21 09:12:39

+2 A:

I don't use MySQL that often myself, but I might be able to give some insight into where the problem lies.

latin1_swedish_ci is a collation of latin1, a single-octet encoding, meaning that every character takes up exactly one byte. Contrast this with utf8_general_ci, a collation of utf8, where each character takes from one to three bytes in MySQL 5.0 (UTF-8 in general allows up to four).

This has the obvious disadvantage that non-ASCII utf8 characters take up more space, more memory and, most importantly, more CPU time to decode. The obvious advantage is that utf8 can represent Unicode characters far beyond the 256 that a single-octet character set can hold.

Since this question is tagged 'query-optimization', you need to ask yourself whether you really need to represent the more 'exotic' characters, or whether the ones available in a single-octet character set (such as the plain ASCII table) are enough for your needs, because by its nature utf8 will eat more CPU and memory.

jimka 2009-01-21 09:26:02

What does your query look like?

Is it possible that you filter on that field and pass the parameter as a non-utf8 data type? In that case, the DBMS will have to do some casting, which will hinder performance.

Frederik Gheysels 2009-01-21 09:34:38

The field is used only in the ORDER BY; the WHERE clause filters on other tinyint(1) fields.

Darryl Hein 2009-01-21 18:31:00
