Hi,

We use VARCHAR(255) to store "keywords" in MySQL. The problem is that MySQL ignores all trailing spaces when comparing with "=". It does respect trailing spaces in "LIKE" comparisons, but it does not let us store the same word both with and without trailing spaces in a VARCHAR column that has a UNIQUE index on it.

So we are considering switching to VARBINARY. Can anybody suggest what the implications would be when column values contain multi-byte characters?

+1  A: 

This is what the MySQL manual says about trailing spaces:

Handling of trailing spaces is version-dependent. As of MySQL 5.0.3, trailing spaces are retained when values are stored and retrieved, in conformance with standard SQL. Before MySQL 5.0.3, trailing spaces are removed from values when they are stored into a VARCHAR column; this means that the spaces also are absent from retrieved values.

Since your question says that MySQL does not respect trailing spaces, I assume your version is lower than 5.0.3. Consider using the TEXT type for your column; it preserves trailing spaces. TEXT handles the string's encoding and decoding for you, so you don't have to worry about multi-byte characters.

TEXT does perform slower than VARBINARY. If testing with real data shows that the performance is unacceptable, you might have to opt for VARBINARY (or a BLOB). In that case, it's up to you to store the string in a particular encoding, such as UTF-8. As long as all your clients use the same encoding, this will work fine for multi-byte characters. Do test your clients with different regional settings :)
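To make the multi-byte point concrete, here is a small Python sketch (not MySQL code) of what storing keywords as raw bytes implies. The helper name `to_stored_bytes` and the 255-byte limit are illustrative assumptions that mirror a VARBINARY(255) column, whose length is counted in bytes rather than characters.

```python
# Illustrative model only: a VARBINARY(255) column holds up to 255 bytes,
# and the client is responsible for encoding and decoding the text.
MAX_BYTES = 255  # assumed column definition: VARBINARY(255)


def to_stored_bytes(keyword: str, encoding: str = "utf-8") -> bytes:
    """Encode a keyword the way a client would before writing it to a binary column."""
    data = keyword.encode(encoding)
    if len(data) > MAX_BYTES:
        raise ValueError(f"{len(data)} bytes exceeds the {MAX_BYTES}-byte limit")
    return data


word = "café"
utf8 = to_stored_bytes(word)
latin1 = to_stored_bytes(word, encoding="latin-1")

print(len(word), len(utf8))  # 4 characters, 5 bytes in UTF-8
print(utf8 == latin1)        # False: same text, different bytes per encoding
print(to_stored_bytes("a") == to_stored_bytes("a "))  # False: trailing space is kept
print(to_stored_bytes("a") == to_stored_bytes("A"))   # False: bytes compare case-sensitively
```

The second comparison is the reason all clients need to agree on one encoding: two clients writing the same word in different encodings would produce two distinct keys.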

Andomar
A: 

Andomar,

We use version 5.0.5. All MySQL versions ignore trailing spaces for comparison. From the manual:

All MySQL collations are of type PADSPACE. This means that all CHAR and VARCHAR values in MySQL are compared without regard to any trailing spaces. This is true for all MySQL versions, and it makes no difference whether your version trims trailing spaces from VARCHAR values before storing them.

Moreover, MySQL treats values with and without trailing spaces as duplicates in unique indexes:

For those cases where trailing pad characters are stripped or comparisons ignore them, if a column has an index that requires unique values, inserting into the column values that differ only in number of trailing pad characters will result in a duplicate-key error. For example, if a table contains 'a', an attempt to store 'a ' causes a duplicate-key error.
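The behaviour quoted above can be modelled roughly as follows. This is a Python sketch, not MySQL's actual implementation: `pad_space_key`, `pad_space_equal` and `UniqueIndex` are hypothetical names, and other collation details such as case folding are deliberately left out.

```python
def pad_space_key(value: str) -> str:
    """Comparison key under pad-space rules: trailing spaces are ignored."""
    return value.rstrip(" ")


def pad_space_equal(a: str, b: str) -> bool:
    return pad_space_key(a) == pad_space_key(b)


class UniqueIndex:
    """Toy unique index; key_func decides which values count as duplicates."""

    def __init__(self, key_func):
        self._key_func = key_func
        self._keys = set()

    def insert(self, value: str) -> bool:
        key = self._key_func(value)
        if key in self._keys:
            return False  # stands in for a duplicate-key error
        self._keys.add(key)
        return True


varchar_index = UniqueIndex(pad_space_key)                   # VARCHAR-like comparison
varbinary_index = UniqueIndex(lambda v: v.encode("utf-8"))   # byte-for-byte comparison

print(pad_space_equal("a", "a "))                                  # True
print(varchar_index.insert("a"), varchar_index.insert("a "))       # True False
print(varbinary_index.insert("a"), varbinary_index.insert("a "))   # True True
```

Under the pad-space key, 'a' and 'a ' collapse to the same index entry, while the byte-wise key keeps them apart.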

And we absolutely need an index on keywords. So I guess we have two options: VARBINARY or TEXT. We will evaluate the performance of TEXT and the multi-byte behaviour of VARBINARY.

ashweta