tags:

views:

262

answers:

1

What's the main difference between length() and char_length()?

I believe it has something to do with binary and non-binary strings. Is there any practical reason to store strings as binary?

mysql> select length('MySQL'), char_length('MySQL');
+-----------------+----------------------+
| length('MySQL') | char_length('MySQL') |
+-----------------+----------------------+
|               5 |                    5 |
+-----------------+----------------------+
1 row in set (0.01 sec)
+6  A: 

LENGTH() returns the length of the string measured in bytes. CHAR_LENGTH() returns the length of the string measured in characters.

This is especially relevant for Unicode, in which most characters are encoded in two bytes. Or UTF-8, where the number of bytes varies. For example:

select length(_utf8 '€'), char_length(_utf8 '€')
--> 3, 1

As you can see the Euro sign occupies 3 bytes (it's encoded as 0xE282AC in UTF-8) even though it's only one character.

Andomar
Only UCS-2 is encoded in two bytes per character. This encoding (or more accurately UTF-16LE) is what Windows misleadingly calls “Unicode”. MySQL doesn't support UTF-16; instead the usual approach for putting Unicode strings in it is to use UTF-8.
bobince
For example: select length('日本語'), char_length('日本語');
sanmai
yesh! another example: `length('华语')` vs `char_length('华语')`
o.k.w
@bobince: Even UCS-2 encodes some characters in more than 2 bytes, for example `0313 combining comma above`. Since a = 61, 0x00610313 displays as a̓, and it takes up 4 bytes.
Andomar
Actually by Unicode terminology that's still 2 characters, even though like all combining marks it can — if a suitable font is available — be rendered as a single glyph. UTF-16LE can still have a 4-byte character though thanks to the surrogates.
bobince