ansaurus

Question

Answer 1

+6 A:

LENGTH() returns the length of the string measured in bytes. CHAR_LENGTH() returns the length of the string measured in characters.

This is especially relevant for Unicode, in which most characters are encoded in two bytes, or for UTF-8, where the number of bytes per character varies. For example:

select length(_utf8 '€'), char_length(_utf8 '€')
--> 3, 1

As you can see, the Euro sign occupies 3 bytes (it's encoded as 0xE282AC in UTF-8) even though it's only one character.

Andomar 2009-11-14 14:14:29
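
The byte/character distinction in the answer above is not specific to MySQL. As a rough cross-check outside the database, the Python sketch below reproduces the same numbers for the Euro sign. `byte_length` and `char_length` are hypothetical helper names chosen to mirror LENGTH() and CHAR_LENGTH(); they are not part of any library, and the sketch assumes the string is stored as UTF-8.

```python
def byte_length(s, encoding="utf-8"):
    """Bytes the string occupies in the given encoding (roughly what LENGTH() counts)."""
    return len(s.encode(encoding))


def char_length(s):
    """Unicode code points in the string (roughly what CHAR_LENGTH() counts)."""
    return len(s)


euro = "€"
print(byte_length(euro), char_length(euro))  # 3 1
print(euro.encode("utf-8").hex())            # e282ac
```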

Only UCS-2 is encoded in two bytes per character. This encoding (or more accurately UTF-16LE) is what Windows misleadingly calls “Unicode”. MySQL doesn't support UTF-16; instead the usual approach for putting Unicode strings in it is to use UTF-8.

bobince 2009-11-14 14:20:05

For example: select length('日本語'), char_length('日本語');

sanmai 2009-11-14 14:22:45

Yes! Another example: `length('华语')` vs `char_length('华语')`

o.k.w 2009-11-14 14:26:51
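
Neither of the two comments above states the results. Assuming the strings are stored as UTF-8 (as in the answer's example), each of these CJK characters takes 3 bytes, so the expected pairs can be worked out with a small Python sketch. `utf8_lengths` is a hypothetical helper, not a MySQL or library function.

```python
def utf8_lengths(text):
    """Return (byte count in UTF-8, code point count) for a string."""
    n_bytes = len(text.encode("utf-8"))  # analogous to LENGTH() on a utf8 column
    n_chars = len(text)                  # analogous to CHAR_LENGTH()
    return n_bytes, n_chars


for text in ("日本語", "华语"):
    print(text, *utf8_lengths(text))
# 日本語 9 3
# 华语 6 2
```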

@bobince: Even UCS-2 encodes some characters in more than 2 bytes, for example `0313 combining comma above`. Since a = 61, 0x00610313 displays as a̓, and it takes up 4 bytes.

Andomar 2009-11-14 14:32:19

Actually, by Unicode terminology that's still 2 characters, even though, like all combining marks, it can be rendered as a single glyph if a suitable font is available. UTF-16LE can still have a 4-byte character, though, thanks to surrogates.

bobince 2009-11-14 22:05:29
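
The two cases in this exchange can be told apart by counting code points as well as bytes. The Python sketch below is only an illustration: `utf16le_lengths` is a hypothetical helper, and U+1D11E (MUSICAL SYMBOL G CLEF) is an arbitrary pick of a character outside the Basic Multilingual Plane.

```python
def utf16le_lengths(text):
    """Return (byte count in UTF-16LE, code point count) for a string."""
    return len(text.encode("utf-16-le")), len(text)


combined = "a\u0313"   # 'a' followed by U+0313 COMBINING COMMA ABOVE
clef = "\U0001D11E"    # one code point, stored as a surrogate pair in UTF-16

print(utf16le_lengths(combined))  # (4, 2): two characters, 2 bytes each
print(utf16le_lengths(clef))      # (4, 1): one character, 4 bytes
```

Both strings occupy 4 bytes, but only the second is a single character that needs 4 bytes on its own.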

MySQL - length() vs char_length()
