UTF-8 vs ASCII Text

Why do SQL databases use UTF-8 encoding? Do UTF-8 and ASCII both use 8 bits to store a character?

+1 A:

For "normal" characters (the ASCII range), only 8 bits are used. Characters outside that range take two to four bytes. This makes UTF-8 a variable-length encoding.
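As a rough illustration of the variable-length behaviour, the Python sketch below encodes a few sample characters and counts the resulting bytes. The sample characters and the helper name `utf8_length` are my own choices for the example, not anything from the answer.

```python
# Sample characters chosen for illustration: A, e-acute, euro sign, an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]


def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))


for ch in samples:
    # Print the code point and the number of bytes its UTF-8 form occupies.
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s)")
```

Only the first sample is in the ASCII range, so only it fits in a single byte.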

Wikipedia has a good article on UTF-8.

ASCII defines only 128 characters, so it needs only 7 bits, but it is normally stored with 8 bits per character. RS-232 (old serial communication) can be used with 7-bit bytes.
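A small Python sketch of the 7-bit point, assuming an arbitrary ASCII-only string (the query text and the helper `is_ascii_encodable` are made up for the example): every ASCII byte is below 128, and UTF-8 encodes the same text to identical bytes.

```python
text = "SELECT name FROM users"  # hypothetical ASCII-only string

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# Every ASCII byte fits in 7 bits, i.e. its value is below 128.
all_seven_bit = all(b < 0x80 for b in ascii_bytes)

# For ASCII-only text, UTF-8 produces exactly the same bytes.
same_bytes = ascii_bytes == utf8_bytes


def is_ascii_encodable(s: str) -> bool:
    """Return True if every character of s exists in ASCII."""
    try:
        s.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(all_seven_bit, same_bytes, is_ascii_encodable("caf\u00e9"))
```

So storing plain English text as UTF-8 costs nothing extra compared with ASCII.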

GvS 2010-05-04 14:45:14

The amount of space used depends on the character code. For some characters, it'll use up to 4 bytes for a single char. The ones that use that much space aren't in the BMP, though.

cHao 2010-05-04 14:49:49

ASCII can represent only a limited number of characters. It isn't very useful for any language that isn't based on the Latin character set. UTF-8, however, is an encoding standard for UCS-4 (Unicode) and can represent almost any language. It does this by chaining multiple bytes together to represent one character (or code point, to be more correct).
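To make the "chaining" concrete, here is a minimal Python sketch that prints the bit pattern of each byte and labels its role. The function names `utf8_bits` and `byte_role` are hypothetical helpers for this example. In UTF-8, a byte starting with `0` stands alone, a byte starting with `10` continues a sequence, and any other high-bit pattern starts a multi-byte sequence.

```python
def utf8_bits(ch: str) -> list:
    """Return the UTF-8 bytes of ch as 8-character binary strings."""
    return [format(b, "08b") for b in ch.encode("utf-8")]


def byte_role(b: int) -> str:
    """Classify one UTF-8 byte by its leading bits."""
    if b < 0x80:
        return "single"        # 0xxxxxxx: a complete ASCII character
    if (b & 0xC0) == 0x80:
        return "continuation"  # 10xxxxxx: continues a multi-byte sequence
    return "lead"              # 11xxxxxx: starts a multi-byte sequence


euro = "\u20ac"  # euro sign, used here only as a sample character
for bits, b in zip(utf8_bits(euro), euro.encode("utf-8")):
    print(bits, byte_role(b))
```

For the euro sign this prints one lead byte followed by two continuation bytes.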

Torlack 2010-05-04 14:46:22

Isn't what you are describing UTF-16?

Microkernel 2010-05-04 15:06:30

What do you mean? UTF-16 is just another encoding standard. UTF-7, UTF-8, UTF-16, and UTF-32 are all encoding standards that encode UCS-4 (this wasn't always the case, and I wouldn't be shocked if there are still some exceptions).
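A quick way to see that these are different encodings of the same code points is to compare byte counts in Python. This is only a sketch: the helper `encoded_sizes` is a name invented for the example, and the little-endian codec variants are used so that no byte order mark is added to the counts.

```python
# Codecs without a byte order mark, so the sizes reflect the character alone.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(ch: str) -> dict:
    """Return the encoded size in bytes of ch for each encoding."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}


# Sample characters: ASCII letter, euro sign, an emoji outside the BMP.
for ch in ["A", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

UTF-32 is fixed at four bytes, while UTF-8 and UTF-16 are both variable length; they just use different unit sizes.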

Torlack 2010-05-04 22:17:35

+1 A:

UTF-8 is used to support a large range of characters. In UTF-8, up to 4 bytes can be used to represent a single character.

Joel has written an article on this subject that you may want to refer to:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

Bo Tian 2010-05-04 14:53:53
