ansaurus

Question

Answer 1

+6 A:

As the name suggests, the BOM only tells you the byte order, not the encoding. You have to know the encoding first; then you can use the BOM to determine whether the least or most significant byte comes first in each multibyte code unit.

A fortunate side-effect of the BOM is that you can also sometimes use it to guess the encoding if you don't know it, but that is not what it was designed for and it is no substitute for sending proper encoding information.
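To illustrate the guessing described above, here is a minimal sketch in Python. It is only a heuristic, not a substitute for declared encoding information. The function name `guess_encoding_from_bom` and the table `_BOMS` are made up for this example; the BOM constants come from the standard `codecs` module.

```python
import codecs

# Longer BOMs are listed first: the UTF-32LE BOM (FF FE 00 00)
# begins with the UTF-16LE BOM (FF FE), so order matters.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding_from_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM matches.

    This is a guess only: data without a BOM, or data in an unrelated
    encoding that happens to start with these bytes, cannot be identified.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

Checking the four-byte marks before the two-byte ones means input starting with FF FE 00 00 is reported as UTF-32LE, which is a preference rather than a proof.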

Mark Byers 2009-12-18 18:46:43

Answer 2

A:

It is unambiguous. FF FE is for UTF-16LE, and FF FE 00 00 denotes UTF-32LE. There is no reason to think that FF FE 00 00 is possibly UTF-16LE because the UTFs were designed for text, and users shouldn't be using NUL characters in their text. After all, when was the last time you opened a hex editor and inserted a few bytes of 00 into a text document? ^_^

Dustin 2009-12-18 18:51:56

The null character may well be part of a higher-order protocol encoded in the text. Unicode doesn't actually care about what code points are used in text and U+0000 is just as valid as U+0041.
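A small Python sketch of why the byte sequence is ambiguous in principle: the same four bytes decode without error under both encodings. The variable names are illustrative only. The explicit-endian codecs used here do not strip the BOM, so it appears as U+FEFF in the result.

```python
data = b"\xff\xfe\x00\x00"

# Read as UTF-16LE: two code units, U+FEFF followed by U+0000.
as_utf16 = data.decode("utf-16-le")

# Read as UTF-32LE: a single code unit, U+FEFF.
as_utf32 = data.decode("utf-32-le")

print([hex(ord(ch)) for ch in as_utf16])
print([hex(ord(ch)) for ch in as_utf32])
```

Both decodes succeed, so a reader that sees FF FE 00 00 with no other information cannot rule out UTF-16LE text whose first character is U+0000.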

Joey 2009-12-18 18:59:18


Unicode BOM for UTF-16LE vs UTF-32LE
