The CharsetDecoder documentation reads:

There are two general types of decoding errors. If the input byte sequence is not legal for this charset then the input is considered malformed. If the input byte sequence is legal but cannot be mapped to a valid Unicode character then an unmappable character has been encountered.

I understand the concept of malformed characters, but what does an unmappable character mean? I thought that Unicode contains all possible characters. How then could a legal byte sequence not be mappable to a Unicode character?

+2  A: 

While Unicode can represent a great number of characters for a great number of languages, it is certainly not exhaustive. In other words, there are character sets with characters for which there is no mapping into Unicode.
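To see the two error types side by side, here is a minimal sketch. It assumes the JDK provides a `windows-1252` charset and that its mapping leaves byte 0x81 unassigned, as the published windows-1252 table does. The class name `DecodeErrors` and the helper `classify` are made up for illustration. Both error actions are set to `REPORT`, so the decoder throws instead of silently substituting a replacement character.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.UnmappableCharacterException;

// Hypothetical helper class, not part of the JDK.
class DecodeErrors {

    // Decodes the bytes strictly and reports which kind of error, if any, occurred.
    static String classify(String charsetName, byte[] bytes) {
        CharsetDecoder decoder = Charset.forName(charsetName)
                .newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return "ok: " + decoder.decode(ByteBuffer.wrap(bytes));
        } catch (MalformedInputException e) {
            // The byte sequence is not legal in this charset at all.
            return "malformed";
        } catch (UnmappableCharacterException e) {
            // The byte sequence is legal, but the charset assigns it no Unicode character.
            return "unmappable";
        } catch (CharacterCodingException e) {
            return "other";
        }
    }

    public static void main(String[] args) {
        // 0xFF can never appear in well-formed UTF-8.
        System.out.println(classify("UTF-8", new byte[] {(byte) 0xFF}));
        // Assumption: 0x81 is an unassigned slot in the JDK's windows-1252 table.
        System.out.println(classify("windows-1252", new byte[] {(byte) 0x81}));
        // A plain ASCII letter decodes without error.
        System.out.println(classify("UTF-8", new byte[] {0x41}));
    }
}
```

Note that the windows-1252 case is unmappable because the charset leaves that byte value unassigned, not because Unicode lacks a character. In practice that is probably the most common way to hit this error when decoding.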

Brandon E Taylor
A: 

Just a guess...

I expect that such a value would fall in one of the empty blocks that the implementation has not yet filled in. The error probably anticipates values that will become legal characters in the future but do not exist at present. The set of characters covered by Unicode is a work in progress that may never be finished (see proposed characters for characters currently under consideration).

McDowell