Hello. This is an ANSI C question. I have the following code.

#include <stdio.h>
#include <locale.h>
#include <wchar.h>

    int main(void)
    {
        wint_t c;

        if (!setlocale(LC_CTYPE, "")) {
            printf("Can't set the specified locale! "
                   "Check LANG, LC_CTYPE, LC_ALL.\n");
            return -1;
        }
        while ((c = getwc(stdin)) != WEOF) {
            printf("%lc", c);
        }
        return 0;
    }

I need full UTF-8 support, but even at this simple level, can I improve this somehow? Why is wint_t used, and not wchar_t, with appropriate changes?

+3  A: 

wint_t is capable of storing any valid value of wchar_t. A wint_t is also capable of holding the result of evaluating the WEOF macro (a wchar_t may be too narrow to hold that result, or the result may collide with a valid character value).

Brandon E Taylor
Ok, thanks. So, in brief: when is it better to use wchar_t then? Why not always use wint_t?
Dervin Thunk
+3  A: 

UTF-8 is one possible encoding for Unicode. It uses 1 to 4 bytes per code point (1 to 3 bytes for characters in the Basic Multilingual Plane). When you read such a character through getwc(), it fetches those bytes and composes them into a single wide-character value, which fits in a wchar_t (a type at least 16 bits wide).

But since every value in that 16-bit range, 0x0000 through 0xFFFF, can be a valid character, there are no values left over to return condition or error codes in.

One such code is end-of-file (WEOF), which is typically defined as (wint_t)-1. If you were to put the return value of getwc() into a wchar_t, there would be no way to distinguish it from the character 0xFFFF (which, BTW, is a reserved noncharacter anyway, but I digress).

So the answer is to use a wider type, wint_t (or int), which on most platforms is at least 32 bits. That leaves the lower 16 bits for the real value, and anything with a bit set outside that range means something other than a character was returned.

Why don't we always use wchar_t instead of wint_t, then? Most string-related functions use wchar_t because it can be smaller than wint_t (half the size on some platforms), so strings have a smaller memory footprint.

lavinio
A UTF-8 character can be 4 bytes long; technically a sequence can even take 5 or 6 bytes, but such sequences are not valid UTF-8.
quinmars
Well, true. It can be 4 bytes long if you go into the supplementary-plane characters at 0x10000 and higher, but that gets into surrogates when dealing with UTF-16, and I thought it outside the scope of the question. And while 5- and 6-byte sequences are syntactically possible, every valid code point can be expressed in 4 bytes or fewer; longer sequences are only generated by poor-quality serializers.
lavinio
Your answer is mostly correct, but you provide too many platform-dependent details. `wchar_t` is _not_ always 16 bits; I can think of at least two OS/compiler combinations where it's 32.
Logan Capaldo
Thanks. I was referring to the character itself needing 16 bits, but I can see now the ambiguity. Clarified, and also for wint_t.
lavinio