Is the wchar_t type required for Unicode support? If not, then what's the point of this multibyte type? Why would you use wchar_t when you could accomplish the same thing with char?
views: 537
answers: 7
char is generally a single byte (sizeof(char) must be equal to 1).
wchar_t was added to the language specifically to support wide characters (characters that need more than one byte).
wchar_t is not required. It's not even guaranteed to have a specific encoding. The point is to provide a data type that represents the wide characters native to your system, similar to char representing native characters. On Windows, for example, you can use wchar_t to access the wide character Win32 API functions.
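To make the size point concrete, here is a minimal sketch. The helper names wchar_bits and check_widths are made up for illustration; the only thing the language promises is sizeof(char) == 1, while the width of wchar_t is whatever your platform chose.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// sizeof(char) is 1 by definition; this holds on every conforming compiler.
static_assert(sizeof(char) == 1, "guaranteed by the language");

// Hypothetical helper: number of bits in one wchar_t on this platform.
// The result is implementation-defined (commonly 16 or 32).
inline std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// Usage sketch: the portable expectation is only that wchar_t
// is at least as wide as char.
inline void check_widths() {
    assert(wchar_bits() >= static_cast<std::size_t>(CHAR_BIT));
}
```

Printing wchar_bits() on a couple of different systems is a quick way to see that the type has no fixed width.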
wchar_t is absolutely NOT required for Unicode. UTF-8, for example, maintains backward compatibility with ASCII and uses plain 8-bit char. wchar_t mostly gives you support for so-called wide characters, or basically any character set whose characters are encoded using more than sizeof(char) bytes each.
Be careful: wchar_t is often 16 bits, which is not enough to store all Unicode characters, and it is a bad choice for data in UTF-8, for instance.
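To illustrate why 16 bits is not enough, here is a rough sketch of UTF-16 encoding for a single code point. The function name to_utf16 is hypothetical, and it uses char16_t (C++11) rather than wchar_t so the behavior is the same on every platform. Anything above U+FFFF needs two 16-bit units (a surrogate pair).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
inline std::u16string to_utf16(char32_t cp) {
    assert(cp <= 0x10FFFF);              // must be inside the Unicode range
    assert(cp < 0xD800 || cp > 0xDFFF);  // surrogate values are not characters
    std::u16string out;
    if (cp < 0x10000) {
        // Fits in a single 16-bit unit.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Split the 20 remaining bits across a high and a low surrogate.
        const char32_t v = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
    }
    return out;
}
```

For example, to_utf16(0x41) yields one unit, while to_utf16(0x1F600) yields the pair D83D DE00.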
No.
Technically, no. Unicode is a standard that defines code points and it does not require a particular encoding.
So you could use Unicode with the UTF-8 encoding, and then everything would fit in one or a short sequence of legacy char objects, and it would even still be null-terminated.
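As a sketch of what "one or a short sequence of char objects" means, here is a small UTF-8 encoder. The name encode_utf8 is made up for this example, and it does not reject surrogate values, so treat it as an illustration rather than a validated implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode code point as 1 to 4 UTF-8 bytes.
inline std::string encode_utf8(char32_t cp) {
    assert(cp <= 0x10FFFF);  // must be inside the Unicode range
    std::string out;
    if (cp < 0x80) {
        // ASCII is encoded as itself, which is the backward compatibility.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Every byte of a multi-byte sequence has its high bit set, so a zero byte only ever appears for U+0000 itself. That is why C-style null termination keeps working.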
To answer your "then what is the point of wchars?" question...
The problem with UTF-8 is that s[i] is not necessarily a character any more; it might be just a piece of one, whereas with wider characters you can mostly preserve the abstraction that s[i] is a single character. (Though there are more than 2^16 code points, actually.)
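Here is a small sketch of the s[i] problem. The function count_code_points is a hypothetical helper that assumes its input is already valid UTF-8; it counts every byte that is not a continuation byte (bit pattern 10xxxxxx).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: count code points in a string assumed to be valid UTF-8.
inline std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char byte : s) {
        // Continuation bytes look like 10xxxxxx; everything else starts a code point.
        if ((byte & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// Usage sketch: "héllo" with é written out as its two UTF-8 bytes.
inline void demo_indexing() {
    const std::string s = "h" "\xC3\xA9" "llo";
    assert(s.size() == 6);                             // six bytes
    assert(count_code_points(s) == 5);                 // five code points
    assert(static_cast<unsigned char>(s[1]) == 0xC3);  // s[1] is only half of é
}
```

So s.size() and the number of characters disagree as soon as the text leaves ASCII, and s[1] is a fragment rather than a character.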
You absolutely do not need wchar_t to support Unicode in your software. In fact, using wchar_t makes it even harder, because you do not know whether a "wide string" is UTF-16 or UTF-32. It depends on the OS: UTF-16 under Windows, UTF-32 on most others.
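A rough sketch of what that platform dependence means in code. The function wide_units_needed is hypothetical, and it rests on an assumption the standard does not guarantee: that a wchar_t too narrow for U+10FFFF holds UTF-16 and a wider one holds UTF-32.

```cpp
#include <cassert>
#include <cstdint>
#include <cwchar>

// Hypothetical helper: how many wchar_t units one code point occupies,
// ASSUMING narrow wchar_t means UTF-16 and wide wchar_t means UTF-32.
inline int wide_units_needed(char32_t cp) {
    assert(cp <= 0x10FFFF);  // must be inside the Unicode range
    if (WCHAR_MAX >= 0x10FFFF) {
        return 1;  // every code point fits in one unit
    }
    return cp < 0x10000 ? 1 : 2;  // surrogate pair beyond U+FFFF
}
```

The same call, wide_units_needed(0x1F600), gives a different answer depending on where you compile it, which is exactly the portability headache.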
However, UTF-8 allows you to write Unicode-enabled software easily (*)
See: http://stackoverflow.com/questions/1049947/should-utf-16-be-considered-harmful
(*) Note: under Windows you still have to use wchar_t, because it does not support UTF-8 locales, so for Unicode-enabled Windows programming you have to use the wchar_t-based API.