Having a variable length encoding is indirectly forbidden in the standard.
So I have several questions:
How is the following part of the standard handled?
17.3.2.1.3.3 Wide-character sequences
A wide-character sequence is an array object (8.3.4) A that can be declared as T A[N], where T is type wchar_t (3.9.1), optionally qualified by any combination of const or volatile. The initial elements of the array have defined contents up to and including an element determined by some predicate. A character sequence can be designated by a pointer value S that designates its first element.
The length of an NTWCS is the number of elements that precede the terminating null wide character. An empty NTWCS has a length of zero.
Questions:
basic_string<wchar_t>
- How is operator[] implemented and what does it return?
  - standard: If pos < size(), returns data()[pos]. Otherwise, if pos == size(), the const version returns charT(). Otherwise, the behavior is undefined.
- Does size() return the number of elements or the length of the string?
  - standard: Returns: a count of the number of char-like objects currently in the string.
- How does resize() work?
  - unrelated to the standard, just what does it do?
- How are the positions in insert(), erase() and others handled?
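To make the basic_string questions concrete, here is a minimal sketch of the situation I mean. It uses std::u16string as a stand-in for a std::wstring with a 16-bit wchar_t (that substitution is my assumption, chosen so the example behaves the same on every platform), and the helper names clef, is_surrogate and truncate_units are made up for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: a string holding the single code point U+1D11E,
// which UTF-16 encodes as the surrogate pair 0xD834 0xDD1E.
// std::u16string stands in for std::wstring with a 16-bit wchar_t.
inline std::u16string clef() {
    return std::u16string(u"\U0001D11E");
}

// Hypothetical helper: true if the code unit is one half of a surrogate pair.
inline bool is_surrogate(char16_t u) {
    return u >= 0xD800 && u <= 0xDFFF;
}

// resize() counts elements (code units), so it can cut a pair in half.
inline std::u16string truncate_units(std::u16string s, std::size_t n) {
    s.resize(n);
    return s;
}
```

With these definitions, clef().size() is 2 although the string holds one character, clef()[0] is the lone unit 0xD834, and truncate_units(clef(), 1) leaves an unpaired surrogate behind. Whether a 16-bit wchar_t implementation is allowed to behave like this is exactly what I am asking.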
cwctype
- Pretty much everything in here. How is the variable-length encoding handled?
cwchar
- getwchar() obviously can't return a whole platform character, so how does this work?
Plus all the rest of the character functions (the theme is the same).
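The problem with the per-character functions can be sketched as follows: a whole character only exists after two 16-bit units have been combined, which no function taking or returning a single unit can do. This is only an illustration under my own assumptions: std::u16string again stands in for a 16-bit wchar_t string, the names is_lead, is_trail, code_point_at and code_point_count are hypothetical, and passing unpaired surrogates through unchanged is just one possible policy.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers: classify a UTF-16 code unit.
inline bool is_lead(char16_t u)  { return u >= 0xD800 && u <= 0xDBFF; }
inline bool is_trail(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Decode the code point that starts at s[i]; requires i < s.size().
// An unpaired surrogate is returned unchanged (an assumed policy).
inline char32_t code_point_at(const std::u16string& s, std::size_t i) {
    char16_t hi = s[i];
    if (is_lead(hi) && i + 1 < s.size() && is_trail(s[i + 1])) {
        return 0x10000 + ((char32_t(hi) - 0xD800) << 10)
                       + (char32_t(s[i + 1]) - 0xDC00);
    }
    return hi;
}

// Count code points rather than code units.
inline std::size_t code_point_count(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++n) {
        // A decoded value of 0x10000 or more consumed two units.
        i += (code_point_at(s, i) >= 0x10000) ? 2 : 1;
    }
    return n;
}
```

For the string u"a\U0001D11Eb", size() reports 4 while code_point_count reports 3, and a function that classifies one unit at a time only ever sees 0xD834 or 0xDD1E, never U+1D11E.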
Edit: I will be opening a bounty to get some confirmation. I want to get some clear answers or at least a clearer distribution of votes.
Edit: This is starting to get pointless. The answers are totally conflicting. Some of you talk about external encodings (I don't care about those: UTF-8 encoded text will still be stored as UTF-16 once read into the string, and the same goes for output); the rest simply contradict each other. :-/