I'm building an API that lets me fetch strings in various encodings, including UTF-8, UTF-16, UTF-32 and wchar_t (which may be UTF-32 or UTF-16 depending on the OS).

  1. The new C++ standard introduces the types char16_t and char32_t, which do not have this sizeof ambiguity and should be used in the future, so I would like to support them as well. The question is: would they interfere with the existing uint16_t, uint32_t and wchar_t types, preventing overloading because they may refer to the same type?

    class some_class {
    public:
        void set(std::string);                 // UTF-8 string
        void set(std::wstring);                // wchar_t string: UTF-16 or UTF-32,
                                               // depending on sizeof(wchar_t)
        void set(std::basic_string<uint16_t>); // wchar_t-independent UTF-16 string
        void set(std::basic_string<uint32_t>); // wchar_t-independent UTF-32 string

    #ifdef HAVE_NEW_UNICODE_CHARACTERS
        void set(std::basic_string<char16_t>); // new-standard UTF-16 string
        void set(std::basic_string<char32_t>); // new-standard UTF-32 string
    #endif
    };
    

    So I can just write:

    foo.set(U"Some utf32 String");
    foo.set(u"Some utf16 string");
    
  2. What are the typedefs for std::basic_string<char16_t> and std::basic_string<char32_t>, analogous to the one that exists today:

    typedef basic_string<wchar_t> wstring;
    

    I can't find any reference.

    Edit: according to the headers of gcc-4.4, which introduced these new types:

    typedef basic_string<char16_t> u16string;
    typedef basic_string<char32_t> u32string;
    

    I just want to make sure that this is an actual standard requirement and not a gcc-ism.

+17  A: 

1) char16_t and char32_t will be distinct new types, so overloading on them will be possible.

Quote from ISO/IEC JTC1 SC22 WG21 N2018:

Define char16_t to be a typedef to a distinct new type, with the name _Char16_t that has the same size and representation as uint_least16_t. Likewise, define char32_t to be a typedef to a distinct new type, with the name _Char32_t that has the same size and representation as uint_least32_t.

Further explanation (from a devx.com article "Prepare Yourself for the Unicode Revolution"):

You're probably wondering why the _Char16_t and _Char32_t types and keywords are needed in the first place when the typedefs uint_least16_t and uint_least32_t are already available. The main problem that the new types solve is overloading. It's now possible to overload functions that take _Char16_t and _Char32_t arguments, and create specializations such as std::basic_string<_Char16_t> that are distinct from std::basic_string <wchar_t>.
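To make the overloading point concrete, here is a minimal sketch, assuming a compiler with C++0x/C++11 support. The function name `encoding_of` is made up for illustration and is not part of any library; the static assertions only restate the "distinct type" guarantee quoted above.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// char16_t and char32_t are distinct types, not aliases of the integer
// typedefs or of wchar_t, so none of these pairs can collapse into one type.
static_assert(!std::is_same<char16_t, std::uint_least16_t>::value,
              "char16_t is distinct from uint_least16_t");
static_assert(!std::is_same<char32_t, std::uint_least32_t>::value,
              "char32_t is distinct from uint_least32_t");
static_assert(!std::is_same<char16_t, wchar_t>::value,
              "char16_t is distinct from wchar_t");
static_assert(!std::is_same<char32_t, wchar_t>::value,
              "char32_t is distinct from wchar_t");

// Hypothetical overload set, mirroring the set() overloads in the question.
inline const char* encoding_of(const std::string&)    { return "utf8"; }
inline const char* encoding_of(const std::wstring&)   { return "wchar"; }
inline const char* encoding_of(const std::u16string&) { return "utf16"; }
inline const char* encoding_of(const std::u32string&) { return "utf32"; }
```

With these overloads, a call such as `encoding_of(u"abc")` should pick the `std::u16string` version and `encoding_of(U"abc")` the `std::u32string` version, because each literal's character type matches exactly one string type.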

2) u16string and u32string are indeed part of C++0x and not just GCC-isms, as they are mentioned in various standard draft papers. They will be provided by the <string> header. Quote from the same article:

The Standard Library will also provide _Char16_t and _Char32_t typedefs, in analogy to the typedefs wstring, wcout, etc., for the following standard classes:

filebuf, streambuf, streampos, streamoff, ios, istream, ostream, fstream, ifstream, ofstream, stringstream, istringstream, ostringstream, string
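For the string typedefs specifically, a quick compile-time check along these lines can confirm what a given toolchain provides. This is only a sketch, assuming a C++0x/C++11 compiler; the helper names `utf16_units` and `utf32_units` are invented for the example, and it says nothing about the stream typedefs listed in the quote.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// If these compile, <string> provides u16string and u32string as typedefs of
// basic_string<char16_t> and basic_string<char32_t>.
static_assert(std::is_same<std::u16string,
                           std::basic_string<char16_t>>::value,
              "u16string is basic_string<char16_t>");
static_assert(std::is_same<std::u32string,
                           std::basic_string<char32_t>>::value,
              "u32string is basic_string<char32_t>");

// Hypothetical helpers: count code units, not user-visible characters.
inline std::size_t utf16_units(const std::u16string& s) { return s.size(); }
inline std::size_t utf32_units(const std::u32string& s) { return s.size(); }
```

Passing `u"abc"` or `U"abc"` to the matching helper yields 3 in both cases, since every character in that literal fits in a single code unit of either type.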

Alex Jenter
Thanks a lot, that was really helpful!
Artyom
According to the standard draft, `char16/32_t` are keywords, not typedefs. Who is right?
Philipp