The more I work with C++ locale facets, the more I realize that they are broken.
- std::time_get is not symmetric with std::time_put (the way strptime/strftime are in C) and does not allow easy parsing of times with AM/PM marks.
- I discovered recently that simple number formatting may produce invalid UTF-8 under certain locales (like ru_RU.UTF-8).
- std::ctype is very simplistic: it assumes that to-upper/to-lower conversion can be done on a per-character basis, although case conversion may change the number of characters and is context dependent.
- std::collate does not support collation strength (case sensitive or insensitive).
- There is no way to specify a time zone different from the global time zone in time formatting.
And much more...
- Does anybody know whether any changes to the standard facets are expected in C++0x?
- Is there any way to raise the importance of such changes?
Thanks.
EDIT: Clarifications in case the link is not accessible:
std::numpunct defines the thousands separator as a char. So when the separator is U+2002, a different kind of space, it cannot be represented as a single char in UTF-8, only as a multi-byte sequence.
In the C API, struct lconv defines the thousands separator as a string and does not suffer from this problem. So, when you try to format numbers with separators outside of ASCII under a UTF-8 locale, invalid UTF-8 is produced.
To reproduce this bug, write 1234 to a std::ostream imbued with the ru_RU.UTF-8 locale.
EDIT2: I must admit that the POSIX C localization API works much more smoothly:
- There is an inverse of strftime: strptime (strftime does the same as std::time_put::put).
- There are no problems with number formatting, because of the point I mentioned above.
However, it is still far from being perfect.
EDIT3: According to the latest notes about C++0x, I can see std::time_get::get, which is similar to strptime and the opposite of std::time_put::put.