std::stringstream stream_french;
stream_french.imbue(std::locale("")); // French_France.1252
stream_french << 1000;
std::string value_french = stream_french.str();

This code converts 1000 to the string "1 000", but the value of value_french[1] is -96 and not 32. Why is that?

value_french[0] = 49
value_french[1] = -96
value_french[2] = 48
value_french[3] = 48
value_french[4] = 48

If I do

stream_french << "1 000";

The value of value_french[1] is 32. The problem seems to be related to the signedness of char, but why does it only affect whitespace when doing conversions?

+6  A: 

That -96 is the signed equivalent of 160, i.e. 0xA0; if you check the Windows-1252 code page table, you'll see that this character is

A0 = U+00A0 : NO-BREAK SPACE

which is a space that doesn't allow an automatic line break:

Text-processing software typically assumes that an automatic line break may be inserted anywhere a space character occurs; a non-breaking space prevents this happening (provided the software recognises the character, of course). For example, if the text "100 km" will not quite fit at the end of a line, the software may insert a line break between "100" and "km". To avoid this undesirable behaviour, the editor may choose to use a non-breaking space between "100" and "km". This guarantees that the text "100 km" will not be broken: if it does not fit at the end of a line it is moved in its entirety to the next line.

As with "100 km", with "1 000" it is clearly undesirable to have a line break between the 1 and the three 0s, so a non-breaking space is used; quite clever indeed.
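Coming back to the -96 itself: it only shows up for this one character because the digits have codes below 128, which read the same whether char is signed or unsigned, while 0xA0 is above 127. A minimal sketch of that check follows; the helper names as_signed and as_unsigned are made up for illustration, and the result assumes the usual two's-complement, 8-bit char.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: the byte at s[i] read as a signed 8-bit value,
// which is what you see when plain char is signed (as on MSVC by default).
int as_signed(const std::string& s, std::size_t i) {
    assert(i < s.size());
    return static_cast<signed char>(s[i]);
}

// Hypothetical helper: the same byte read as an unsigned value (0..255),
// which is the number to look up in a code page table.
int as_unsigned(const std::string& s, std::size_t i) {
    assert(i < s.size());
    return static_cast<unsigned char>(s[i]);
}
```

For a string built as "1\xA0" "000", as_signed(s, 1) should give -96 and as_unsigned(s, 1) should give 160, while the digits give 49 and 48 either way.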

To make the line-breaking difference clear: with a "normal" space:

1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000

with a non-breaking space:

1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000 1 000

(if you don't see any difference, try to zoom in/out with the font size of the browser)
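If a plain space (or any other separator) is wanted instead, one option is to install your own std::numpunct facet rather than relying on the system locale. This is only a sketch under that assumption: GroupingPunct, format_grouped and thousands_sep_byte are names invented here, and it builds on the classic "C" locale so that it does not depend on a French locale being installed.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: group digits by three with a caller-chosen separator.
class GroupingPunct : public std::numpunct<char> {
public:
    explicit GroupingPunct(char sep) : sep_(sep) {}
protected:
    char do_thousands_sep() const override { return sep_; }
    std::string do_grouping() const override { return "\3"; }  // groups of 3 digits
private:
    char sep_;
};

// Formats value using the classic locale plus the facet above.
std::string format_grouped(long value, char sep) {
    std::ostringstream out;
    // The locale takes ownership of the facet and deletes it when done.
    out.imbue(std::locale(std::locale::classic(), new GroupingPunct(sep)));
    out << value;
    return out.str();
}

// Returns the thousands separator of a locale as an unsigned byte value.
int thousands_sep_byte(const std::locale& loc) {
    return static_cast<unsigned char>(
        std::use_facet<std::numpunct<char>>(loc).thousands_sep());
}
```

format_grouped(1000, ' ') then yields "1 000" with an ordinary 0x20 space, and thousands_sep_byte(stream_french.getloc()) is a way to inspect what the imbued locale actually uses as its separator.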

Matteo Italia
Thanks for the answer. I wouldn’t go as far as calling it clever, since it’ll output a non-ASCII char. Also, I've converted the "1252" string to UTF-16 using MultiByteToWideChar, and the NO-BREAK SPACE (U+00A0) gets converted to a vanilla SPACE (U+0020).
anno
"I wouldn’t go as far as calling it clever, since it’ll output a non-ASCII char." What's wrong with that? Your locale explicitly uses the Windows-1252 code page, so it's fine to use any character that's in it. Following your reasoning, it shouldn't even output accented letters, since they aren't in the ASCII table (by the way, I think the use of that space depends on the Windows regional settings; the C++ locales are just a wrapper around them). The MultiByteToWideChar thing seems strange to me; could you post the code you use?
Matteo Italia
My mistake, conversion is fine.
anno