I have code that manipulates binary files using fstream with the binary flag set, doing all I/O through the unformatted functions read and write. This works correctly on every system I've used (the bits in the file are exactly as expected), but those have all been U.S. English systems. I've been wondering whether these bytes could be modified by a codecvt on a differently configured system.
It sounds like the standard says unformatted I/O behaves the same as putting characters into the streambuf with sputc/sgetc. Those calls can end up in the streambuf's overflow or underflow functions, which in turn appear to push everything through a codecvt (see, e.g., 27.8.1.4.3 in the C++ standard). For basic_filebuf the selection of this codecvt is specified in 27.8.1.1.5, which makes it look like the result depends on what basic_filebuf::getloc() returns.
So, my question is: can I assume that a character array written out using ofstream::write on one system can be recovered verbatim using ifstream::read on another system, no matter what locale configuration either person might be using? I would make the following assumptions:
- The program is using the default locale (i.e., the program is not changing the locale settings itself at all).
- The systems both have CHAR_BIT 8, have the same bit order within each byte, store files as octets, etc.
- The stream objects have the binary flag set.
- We don't need to worry about any endianness differences at this stage. If any bytes in the array are to be interpreted as a multi-byte value, endianness conversions will be handled as required at a later stage.
If the default locale isn't guaranteed to pass bytes through unmodified under some system configuration (I don't know, Arabic or something), then what is the best way to write binary files portably in C++?