views:

210

answers:

7

I have a C++ class that looks a bit like this:

class BinaryStream : private std::iostream
{
    public:
        explicit BinaryStream(const std::string& file_name);
        bool read();
        bool write();

    private:
        Header m_hdr;
        std::vector<Row> m_rows;        
};

This class reads and writes data to disk in a binary format. I am not using any platform-specific code, relying instead on the STL. I have successfully compiled it on XP. I am wondering if I can FTP the files written on the XP platform and read them on my Linux machine (once I recompile the binary stream library on Linux).

Summary:

  1. Files created on an XP machine using a cross-platform library compiled for XP.
  2. Compile the same library (used in 1 above) on a Linux machine.

Question: Can files created in 1 above, be read on a Linux machine (2) ?

If no, please explain why not, and how I may get around this issue.

A: 

As long as they're plain binary files, it should work.

dutt
"plain binary" ? - pray tell, what could that possibly mean ?
skyeagle
as opposed to binary to be interpreted as text, when you have to worry about line endings, I would imagine
Steve Gilham
A: 

Because you're using the STL for everything, there's no reason your program shouldn't be able to read the files on a different platform.

Adam Maras
+1  A: 

This depends entirely on the specifics of the binary encoding. One thing that's different about Linux vs. XP is that you're much more likely to find yourself on a big-endian platform, and if your binary encoding is endian-specific you'll end up with issues.

You may also end up with issues relating to the end-of-line character. There isn't enough information here about how you're using ::std::iostream to give you a good answer to this question.

I would strongly suggest looking at the protobuf library. It is an excellent library for creating fast cross-platform binary encodings.

Omnifarious
It was 'endianness' that I was worried about (and what prompted the post). However, surprisingly, when I read your comment I realized that this would not be an issue, since endianness is at the "hardware end of things" and both machines are PCs with x86 architecture (both 32-bit machines at that).
skyeagle
x86, x86_64, etc. machines are all little-endian. Examples of big-endian platforms are the old PowerPC-based Macs.
reko_t
Yes, if you are certain that the Linux machine in question is also a little-endian machine then you are safe in that regard. I do not like making that kind of assumption, and I highly recommend you look into a library that's specifically designed to create cross-platform binary encodings.
Omnifarious
protobuf looks kinda cool. I'm reading up on it to see the effort required to modify my code ...
skyeagle
Actually, Facebook's Thrift looks better than protobuf ...
skyeagle
Well, thrift is made by an ex-google person who went to Facebook and implemented thrift. :-) I've looked at them both, and I like Protobuf better. I can't remember the reason I like that one better though, but I do remember it was a fairly technical reason. I freely admit some bias though since the main Protobuf guy at Google is a personal friend of mine.
Omnifarious
Ahh, that's why. Thrift focuses way, way too hard on being an RPC protocol. And, IMHO, RPC is terribly evil. Protocols should be data focused, not operation focused, mostly for reasons of reducing coupling and encouraging designs that handle latency well. Also, protobuf encoding is more efficient.
Omnifarious
+1  A: 

Derive from std::basic_streambuf. That's what it's there for. Note, most STL classes are not designed to be derived from; the one I mention is an exception.
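A minimal sketch of that approach: derive from std::streambuf (the char specialization of std::basic_streambuf) and override overflow(), its designated customization point. The VectorBuf name and the vector-backed storage are illustrative assumptions, not anything from the asker's code:

```cpp
#include <streambuf>
#include <ostream>
#include <vector>
#include <cassert>

// A streambuf that captures every written byte into a vector.
// With no put area set up, each character funnels through overflow().
class VectorBuf : public std::streambuf {
public:
    const std::vector<char>& data() const { return buf_; }
protected:
    int_type overflow(int_type ch) override {
        if (ch != traits_type::eof())
            buf_.push_back(static_cast<char>(ch));
        return ch;
    }
private:
    std::vector<char> buf_;
};
```

Any ordinary ostream can then sit on top of it: `VectorBuf vb; std::ostream os(&vb); os << "hdr";` leaves the three bytes in `vb.data()`.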

dirkgently
+1  A: 

If you want your code to be portable across machines with different endianness, you need to stick to one endianness in your files. Whenever you read or write files, you convert between the host byte order and the file byte order. It's common to use what's called network byte order when you want to write files that are portable across all machines. Network byte order is defined to be big-endian, and there are pre-made functions to deal with those conversions (although they are very easy to write yourself).

For example, before writing a long to a file, you should convert it to network byte order using htonl(), and when reading from a file you should convert it back to host byte order with ntohl(). On a big-endian system, htonl() and ntohl() simply return the number passed to them unchanged, but on a little-endian system they swap each byte in the variable.

If you don't care about supporting big-endian systems, none of this is an issue though, although it's still good practice.
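On POSIX systems htonl()/ntohl() come from &lt;arpa/inet.h&gt; (winsock2.h on Windows); as the answer says, they are easy to write yourself. A hand-rolled sketch of the same idea that works on either platform:

```cpp
#include <cstdint>
#include <cstring>
#include <cassert>

// Equivalent of htonl(): shuffle a 32-bit value into big-endian
// (network) byte order regardless of the host's own endianness.
uint32_t to_network(uint32_t host) {
    unsigned char b[4] = {
        static_cast<unsigned char>(host >> 24),
        static_cast<unsigned char>(host >> 16),
        static_cast<unsigned char>(host >> 8),
        static_cast<unsigned char>(host),
    };
    uint32_t net;
    std::memcpy(&net, b, sizeof net);   // bytes now sit in big-endian order
    return net;
}

// Equivalent of ntohl(): reassemble the big-endian bytes as a host value.
uint32_t from_network(uint32_t net) {
    unsigned char b[4];
    std::memcpy(b, &net, sizeof b);
    return (uint32_t(b[0]) << 24) | (uint32_t(b[1]) << 16)
         | (uint32_t(b[2]) << 8)  |  uint32_t(b[3]);
}
```

Round-tripping any value through both functions returns it unchanged on any host, which is a cheap sanity check for this kind of code.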

Another important thing to pay attention to is padding of your structs/classes that you write, if you write them directly to the file (eg. Header and Row). Different compilers on different platforms can use different padding, which means that variables are aligned differently in the memory. This can break things big-time, if the compilers you use on different platform use different padding. So for structs that you intend to write directly to files/other streams, you should always specify padding. You should tell the compiler to pack your structs like this:

#pragma pack(push, 1)
struct Header {
  // This struct uses 1-byte padding
  ...
};
#pragma pack(pop)

Remember that doing this will make using the struct more inefficient when you use it in your application, because access to unaligned memory addresses means more work for the system. This is why it's generally a good idea to have separate types for the packed structs that you write to streams, and a type that you actually use in the application (you just copy the members from one to other).
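A sketch of that split, using a hypothetical Header whose fields are assumptions (the asker's real Header is not shown); the static_assert pins down the on-disk size, and #pragma pack is supported by GCC, Clang, and MSVC:

```cpp
#include <cstdint>
#include <cassert>

// On-disk layout: packed to 1-byte alignment so its size and field
// offsets are identical across compilers.
#pragma pack(push, 1)
struct PackedHeader {
    uint8_t  version;
    uint32_t row_count;
};
#pragma pack(pop)

// In-memory layout: same fields with natural alignment for fast access.
// Typically 8 bytes here, because 3 padding bytes precede row_count.
struct Header {
    uint8_t  version;
    uint32_t row_count;
};

static_assert(sizeof(PackedHeader) == 5, "no padding in the disk layout");

// Copy members across when loading: the packed type only ever touches I/O.
Header unpack(const PackedHeader& p) {
    return Header{p.version, p.row_count};
}
```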

EDIT: Another way to deal with the issue, of course, is to serialize those structs yourself, which won't require using #pragma (pragmas are a compiler-dependent feature, although all major compilers, to my knowledge, support pragma pack).

reko_t
Eeek! - call me a wimp, but this is exactly the sort of coding I'm trying to avoid
skyeagle
I don't like to get *THAT* close to the machine. I am looking at Facebook's Thrift framework, looks interesting ...
skyeagle
If you're referring to the #pragma thing, then it's a good thing. It's _always_ a good idea to serialize the data yourself, as writing structs directly to streams will sooner or later make you bump into compatibility issues (and it's not pretty).
reko_t
And when I said "it's a good thing", I meant that it's a good thing you want to avoid it.
reko_t
+1  A: 

Here is an article on Endianness that is related to your question. Look for "Endianness in files and byte swap". Briefly: if your Linux machine has the same endianness, then it's OK; if not, there might be problems.

For example, when the 16-bit integer 1 is written to a file on XP (little-endian), its bytes look like this: 01 00

But when the same integer is written on a machine with the other endianness, it will look like this: 00 01

But if you use only one-byte characters, there should be no problem.
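The byte layouts above can be decoded explicitly, and the host's own order can be probed; a small sketch (the function names are made up for illustration):

```cpp
#include <cstdint>
#include <cstring>
#include <cassert>

// Interpret two raw file bytes under each convention.
uint16_t from_little_endian(const unsigned char b[2]) {
    return uint16_t(b[0]) | (uint16_t(b[1]) << 8);    // bytes 01 00 -> 1
}

uint16_t from_big_endian(const unsigned char b[2]) {
    return (uint16_t(b[0]) << 8) | uint16_t(b[1]);    // bytes 00 01 -> 1
}

// Detect the host's byte order by looking at the first byte of a 1:
// 01 comes first on little-endian hardware, 00 on big-endian.
bool host_is_little_endian() {
    uint16_t one = 1;
    unsigned char first;
    std::memcpy(&first, &one, 1);
    return first == 1;
}
```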

skwllsp
A: 

If you are writing a struct / class directly out to the disc, then don't.

This might not be compatible between different builds on the same compiler, and will almost certainly break when you move to a different platform or compiler. It will definitely break if you change to a different architecture.

It isn't clear from the above code what you're actually writing to the file.
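For completeness, a sketch of the field-by-field alternative, using a hypothetical two-field Header (the names are assumptions): fixed-width types, written one byte at a time in one fixed byte order, so neither compiler padding nor host endianness ever reaches the file:

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <cassert>

// Hypothetical header; the real Header is not shown in the question.
struct Header {
    uint32_t magic;
    uint32_t row_count;
};

// Emit a 32-bit value byte by byte, always little-endian.
void put_u32(std::ostream& os, uint32_t v) {
    for (int i = 0; i < 4; ++i)
        os.put(static_cast<char>((v >> (8 * i)) & 0xFF));
}

// Reassemble it the same way, independent of the host's byte order.
uint32_t get_u32(std::istream& is) {
    uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= uint32_t(static_cast<unsigned char>(is.get())) << (8 * i);
    return v;
}

void write_header(std::ostream& os, const Header& h) {
    put_u32(os, h.magic);
    put_u32(os, h.row_count);
}

Header read_header(std::istream& is) {
    Header h;
    h.magic = get_u32(is);
    h.row_count = get_u32(is);
    return h;
}
```

Round-tripping through a std::stringstream is an easy way to check that write and read agree.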

MarkR