+3  A: 

You will have to pack your structure.

The way to do that changes depending on the compiler you are using.

For Visual C++:

#pragma pack(push)
#pragma pack(1)

struct PackedStruct {
    /* members */
};

#pragma pack(pop)

This tells the compiler not to pad members in the structure, then restores the pack setting to its previous value. Be aware that this will affect performance, since misaligned members can be slower to access. If this structure is used in performance-critical code, you might want to keep working with the unpacked structure there and copy it into a packed structure only when you need the packed layout.

Also, resist the temptation to use the command-line option that disables padding globally; it will greatly affect performance.
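
As a rough sketch of the "unpacked for hot code, packed for storage" idea: the names Data, PackedData and to_packed below are made up for illustration, and the two-argument form #pragma pack(push, 1) is assumed to be available (MSVC, GCC and Clang accept it).

```cpp
// Natural layout: the compiler may insert padding after cmember.
struct Data {
    char cmember;
    int  imember;
};

// Same members with 1-byte packing: no padding between them.
#pragma pack(push, 1)
struct PackedData {
    char cmember;
    int  imember;
};
#pragma pack(pop)

// Hypothetical helper: copy field by field, so performance-critical code
// keeps using Data and only the I/O path touches the packed layout.
inline PackedData to_packed(const Data& d) {
    PackedData p;
    p.cmember = d.cmember;
    p.imember = d.imember;
    return p;
}

// With 1-byte packing the size is exactly the sum of the member sizes.
static_assert(sizeof(PackedData) == sizeof(char) + sizeof(int),
              "PackedData should contain no padding");
```

The static_assert needs C++11; on older compilers the same check can be done at runtime.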

Coincoin
Thanks for your answer, but packing is not a solution in my case...
vnm
+1  A: 

Look into the #pragma pack directive for your compiler. Some compilers use #pragma options align=packed or something similar.

Graeme Perrow
Thanks for your answer, but packing is not a solution in my case...
vnm
+7  A: 

Just save each member of the struct one at a time. If you overload << to write a variable to a file, you can have

myfile << mystruct.member1 << mystruct.member2;

Then you could even overload << to take an entire struct, and do that inside the struct's operator<<, so in the end you have:

myfile << mystruct;

Resulting in save code that looks like:

myfile << count;
for (int i = 0; i < count; ++i)
    myfile << data[i];

IMO all that fiddling about with memory addresses and memcpy is too much of a headache when you could do it this way. This general technique is called serialization - hit google for more, it's a well-developed area.
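
A minimal sketch of how this could look for binary output. BinaryWriter and serialized_size are hypothetical names, not part of any library; the wrapper exists only so that << means "raw bytes" rather than the formatted text that std::ostream's own operator<< would produce. The Data struct with cmember and imember is assumed from the question.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

struct Data {
    char cmember;
    int  imember;
};

// Hypothetical thin wrapper: << writes raw bytes, not formatted text.
class BinaryWriter {
public:
    explicit BinaryWriter(std::ostream& os) : os_(os) {}

    // Write the raw bytes of a single primitive value.
    BinaryWriter& operator<<(char v) { return put(&v, sizeof v); }
    BinaryWriter& operator<<(int v)  { return put(&v, sizeof v); }

private:
    BinaryWriter& put(const void* p, std::size_t n) {
        os_.write(static_cast<const char*>(p), static_cast<std::streamsize>(n));
        return *this;
    }
    std::ostream& os_;
};

// A whole struct is just its members, one at a time, so no padding is written.
inline BinaryWriter& operator<<(BinaryWriter& w, const Data& d) {
    return w << d.cmember << d.imember;
}

// Size of `count` serialized records, measured by writing to an in-memory stream.
inline std::size_t serialized_size(const Data* data, int count) {
    std::ostringstream buf(std::ios::binary);
    BinaryWriter w(buf);
    for (int i = 0; i < count; ++i)
        w << data[i];
    return buf.str().size();
}
```

Because the writer only needs a std::ostream, the same code works with a std::ofstream for the real file and a std::ostringstream when you only want to know how many bytes would be produced.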

AshleysBrain
Sure, until you have a struct that has twenty fields in it. Also, every time you add a field, you need to update all the code that reads or writes the structure. Just writing `(ptr, sizeof(mystruct))` will always work.
Graeme Perrow
Never mind - I misunderstood the part about overloading the operator, so the reading and writing code is compartmentalized. I will edit your answer so that I can remove my downvote.
Graeme Perrow
Maybe introducing a function that adds one Data instance per call to a buffer is a good solution (overloading << is a concrete example of this approach). But what if I must calculate the total size of an array of Data (without padding bytes) without actually writing the structures to a file?
vnm
You could look into writing the file contents to a dynamically resizing memory buffer, then dumping that memory buffer to a file. You'd need something a bit more sophisticated for very large files, though. Basically, as a few other people have said, this technique is called serialization. I've updated my answer to reflect this since it's the proper term for it.
AshleysBrain
+1  A: 

As you can see, I calculate the actual size of the Data struct with the code: int actualLen = sizeof(char) + sizeof(int). Is there any alternative to this?

No, not in standard C++.

Your compiler might provide a compiler-specific option, though. Packed structs as shown by Graeme and Coincoin might do.

sbi
I don't get it. He is not writing out the whole struct; he can access the members individually and write them anywhere he wishes to.
dirkgently
Yes. But he said he didn't want to have to sum up the sizes of the data members by hand. And there's no portable way to avoid that.
sbi
+1  A: 

IIUC, you are trying to copy the values of the structure members, rather than the structure as a whole, and store them to disk. Your approach looks good to me. I do not agree with those suggesting #pragma pack, since that only gets you a packed structure in memory at runtime, which is not what you are after.

A few notes:

  • sizeof(char) == 1, always, by definition

  • use the offsetof() macro

  • do not try to instantiate a Data object directly from this targetBuff (i.e. via casting) -- this is when you get into alignment issues and trip. Instead, copy the members out as you did while writing the buffer and you should not have issues
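
The notes above can be sketched roughly as follows. The Data struct and its members are taken from the question; pack, unpack and kPackedSize are hypothetical names chosen for this example, and Data is assumed to stay a POD type so that offsetof is well-defined.

```cpp
#include <cstddef>
#include <cstring>

struct Data {
    char cmember;
    int  imember;
};

// Bytes one Data occupies in the buffer: members only, no padding.
const std::size_t kPackedSize = sizeof(char) + sizeof(int);

// Copy the members of d into buf, which must hold at least kPackedSize bytes.
// offsetof locates each member inside the (possibly padded) struct.
inline void pack(const Data& d, unsigned char* buf) {
    const unsigned char* src = reinterpret_cast<const unsigned char*>(&d);
    std::memcpy(buf, src + offsetof(Data, cmember), sizeof d.cmember);
    std::memcpy(buf + sizeof d.cmember,
                src + offsetof(Data, imember), sizeof d.imember);
}

// Copy the members back out one by one. The buffer is never cast to Data*,
// because buf + 1 is generally not suitably aligned for an int.
inline Data unpack(const unsigned char* buf) {
    Data d;
    std::memcpy(&d.cmember, buf, sizeof d.cmember);
    std::memcpy(&d.imember, buf + sizeof d.cmember, sizeof d.imember);
    return d;
}
```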
dirkgently
Note that `offsetof(Type, Member)` is only defined when `Type` is a POD type in C++. Using `offsetof` on any type that has a destructor or assignment operator is not well-defined.
D.Shawley
+1. The sample uses a POD, so I had assumed ...
dirkgently
+1  A: 

If you don't want to use pragma pack, try to manually re-order the variables, like

struct Data {
  int  imember;
  char cmember;

};
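
A small sketch of what reordering does and does not buy you. The constants kMemberBytes and kTailPadding are names invented for this example, and the specific numbers in the comments assume a typical platform with a 4-byte int; nothing here is guaranteed by the standard beyond imember sitting at offset 0.

```cpp
#include <cstddef>

// Members ordered largest first, as suggested above.
struct Data {
    int  imember;
    char cmember;
};

// Sum of the member sizes: what a member-by-member writer would emit.
const std::size_t kMemberBytes = sizeof(int) + sizeof(char);

// Trailing padding the compiler adds so that every element of an array of
// Data keeps imember aligned. With a 4-byte int this is typically 3.
const std::size_t kTailPadding = sizeof(Data) - kMemberBytes;

// Reordering removes the hole between the members: cmember is expected to
// follow imember directly on mainstream compilers.
static_assert(offsetof(Data, cmember) == sizeof(int),
              "no padding expected between imember and cmember");
```

So sizeof(Data) can still be larger than the sum of the member sizes; reordering moves the padding to the end rather than removing it.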
subbul
Your answer is acceptable for the sample in my question, but not in other cases... What if we have the following struct on a 64-bit platform: 'struct Data2 {int64_t a; int32_t b; int8_t c;};'?
vnm
There is a general principle here. You order your struct fields by size in descending order. This is how us oldsters have done things for decades. Not perfect, but it keeps you out of a lot of trouble.
T.E.D.
A: 

No, there is no way within the language proper to get this information. One way to approach a solution is to define your data classes indirectly, using some feature of the language - it could be as old-fashioned as macros and the preprocessor, or as new-fangled as tuple templates. You need something which lets you iterate over the class members systematically.

Here's a macro based approach:

#undef  Data_MEMBERS
#define Data_MEMBERS(Data_OP) \
    Data_OP(c, char) \
    Data_OP(i, int)
#undef  Data_CLASS_DEFINITION
#define Data_CLASS_DEFINITION(name, type) \
    type name##member;
struct Data {
    Data_MEMBERS(Data_CLASS_DEFINITION)
};
#define Data_SERIAL_SIZER(name, type) \
    sizeof(type) +
#define Data_Serial_Size \
    (Data_MEMBERS(Data_SERIAL_SIZER) 0)

And so forth.
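
To illustrate where "and so forth" might lead, here is a self-contained sketch that reuses the same member list for a third expansion. Data_SERIAL_WRITER and Data_serialize are hypothetical additions for this example, not part of the macros above.

```cpp
#include <cstddef>
#include <cstring>

// Single list of members: name prefix and type.
#define Data_MEMBERS(Data_OP) \
    Data_OP(c, char)          \
    Data_OP(i, int)

// Expansion 1: the struct definition itself.
#define Data_CLASS_DEFINITION(name, type) type name##member;
struct Data {
    Data_MEMBERS(Data_CLASS_DEFINITION)
};

// Expansion 2: the padding-free size, sizeof(char) + sizeof(int) + 0.
#define Data_SERIAL_SIZER(name, type) sizeof(type) +
#define Data_Serial_Size (Data_MEMBERS(Data_SERIAL_SIZER) 0)

// Expansion 3 (hypothetical): copy each member into a buffer in turn.
#define Data_SERIAL_WRITER(name, type)               \
    std::memcpy(out, &d.name##member, sizeof(type)); \
    out += sizeof(type);

// Writes d into out without padding and returns the position after the last
// byte written; out must hold at least Data_Serial_Size bytes.
inline unsigned char* Data_serialize(const Data& d, unsigned char* out) {
    Data_MEMBERS(Data_SERIAL_WRITER)
    return out;
}
```

Adding a member then means touching only the Data_MEMBERS list; the struct, the size and the writer all follow from it.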

Hyman Rosen
A: 

If you can rewrite the struct definition, you could try to use bit-field specifiers to get rid of the holes, like so (widths are given in bits; this assumes an 8-bit char and a 32-bit int):

struct Data {
   char cmember : 8;
   int  imember : 32;
};

Sadly, this still does not guarantee that the compiler won't place imember 4 bytes after the start of cmember. But many compilers will get the idea and do it anyway.

Other alternatives:

  1. Reorder your members by size (largest first). This is an old embedded world trick to minimize holes.

  2. Use Ada instead.

The code

type Data is record
    cmember : character;
    imember : integer;
end record;

for Data use record
    cmember at 0 range 0..7;
    imember at 1 range 0..31;
end record;

Does exactly what you want.

T.E.D.
+2  A: 

There is not an easy solution to this problem. You can usually create separate structures and tell the compiler to pack them tightly, something like:

/* GNU has attributes */
struct PackedData {
    char cmember;
    int  imember;
} __attribute__((packed));

or:

/* MSVC has headers and #pragmas */
#include <pshpack1.h>
struct PackedData {
    char cmember;
    int  imember;
};
#include <poppack.h>

Then you have to write code that transforms your unpacked structures into packed structures and vice-versa. If you are using C++, you can create template helper functions that are predicated on the structure type and then specialize them:

template <typename T>
std::ostream& encode_to_stream(std::ostream& os, T const& object) {
    return os.write((char const*)&object, sizeof(object));
}

template <typename T>
std::istream& decode_from_stream(std::istream& is, T& object) {
    return is.read((char*)&object, sizeof(object));
}

template<>
std::ostream& encode_to_stream<Data>(std::ostream& os, Data const& object) {
    encode_to_stream<char>(os, object.cmember);
    encode_to_stream<int>(os, object.imember);
    return os;
}
template <>
std::istream& decode_from_stream<Data>(std::istream& is, Data& object) {
    decode_from_stream<char>(is, object.cmember);
    decode_from_stream<int>(is, object.imember);
    return is;
}

The bonus is that the defaults will read and write POD objects including the padding. You can specialize as necessary to optimize your storage. However, you probably want to consider endianness, versioning, and other binary storage issues as well. It might be prudent to simply write an archival class that wraps your storage and provides methods for serialization and deserialization of primitives and then an open-ended method that you can specialize as needed:

class Archive {
protected:
    typedef unsigned char byte;
    // Underlying storage (assumed here; open it in binary mode elsewhere).
    std::ofstream m_fstream;

    void writeBytes(byte const* byte_ptr, std::size_t byte_size) {
        m_fstream.write((char const*)byte_ptr, byte_size);
    }

public:
    template <typename T>
    void writePOD(T const& pod) {
        writeBytes((byte const*)&pod, sizeof(pod));
    }

    // Users are required to specialize this to use it.  If it is used
    // for a type that it is not specialized for, a link error will occur.
    template <typename T> void serializeObject(T const& obj);
};

template<>
void Archive::serializeObject<Data>(Data const& obj) {
    writePOD(obj.cmember);
    writePOD(obj.imember);
}

This is the approach that I have always ended up at after a bunch of perturbations in between. It is nicely extensible without requiring inheritance and gives you the flexibility to change your underlying data storage format as needed. You can even specialize writePOD to do different things for different underlying data types like ensuring that multibyte integers are written in network order or whatnot.
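
As a rough illustration of the network-order idea, here is a standalone sketch that is not wired into the Archive class above. The helper names write_u32_be and read_u32_be are hypothetical; the point is only that shifting and masking produces the same byte sequence regardless of the host's endianness.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: emit a 32-bit value most significant byte first
// (network order), independent of the host's byte order.
inline void write_u32_be(std::ostream& os, std::uint32_t value) {
    const char bytes[4] = {
        static_cast<char>((value >> 24) & 0xFF),
        static_cast<char>((value >> 16) & 0xFF),
        static_cast<char>((value >> 8) & 0xFF),
        static_cast<char>(value & 0xFF)
    };
    os.write(bytes, sizeof bytes);
}

// Matching reader: rebuild the value from four big-endian bytes.
inline std::uint32_t read_u32_be(std::istream& is) {
    unsigned char bytes[4] = {0, 0, 0, 0};
    is.read(reinterpret_cast<char*>(bytes), sizeof bytes);
    return (static_cast<std::uint32_t>(bytes[0]) << 24) |
           (static_cast<std::uint32_t>(bytes[1]) << 16) |
           (static_cast<std::uint32_t>(bytes[2]) << 8) |
            static_cast<std::uint32_t>(bytes[3]);
}
```

A writePOD specialization for a fixed-width integer type could delegate to a helper like this instead of dumping the in-memory bytes.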

D.Shawley
In my experience you will always run into the file-and-struct-packing-endianness-oh-my conundrum at some point. A serialization format is key, and this example is a good first step. You should also take a look at more robust systems like XDR (http://en.wikipedia.org/wiki/External_Data_Representation) or Protocol Buffers (http://code.google.com/p/protobuf/).
Ben Stiglitz
Yup. I have been down that road too many times to think "I won't need full-blown serialization this time" or "this format will only be used by this one utility". You are just better off doing careful bytewise serialization for anything that has a binary representation.
D.Shawley
+1  A: 
Loadmaster
+2  A: 

I would say that you are actually looking for serialization.

There are a number of frameworks for serialization, but I personally prefer Google Protocol Buffers over Boost.Serialization and other approaches.

Protocol Buffers has versioning and binary/human readable output.

If you are concerned about size, you always have the possibility of compressing the data. There are lightning-fast compression algorithms, like LZW for example, which offer a good speed/compression ratio.

Matthieu M.
+1  A: 

You told @Coincoin that you cannot pack. If you just need the size for some reason, here is a dirty solution:

#define STRUCT_ELEMENTS  char cmember;/* padding bytes */ int  imember; 
typedef struct 
{
    STRUCT_ELEMENTS 
}paddedData;

#pragma pack(push)
#pragma pack(1)

typedef struct 
{
    STRUCT_ELEMENTS 
}packedData;
#pragma pack(pop)

Now you have the size of both:

sizeof(packedData);
sizeof(paddedData);

The only reason I can think of why you cannot pack is that you are linking this to another program. In that case you will need to pack your structure and then unpack it when working with the external program.

ralu
Original solution )
vnm