I want to compare C++ class/structure objects. In C, most of the time, one knows the exact size of a struct by adding up the sizes of the individual fields (assuming that the compiler does not add padding). Hence one can use the memcmp() function on two objects to compare them very fast. I am not sure if the same works for C++. This is because a class also has function definitions and maybe some other hidden things (some RTTI info perhaps? A virtual function table even?)

A quick program with a simple structure containing int and char members and a function showed that the size of the structure was sizeof(int)+sizeof(char).

I have one big struct/class with simple int, char, etc. data types (but a large number of them). I want to compare objects from time to time. I cannot overload the == operator, as that will make them compare field by field. In C, I can compare in one go using memcmp(). Any suggestions for C++? Can I use memcmp() directly? I don't want memcmp() to fail because some other value, like the virtual function table pointer, is different while all the fields are actually equal. (I'm using g++.)

+2  A: 

If your class or struct has nothing virtual, "adding up the sizes of individual fields (and assuming that compiler does not add padding)" is about as correct in C++ as in C (i.e. not entirely, because padding is usually added;-).
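A minimal sketch of that point, assuming a typical compiler where int is aligned more strictly than char. The struct `Sample` and the helper names are hypothetical and used only for illustration; the exact amount of padding is implementation-defined.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical struct, for illustration only.
struct Sample {
    char c;       // 1 byte
    int  i;       // commonly 4 bytes, commonly aligned to 4
    void f() {}   // a non-virtual member function takes no space in the object
};

// Sum of the field sizes, ignoring any padding.
constexpr std::size_t sum_of_fields() {
    return sizeof(char) + sizeof(int);
}

// Number of padding bytes the compiler actually inserted.
constexpr std::size_t padding_bytes() {
    return sizeof(Sample) - sum_of_fields();
}

// sizeof can be larger than the sum of the fields, never smaller.
static_assert(sizeof(Sample) >= sizeof(char) + sizeof(int),
              "object cannot be smaller than its fields");
```

On many platforms padding_bytes() is 3 here, but only sizeof(Sample) itself can be relied on.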

Alex Martelli
+9  A: 

It is possible to take sizeof() a struct or class.

edit: since I provided the answer above, you have changed your question from "How can I manually determine the size of C++ structures and classes?" to a more general one about comparing two classes.

The short answer is that you do want to overload the == operator. The belief that it will compare each field by field one at a time is incorrect; you may overload operator == to use any algorithm you like, including a memcmp.

memcmp() on the memory from the first field offset to the last should work fine. A memcmp() on the entire footprint of the class may fail if you are comparing a class of type A to another class B which inherits from A, as the vtable pointers may be different.
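As a rough sketch of an operator == built on memcmp: the struct `Packet` below is hypothetical, and the approach assumes no virtual functions, no padding between or after the members, and an int representation without padding bits. The static_asserts check the first two assumptions at compile time; the third is taken on trust.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical struct: every member has the same size, so no padding is expected.
struct Packet {
    int id;
    int length;
    int flags;
};

// Guard the assumptions that make a byte-wise comparison meaningful.
static_assert(std::is_trivially_copyable<Packet>::value,
              "Packet must be trivially copyable (no vtable, no custom copy)");
static_assert(sizeof(Packet) == 3 * sizeof(int),
              "Packet must not contain padding");

// operator== may use any algorithm, including memcmp.
inline bool operator==(const Packet& a, const Packet& b) {
    return std::memcmp(&a, &b, sizeof(Packet)) == 0;
}

inline bool operator!=(const Packet& a, const Packet& b) {
    return !(a == b);
}
```

If a member is later added that introduces padding, the second static_assert fails to compile instead of the comparison silently misbehaving.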

Crashworks
Is that guaranteed to work? For example, if I have a class { int a; long b; char c; }; can I write the following operator overload: func(B) { int byte_cmp = return memcpy(}
Methos
Crashworks
No; memcmp() is not guaranteed to work. It is not guaranteed to work on a single machine, even for POD structs; it certainly isn't guaranteed to work portably across machines.
Jonathan Leffler
That's quite a surprise to me; we've had it working in our networking code for years. But to be fair we've no padding and we zerofill the space under the classes before constructing them.
Crashworks
I just tried your suggestion with a small struct. There is no default == operator. One has to write it.
Methos
So then what is the efficient way to compare two objects? I am writing some performance-intensive code.
Methos
If you feel that comparing integral struct fields is your performance bottleneck, assume that you're wrong until you've proven it to yourself with a profiler. If that shows *empirically* that comparing structures truly is a major hotspot, then the best thing to do is make sure that your structures are 16-byte aligned and use SIMD ops (e.g., SSE/VMX) to compare them in large swathes. If your structures are smaller than 128 bytes, then doing it field by field will almost surely be faster than anything else you could try.
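For the field-by-field route, one possible sketch uses std::tie so the comparison does not have to be spelled out as a chain of &&. The struct `Record` and its members are hypothetical; this is not a measured recommendation, just one compact way to write it.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical struct mixing field types; padding between members is harmless here.
struct Record {
    int    id;
    char   kind;
    double weight;
};

// Compares member by member; padding bytes are never inspected.
inline bool operator==(const Record& a, const Record& b) {
    return std::tie(a.id, a.kind, a.weight) == std::tie(b.id, b.kind, b.weight);
}

inline bool operator!=(const Record& a, const Record& b) {
    return !(a == b);
}
```

Because only the members take part, this works regardless of padding, and floating-point members are compared by value rather than by bit pattern.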
Crashworks
+3  A: 

P.O.Ds are safe to compare with memcmp.

Padding is a problem for doing a memcmp on class objects, since it may be filled with garbage (which will be different from object to object). In C you usually don't have that problem because it is generally okay to memset the whole struct to 0 before you do any assignment. Doing the same is VERY bad in C++, because you could overwrite the vtable pointer.

I don't believe there is anything in the language specification that says how vtables are implemented, though they are usually reached through a hidden data member. The vtable should be the same for objects of the same class (but will of course be different for parent/child classes). When you get into multiple or virtual inheritance, the implementation may vary even more from compiler to compiler.
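The padding problem can be made visible with a small sketch. It fills two objects with different byte patterns to stand in for "garbage", then assigns identical field values. The struct `Padded` and the helper names are hypothetical, and the result assumes the compiler leaves padding bytes untouched when a member is assigned, which is typical but not guaranteed.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical struct: char followed by int usually leaves padding after c.
struct Padded {
    char c;
    int  i;
};

inline bool fields_equal(const Padded& a, const Padded& b) {
    return a.c == b.c && a.i == b.i;
}

inline bool bytes_equal(const Padded& a, const Padded& b) {
    return std::memcmp(&a, &b, sizeof(Padded)) == 0;
}

// Returns true when two objects with equal fields still differ byte-wise.
inline bool padding_breaks_memcmp() {
    Padded a, b;
    std::memset(&a, 0x00, sizeof a);   // simulate one kind of leftover memory
    std::memset(&b, 0xFF, sizeof b);   // simulate a different kind
    a.c = b.c = 'x';
    a.i = b.i = 42;
    return fields_equal(a, b) && !bytes_equal(a, b);
}
```

On a platform where Padded has no padding at all, padding_breaks_memcmp() returns false, since then every byte belongs to a field.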

Dolphin
PODs are not safe to compare with memcmp. The values of padding bytes are not defined. For that matter, ints are in principle not safe to compare with memcmp, because not all bits in the storage representation necessarily participate in the value representation. PODs can be copied with memcpy, but that's not the same thing as being comparable with memcmp.
Steve Jessop
+6  A: 

Be wary on numerous counts...

  1. The values in any padding are indeterminate and hence not comparable.
  2. If your machine is little-endian, comparing integer fields will produce one answer; if your machine is big-endian, it will produce another answer.
  3. Most people regard -1 as smaller than 0, but memcmp() will do byte-wise unsigned comparison, and will therefore treat -1 as bigger than 0.
  4. Any pointers are inherently not comparable relevantly by memcmp().
  5. You cannot compare float or double using memcmp().

On the whole, you are seeking a non-sensible optimization.
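A small sketch of points 3 and 5 from the list above, assuming two's-complement integers and IEEE 754 doubles. The helper `byte_compare` is a hypothetical wrapper around memcmp, introduced only for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <limits>

// Byte-wise comparison of two objects of the same type (thin memcmp wrapper).
template <typename T>
int byte_compare(const T& a, const T& b) {
    return std::memcmp(&a, &b, sizeof(T));
}

// -1 is stored as all-ones bytes, so memcmp ranks it above 0.
inline bool minus_one_ranks_above_zero() {
    return byte_compare(-1, 0) > 0;
}

// +0.0 and -0.0 compare equal as numbers but differ in the sign bit.
inline bool zeros_equal_by_value() {
    return 0.0 == -0.0;
}

inline bool zeros_differ_by_bytes() {
    return byte_compare(0.0, -0.0) != 0;
}
```

The reverse mismatch also exists for floating point: a NaN is never equal to itself by value, yet two NaNs with the same bit pattern are byte-wise identical.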

Jonathan Leffler
+1 for this: "you are seeking a non-sensible optimization". But FYI you *can* compare float and double using memcmp; equal numbers are equal bit patterns.
Crashworks
+1. Useful comparison of floats is better left to the FPU.
TokenMacGuy
@Crashworks: Most fp implementations, IEEE754 included, have a separate concept of +0.0 and -0.0. Most humans will consider these values equal, and in many cases, the runtime system will take pains to compare these as equal. However, they have different bit patterns.
TokenMacGuy
I guess there is the +/-0.0 issue, but it's worth remembering that there can be a huge (30+ cycle) imposed latency between floating-point comparison operations and dependent branches. In my line of work that's actually a critical issue, so I guess I've grown too much a habit of avoiding fcmps.
Crashworks
@Crashworks: if your compiler generates floating point code for equality comparisons, it's rather dumb. As you said, equal numbers are equal bit patterns, and the compiler should handle that. If not, get a better one :-).
Novelocrat
Hmmm; yes, I suppose you can get away with quite a lot for simple equality -- though the indeterminate padding is still a problem. If you seek to order the data, then memcmp() is insufficient.
Jonathan Leffler
Interestingly enough, you can actually compare positive floating-point numbers as if they were integers and get the correct ordering (they are said to be lexicographically ordered -- http://tinyurl.com/o8zul), but this is more of a curiosity than an advisable technique.
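A sketch of that curiosity, assuming a 32-bit IEEE 754 float. The bits are copied into an unsigned integer and compared as integers (not with memcmp, whose byte order would depend on endianness). The helper names are hypothetical, and the ordering only holds for non-negative, non-NaN inputs.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559, "assumes IEEE 754 float");

// Copy the object representation of a float into an unsigned integer.
inline std::uint32_t float_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Valid only for non-negative, non-NaN inputs.
inline bool less_by_bits(float a, float b) {
    return float_bits(a) < float_bits(b);
}
```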
Crashworks
A: 

To the best of my knowledge, you can prevent padding in structures by using #pragma pack (gcc, vc++)

#pragma pack(push, 1)
struct Example
{
   int a;
   char b;
   short c;
};
#pragma pack(pop)

Printing sizeof(Example) shows it is 7 bytes. Without the #pragma pack, the size is 8 bytes.
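To see the effect side by side, here is a sketch that declares the same members with and without packing. The struct names `Unpacked` and `Packed` are hypothetical; the 7 and 8 byte figures assume a 4-byte int and a 2-byte short, as on common g++ and VC++ targets.

```cpp
#include <cassert>
#include <cstddef>

// Default layout: the compiler may insert padding for alignment.
struct Unpacked {
    int   a;
    char  b;
    short c;
};

// Same members with packing set to 1: no padding between or after them.
#pragma pack(push, 1)
struct Packed {
    int   a;
    char  b;
    short c;
};
#pragma pack(pop)

// With packing of 1, the size is exactly the sum of the member sizes.
static_assert(sizeof(Packed) == sizeof(int) + sizeof(char) + sizeof(short),
              "packed struct should have no padding");
```

In the packed layout, c sits at offset sizeof(int) + sizeof(char), which is an odd address on typical targets; that is the misalignment the packing trades for the smaller size.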

Andrew Garrison
7 bytes, and on some architectures accessing c results in undefined behaviour...
Steve Jessop