ansaurus

Question

Is this C++ structure initialization trick safe?

Answer 1

+2 A:

Precise layout of a class or structure is not guaranteed in C++, which is why you should not make assumptions about the size of it from the outside (that means if you're not a compiler).

Probably it works, until you find a compiler on which it doesn't, or you throw some vtable into the mix.

Florian Bösch 2008-09-21 20:50:52

Answer 2

+10 A:

PREAMBLE:

While my answer is still Ok, I find litb's answer quite superior to mine because:

It teaches me a trick that I did not know (litb's answers usually have this effect, but this is the first time I write it down)
It answers exactly the question (that is, initializing the original struct's part to zero)

So please, consider litb's answer before mine. In fact, I suggest the question's author to consider litb's answer as the right one.

Original answer

Putting a true object (i.e. std::string) etc. inside will break, because the true object will be initialized before the memset, and then, overwritten by zeroes.

Using the initialization list doesn't work for g++ (I'm surprised...). Initialize it instead in the CMyStruct constructor body. It will be C++ friendly:

class CMyStruct : public MY_STRUCT
{
public:
    CMyStruct() { n1 = 0 ; n2 = 0 ; }
};

P.S.: I assumed you did have no control over MY_STRUCT, of course. With control, you would have added the constructor directly inside MY_STRUCT and forgotten about inheritance. Note that you can add non-virtual methods to a C-like struct, and still have it behave as a struct.

EDIT: Added missing parenthesis, after Lou Franco's comment. Thanks!

EDIT 2 : I tried the code on g++, and for some reason, using the initialization list does not work. I corrected the code using the body constructor. The solution is still valid, though.

Please reevaluate my post, as the original code was changed (see changelog for more info).

EDIT 3 : After reading Rob's comment, I guess he has a point worthy of discussion: "Agreed, but this could be an enormous Win32 structure which may change with a new SDK, so a memset is future proof."

I disagree: Knowing Microsoft, it won't change because of their need for perfect backward compatibility. They will create instead an extended MY_STRUCTEx struct with the same initial layout as MY_STRUCT, with additionnal members at the end, and recognizable through a "size" member variable like the struct used for a RegisterWindow, IIRC.

So the only valid point remaining from Rob's comment is the "enormous" struct. In this case, perhaps a memset is more convenient, but you will have to make MY_STRUCT a variable member of CMyStruct instead of inheriting from it.

I see another hack, but I guess this would break because of possible struct alignment problem.

EDIT 4: Please take a look at Frank Krueger's solution. I can't promise it's portable (I guess it is), but it is still interesting from a technical viewpoint because it shows one case where, in C++, the "this" pointer "address" moves from its base class to its inherited class.

paercebal 2008-09-21 20:51:02

need to close parens for n1 -- I agree with this, BTW

Lou Franco 2008-09-21 20:52:27

The initialization list does not work (for the derived CMyStruct) because n1 and n2 are not direct members of it. Therefore they're already constructed by the time the CMyStruct constructor is called, so you can't call their constructor again.

Jesse Beder 2008-09-21 22:34:10

Don't agree with this as an accepted answer. Initialisation is compiler specific. Frank Krueger and Drealmer's answers are better.

OJ 2008-09-21 22:41:30

What do you mean by "Initialisation is compiler specific?". AFAIK, this code is both perfectly safe and pure C++. Please, elaborate your point and I will correct/remove my post.

paercebal 2008-09-21 22:49:55

You should have a constructor in the base class that allows you to initialize n1 and n2. If you have such a constructor, use it in the derived class' initialization list; this way your code will be efficient standard C++. g++ will accept it.If you can't have such base ctor, use assignment.

wilhelmtell 2008-09-21 23:09:25

@wilhelmtel : Of course, but this solution was not viable as, as I said above, "I assumed you did have no control over MY_STRUCT, of course. With control, you would have added the constructor directly inside MY_STRUCT and forgotten about inheritance."... :-) ...

paercebal 2008-09-21 23:33:12

You can just do `CMyStruct():MY_STRUCT() { }` and all those integers will be zero.

Johannes Schaub - litb 2009-08-22 03:09:31

@litb : Not on my compiler. Anyway, I believe I saw something about only the global variables being initialized to zero by default. Member variables not being globals, they have the default of their type, and in the case of integers in C/C++, no initialization at all... ^_^ ...

paercebal 2009-08-29 20:22:08

@paercebal: Well, that would mean a broken compiler, which is hardly an argument against the `()` form. And there'a huge difference between *default initialization* and *no initialization at all* even for basic types. For a struct like the one in the OP the `()` initializer would set everything to zero.

AndreyT 2009-10-26 22:09:00

@AndreyT : Oops... You're so right !!!

paercebal 2009-12-05 23:40:11

Answer 3

+1 A:

If you already have a constructor, why not just initialize it there with n1=0; n2=0; -- that's certainly the more normal way.

Edit: Actually, as paercebal has shown, ctor initialization is even better.

Lou Franco 2008-09-21 20:51:38

Agreed, but this could be an enormous Win32 structure which may change with a new SDK, so a memset is future proof.

Rob 2008-09-21 20:53:07

Lou Franco: I did miss something as a test of g++ showed me the initialization list did not let me initialize the data from MY_STRUCT. I'll do some research. +1.

paercebal 2008-09-21 21:10:50

Rob, you're right there. Still, if you put some true object in CMyStruct, your solution using memset(this) will break. Still, there is an alternative, I guess...

paercebal 2008-09-21 21:12:55

Answer 4

+8 A:

This would make me feel much safer as it should work even if there is a vtable (or the compiler will scream).

memset(static_cast<MY_STRUCT*>(this), 0, sizeof(MY_STRUCT));

I'm sure your solution will work, but I doubt there are any guarantees to be made when mixing memset and classes.

Frank Krueger 2008-09-21 20:52:48

it will NOT necessarily work. memset will overwrite the vtable pointer...

Benoit 2008-09-21 21:08:21

I wonder: If CMyStruct has a vtable, and MY_STRUCT is a C POD, then they won't have the same address for 'this'. Thus, casting this will move the address to the right point. So I guess Frank Krueger's solution could be right, but I would need confirmation from a guru. +1.

paercebal 2008-09-21 21:49:03

P.S.: I tested it on g++, and it worked (MY_STRUCT's "this" pointer were 4 bytes after CMyStruct's). But still the question remains: Is this a hack, or is this true portable C++. I guess the later.

paercebal 2008-09-21 21:50:46

paercebal - I figured the cast would make it good. I mean, we do have guarantees on the data continuity of structs and their size, so I can't see why this would ever fail.

Frank Krueger 2008-09-21 22:18:31

The code below crashed :(. MSVC 2005struct MY_STRUCT2{ int n1; int n2;virtual void a()=0;};struct MY_STRUCT : MY_STRUCT2{ void a() { printf("%d ", sizeof(*this)); };};{ MY_STRUCT s; memset(static_cast<MY_STRUCT*>( static_cast<MY_STRUCT2*>(

acidzombie24 2008-11-12 16:50:07

@acidzombie24: Of course your code crashes: The base struct MY_STRUCT2 has a virtual table! The current question is about POD base structs, not virtual table-enabled base structs! ... ^_^ ...

paercebal 2009-05-27 21:00:43

The `this` pointer is always equal in a single inheritance scenario, i think. So the cast is a no-op. The `this` pointer changes when multiple inheritance is involved - but i don't see where this helps with avoiding to overwrite the vtable.

Johannes Schaub - litb 2009-08-22 03:26:59

litb: I'm not sure the this pointer is guaranteed to be equal in single inheritance. Regardless, if they are equal, than the vtable must be after MY_STRUCT, in which case the memset is safe. If, as paercebal said, they are off by 4 bytes (because the vtable is at offset 0), then the pointer must move.

jmucchiello 2009-10-26 05:52:24

@acidzombie24: Your experiment is not relevant. The whole point here is that the base class is a C-style struct (read: POD). It shoudn't have any virtual members. Virtual members can appear in `CMyStruct`, but not in `MY_STRUCT`.

AndreyT 2009-10-26 22:18:01

Unlike the answer which is currently tagged as correct, this one is actually portable. +1.

Pavel Minaev 2009-10-26 23:08:51

Sorry, mistaken the answer for the question itself :/

Pavel Minaev 2009-10-26 23:09:50

AndreyT 2009-10-26 23:19:47

Answer 5

+1 A:

My opinion is no. I'm not sure what it gains either.

As your definition of CMyStruct changes and you add/delete members, this can lead to bugs. Easily.

Create a constructor for CMyStruct that takes a MyStruct has a parameter.

CMyStruct::CMyStruct(MyStruct &)

Or something of that sought. You can then initialize a public or private 'MyStruct' member.

kervin 2008-09-21 20:54:00

I agree. This is another viable solution. +1.

paercebal 2008-09-21 21:41:12

Answer 6

+8 A:

Much better than a memset, you can use this little trick instead:

MY_STRUCT foo = { 0 };

This will initialize all members to 0 (or their default value iirc), no need to specifiy a value for each.

Drealmer 2008-09-21 20:56:34

This is a suboptimal solution, as you will have to initialize the struct everywhere in the code, whereas the constructor will be written once and only once.

paercebal 2008-09-21 21:09:34

You can use this technique, but encapsulated in a class - see my answer below.

Eclipse 2008-09-21 21:52:46

Firstly, that's how you do it in C. in C++ this specific little trick translates to `MY_STRUCT foo = {};` (since not all data types can be initialized with literal 0 in C++). Secondly, when we are talking about constructor initializer list, the direct analoue would be just to include `MY_STRUCT()` in the initializer list.

AndreyT 2009-10-26 22:12:37

The problem with that trick is that it pollutes `-Wmissing-field-initializers` in GCC, which is an otherwise useful warning to have.

Tom 2009-10-27 02:30:03

For the record, this technique doesn't seem to be applicable to class fields, and of course it can't reset all members to zero at an arbitrary location since it must be used where the variable is declared.

romkyns 2010-07-16 10:39:17

Answer 7

+4 A:

This is a perfect example of porting a C idiom to C++ (and why it might not always work...)

The problem you will have with using memset is that in C++, a struct and a class are exactly the same thing except that by default, a struct has public visibility and a class has private visibility.

Thus, what if later on, some well meaning programmer changes MY_STRUCT like so:


struct MY_STRUCT
{
    int n1;
    int n2;

   // Provide a default implementation...
   virtual int add() {return n1 + n2;}  
};

By adding that single function, your memset might now cause havoc. There is a detailed discussion in comp.lang.c+

Benoit 2008-09-21 21:06:43

Then I fire said programmer for changing system header files.

Joshua 2009-08-21 16:38:59

It doesn't even matter. The original code is simply U.B. because there's no guarantee whatosever that `MY_STRUCT` subobject is located at the beginning of derived class.

Pavel Minaev 2009-10-26 23:10:41

Answer 8

+1 A:

From an ISO C++ viewpoint, there are two issues:

(1) Is the object a POD? The acronym stands for Plain Old Data, and the standard enumerates what you can't have in a POD (Wikipedia has a good summary). If it's not a POD, you can't memset it.

(2) Are there members for which all-bits-zero is invalid ? On Windows and Unix, the NULL pointer is all bits zero; it need not be. Floating point 0 has all bits zero in IEEE754, which is quite common, and on x86.

Frank Kruegers tip addresses your concerns by restricting the memset to the POD base of the non-POD class.

MSalters 2008-09-21 22:14:08

Answer 9

+1 A:

Try this - overload new.

EDIT: I should add - This is safe because the memory is zeroed before any constructors are called. Big flaw - only works if object is dynamically allocated.

struct MY_STRUCT
{
    int n1;
    int n2;
};

class CMyStruct : public MY_STRUCT
{
public:
    CMyStruct()
    {
        // whatever
    }
    void* new(size_t size)
    {
        // dangerous
        return memset(malloc(size),0,size);
        // better
        if (void *p = malloc(size))
        {
            return (memset(p, 0, size));
        }
        else
        {
            throw bad_alloc();
        }
    }
    void delete(void *p, size_t size)
    {
        free(p);
    }

};

Thanatopsis 2008-09-21 22:25:07

If you're doing `malloc()` followed by `memset(,0,)`, then you probably want `calloc()` in the first place (OS can, and some do, significantly optimize the latter over the former). In any case, this doesn't seem to have any advantages over using base class initializer.

Pavel Minaev 2009-10-26 23:12:31

Answer 10

+3 A:

The examples have "unspecified behaviour".

For a non-POD, the order by which the compiler lays out an object (all bases classes and members) is unspecified (ISO C++ 10/3). Consider the following:

struct A {
  int i;
};

class B : public A {       // 'B' is not a POD
public:
  B ();

private:
  int j;
};

This can be laid out as:

[ int i ][ int j ]

Or as:

[ int j ][ int i ]

Therefore, using memset directly on the address of 'this' is very much unspecified behaviour. One of the answers above, at first glance looks to be safer:

 memset(static_cast<MY_STRUCT*>(this), 0, sizeof(MY_STRUCT));

I believe, however, that strictly speaking this too results in unspecified behaviour. I cannot find the normative text, however the note in 10/5 says: "A base class subobject may have a layout (3.7) different from the layout of a most derived object of the same type".

As a result, I compiler could perform space optimizations with the different members:

struct A {
  char c1;
};

struct B {
  char c2;
  char c3;
  char c4;
  int i;
};

class C : public A, public B
{
public:
  C () 
  :  c1 (10);
  {
    memset(static_cast<B*>(this), 0, sizeof(B));      
  }
};

Can be laid out as:

[ char c1 ] [ char c2, char c3, char c4, int i ]

On a 32 bit system, due to alighments etc. for 'B', sizeof(B) will most likely be 8 bytes. However, sizeof(C) can also be '8' bytes if the compiler packs the data members. Therefore the call to memset might overwrite the value given to 'c1'.

Richard Corden 2008-09-22 09:14:01

I don't understand why a cast of `this` to `MY_STRUCT*` should be safer. Doesn't it result in the same address when only single inheritance is involved? I don't get this trick

Johannes Schaub - litb 2009-08-22 03:29:05

The cast is necessary since you cannot guarantee that 'B' starts at the beginning address of 'C'. The cast therefore offsets the 'this' pointer so that it points at the start of 'B'. Otherwise, we'd potentially set 'A::C1' to zero first and up to the 3rd byte of 'B::i'.

Richard Corden 2009-08-25 10:30:33

Oh i didn't spot that in your first example, you only wanted to clear out `i`. I agree if that's the case then it seems to me too a cast is needed also in the single inheritance scenario.

Johannes Schaub - litb 2009-10-26 16:51:28

Answer 11

A:

It's a bit of code, but it's reusable; include it once and it should work for any POD. You can pass an instance of this class to any function expecting a MY_STRUCT, or use the GetPointer function to pass it into a function that will modify the structure.

template <typename STR>
class CStructWrapper
{
private:
    STR MyStruct;

public:
    CStructWrapper() { STR temp = {}; MyStruct = temp;}
    CStructWrapper(const STR &myStruct) : MyStruct(myStruct) {}

    operator STR &() { return MyStruct; }
    operator const STR &() const { return MyStruct; }

    STR *GetPointer() { return &MyStruct; }
};

CStructWrapper<MY_STRUCT> myStruct;
CStructWrapper<ANOTHER_STRUCT> anotherStruct;

This way, you don't have to worry about whether NULLs are all 0, or floating point representations. As long as STR is a simple aggregate type, things will work. When STR is not a simple aggregate type, you'll get a compile-time error, so you won't have to worry about accidentally misusing this. Also, if the type contains something more complex, as long as it has a default constructor, you're ok:

struct MY_STRUCT2
{
    int n1;
    std::string s1;
};

CStructWrapper<MY_STRUCT2> myStruct2; // n1 is set to 0, s1 is set to "";

On the downside, it's slower since you're making an extra temporary copy, and the compiler will assign each member to 0 individually, instead of one memset.

Eclipse 2008-09-22 16:23:32

Firtly, this implementation contains a completely unnecessary quirk/bug/error/whatever. If the first member of `STR` does not accept `0` initializer (say, it has enum type), the code will fail to compile. Note, this is C++, not C, and the proper idiom for default aggregate initialization is `STR temp = {}`, not the above `STR temp = { 0 }`.

AndreyT 2009-10-26 22:51:47

Secondly, it is still very hard to justify all these efforts, when C++ already provides the plain and simple immediate analogue for `{}` aggregate syntax usable in constructor initializer lists. It is `()` initializer. We have a thread with more than ten replies, almost all of which are trying to solve a non-existing problem :)

AndreyT 2009-10-26 22:54:18

But where's the fun in that...

Eclipse 2009-10-26 23:03:58

Answer 12

A:

What I do is use aggregate initialization, but only specifying initializers for members I care about, e.g:

STARTUPINFO si = {
    sizeof si,      /*cb*/
    0,              /*lpReserved*/
    0,              /*lpDesktop*/
    "my window"     /*lpTitle*/
};

The remaining members will be initialized to zeros of the appropriate type (as in Drealmer's post). Here, you are trusting Microsoft not to gratuitously break compatibility by adding new structure members in the middle (a reasonable assumption). This solution strikes me as optimal - one statement, no classes, no memset, no assumptions about the internal representation of floating point zero or null pointers.

I think the hacks involving inheritance are horrible style. Public inheritance means IS-A to most readers. Note also that you're inheriting from a class which isn't designed to be a base. As there's no virtual destructor, clients who delete a derived class instance through a pointer to base will invoke undefined behaviour.

fizzer 2008-09-22 19:01:50

It is a hack from an OO design viewpoint, you're right. But from a C++ viewpoint, it is not: A class with only non virtual methods is only a struct with functions attached to it. And without additionnal member variables, there is no need for a virtual destructor anyway...

paercebal 2008-09-25 20:21:48

See 5.3.5 of the Standard.

fizzer 2008-10-05 19:00:14

You're right about that. In most implementation I saw, it resulted in the derived destructor not being called. But again, you're right, and one should not play polymorphism (i.e. delete it through its base pointer) to avoid undefined behaviour.

paercebal 2008-10-27 22:19:57

Public inheritance means "IS-A" only within the paradigm of OOP. C++ is a multi-paradigm language. In general C++ public ineritance has its uses well outside OOP paradigm, like in template metaprogramming, for example. A simple example would be the public inheritance of, say, `std::back_insert_iterator` from `std::iterator`. It does not imply OOP's IS-A in any way.

AndreyT 2009-10-26 22:46:51

Answer 13

+1 A:

If MY_STRUCT is your code, and you are happy using a C++ compiler, you can put the constructor there without wrapping in a class:

struct MY_STRUCT
{
  int n1;
  int n2;
  MY_STRUCT(): n1(0), n2(0) {}
};

I'm not sure about efficiency, but I hate doing tricks when you haven't proved efficiency is needed.

jwhitlock 2009-08-21 16:29:15

slashmais 2009-09-09 06:28:27

@jwhitlock: Adding a constructor to an originally POD struct is a notable qualitative change. Decision to make such a change should be made at the design level. Even if `MY_STRUCT` is within our control, making such a drastic change as robbing `MY_STRUCT` of its POD properties just for some extremely local reason is a design error.

AndreyT 2009-10-26 23:05:47

Answer 14

+25 A:

You can simply value-initialize the base, and all its members will be zero'ed out. This is guaranteed

struct MY_STRUCT
{
    int n1;
    int n2;
};

class CMyStruct : public MY_STRUCT
{
public:
    CMyStruct():MY_STRUCT() { }
};

For this to work, there should be no user declared constructor in the base class, like in your example.

No nasty memset for that. It's not guaranteed that memset works in your code, even though it should work in practice.

Johannes Schaub - litb 2009-08-22 03:15:07

This is the most relevant and correct answer. (I wonder if there's something that can be done on SO in order to shift the balance from the "earliest" answer to the "most correct" one. Currently earlier answers, unfortunately, beat the correct ones most of the time.)

AndreyT 2009-10-26 16:32:24

Thanks mate `:)`

Johannes Schaub - litb 2009-10-26 16:52:13

@litb : Sorry, I misunderstood your original comment to my post, and it took AndreyT's own comment to make it right. As far as I see it, your answer is the right one, and I modified my post to clarify this point. Of course, +1 to your answer. Thanks!... ^_^ ...

paercebal 2009-12-05 23:50:33

Answer 15

A:

I assume the structure is provided to you and cannot be modified. If you can change the structure, then the obvious solution is adding a constructor.

Don't over engineer your code with C++ wrappers when all you want is a simple macro to initialise your structure.

#include <stdio.h>

#define MY_STRUCT(x) MY_STRUCT x = {0}

struct MY_STRUCT
{
    int n1;
    int n2;
};

int main(int argc, char *argv[])
{
    MY_STRUCT(s);

    printf("n1(%d),n2(%d)\n", s.n1, s.n2);

    return 0;
}

grob 2009-10-26 05:44:48

ansaurus

tags:

views:

answers:

Is this C++ structure initialization trick safe?

PREAMBLE:

Original answer

related questions