views:

159

answers:

5

I had a hard time debugging a crash on production. Just wanted to confirm with folks here about the semantics. We have a class like ...

class Test {
public:
  Test()
  {
    // members initialized ...
    m_str = m_str;
  }
  ~Test() {}
private:
  // other members ...
  std::string m_str;
};

Someone changed the initialization to use ctor initialization-lists which is reasonably correct within our code semantics. The order of initialization and their initial value is correct among other things. So the class looks like ...

class Test {
public:
  Test() 
    : /*other inits ,,, */ m_str(m_str)
  {
  }
  ~Test() {}
private:
  // other members ...
  std::string m_str;
};

But the code suddenly started crashing! I isolated the long list of inits to this piece of code m_str(m_str). I confirmed this via link text.

Does it have to crash? What does the standard say about this? (Is it undefined behavior?)

+2  A: 

m_str is constructed in the initialization list. Therefore, at the time you are assigning it to itself, it is not fully constructed. Hence, undefined behavior.

(What is that self-assignment supposed to do anyway?)

Nate
I donno, someone else's code. I need confirmation if it's indeed undefined behavior. Becoz it definitely does not crash for POD types.
Abhay
why do you need confirmation? It is bad code. Just fix it.
Tim
@Tim: Agreed. But its live code, being hit a million times per day. I need approval from ppl higher up who say 'its been working for ages ...' So i need a convincing reference.
Abhay
I don’t have page and chapter, but you can assume there’s a pointer somewhere inside the uninitialized `string` that is pointing to a random location in memory. And you’re copying that! Bad bad bad.
Nate
+1  A: 

Undefined behavior doesn't have to lead to a crash -- it can do just about anything, from continuing to work as if there was no problem at all, to crashing immediately, to doing something really strange that causes seemingly unrelated problems later. The canonical claim is that it makes "demons fly out of your nose" (aka, "causes nasal demons"). At one time the inventor of the phase had a (pretty cool) web site telling about the nuclear war that started from somebody causing undefined behavior in the "DeathStation 9000".

Edit: The exact wording from the standard is (§:1.3.12):

1.3.12 undefined behavior [defns.undefined]

behavior, such as might arise upon use of an erroneous program construct or erroneous data, for which this International Standard imposes no requirements. Undefined behavior may also be expected when this International Standard omits the description of any explicit definition of behavior. [Note: permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

Jerry Coffin
A: 

This is the same difference as between

std::string str;
str = str;

and

std::string str(str);

The former works (although it's nonsense), the latter doesn't, since it tries to copy-construct an object from a not-yet-constructed object.

Of course, the way to go would be

Test() : m_str() {}
sbi
+12  A: 

The first constructor is equivalent to

  Test()
  : m_str()
  {
    // members initialized ...
    m_str = m_str;
  }

that is, by the time you get to the assignment within the constructor, m_str has already been implicitly initialized to an empty string. So the assignment to self, although completely meaningless and superfluous, causes no problems (since std::string::operator=(), as any well written assignment operator should, checks for self assignment and does nothing in this case).

However, in the second constructor, you are trying to initialize m_str with itself in the initializer list - at which point it is not yet initialized. So the result is undefined behaviour.

Update: For primitive types, this is still undefined behaviour (resulting in a field with garbage value), but it does not crash (usually - see the comments below for exceptions) because primitive types by definition have no constructors, destructors and contain no pointers to other objects.

Same is true for any type that does not contain pointer members with ownership semantics. std::string is hereby demonstrated to be not one of these :-)

Péter Török
Depends what you mean by "primitive types". Even assignment of indeterminate value to `int` can in principle cause a crash, because the standard allows padding and trap representation for `int`. You have to get down to the `char` types to have portable code with well-defined initialization-from-self.
Alf P. Steinbach
@Alf even for `char` types, such is undefined behavior. Only if you dereference a `char` types pointer it's defined for any and all possible patterns (because it then reads from memory and because `char` types have no traps). See the last note at http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#240 , *"The CWG did not consider that access to an unsigned char might still trap if it is allocated in a register and needs to reevaluate the proposed resolution in that light."*.
Johannes Schaub - litb
@Johannes: thanks, I didn't think of that! (But neither did the committee...) However, I'm sure that it's about the last gasp of "ENIAC-compatibility", and that there's no in-practice issue -- at least, I sincerely hope so! :-)
Alf P. Steinbach
@Alf there is a good article about the ia64's NaT the committee talks about: http://blogs.msdn.com/b/oldnewthing/archive/2004/01/19/60162.aspx
Johannes Schaub - litb
@Alf, @Johannes, thanks for the clarification, I learned something new again :-)
Péter Török
+3  A: 

The original "initialization" by assignment is completely superfluous.

It didn't do any harm, other than wasting processor cycles, because at the time of the assignment the m_str member had already been initialized, by default.

In the second code snippet the default initialization is overridden to use the as-yet-uninitialized member to initialize itself. That's Undefined Behavior. And it's completely unnecessary: just remove that (and don't re-introduce the original time-waster, just, remove).

By turning up the warning level of your compiler you may be able to get warnings about this and similar trivially ungood code.

Unfortunately the problem you're having is not this technical one, it's much more fundamental. It's like a worker in a car factory poses a question about the square wheels they're putting on the new car brand. Then the problem isn't that the square wheels don't work, it's that a whole lot of engineers and managers have been involved in the decision to use the fancy looking square wheels and none of them objected -- some of them undoubtedly didn't understand that square wheels don't work, but most of them, I suspect, were simply afraid to say what that they were 100% sure of. So it's most probably a management problem. I'm sorry, but I don't know a fix for that...

Alf P. Steinbach
@Alf: True to some extent in my case as well. I thought I can convince them using the C++ Standard as reference. So awaiting the quote.. I do not have a copy of that / Internet for n3092.pdf either
Abhay
You can download the latest draft of the standard from the C++ committee pages, at [http://www.open-std.org/jtc1/sc22/wg21/].
Alf P. Steinbach