EDIT: Changed example below to one that actually demonstrates the SIOF.

I am trying to understand all of the subtleties of this problem, because it seems to me to be a major hole in the language. I have read that it cannot be prevented by the linker, but why is this so? It seems trivial to prevent in simple cases, like this:

// A.h
extern int x;

// A.cpp
#include <cstdlib>

int x = rand();

// B.cpp
#include "A.h"
#include <iostream>

int y = x;

int main()
{
    std::cout << y; // prints the random value (or garbage)?
}

Here, the linker should be able to easily determine that the initialization code for A.cpp should happen before B.cpp in the linked executable, because B.cpp depends on a symbol defined in A.cpp (and the linker obviously already has to resolve this reference).

So why can't this be generalized to all compilation units? If the linker detects a circular dependency, can't it just fail the link with an error (or perhaps a warning, since it may be the programmer's intent, I suppose, to define a global symbol in one compilation unit and initialize it in another)?

Does the standard levy any requirements on an implementation to ensure the proper initialization order in simple cases? What is an example of a case where this would not be possible?

I understand that an analogous situation can occur at global destruction time. If the programmer does not carefully ensure that the dependencies during destruction are symmetrical to construction, a similar problem occurs. Could the linker not warn about this scenario as well?

+2  A: 

It's because static initialization is a completely different animal than runtime initialization. The initialization of x is—by its nature in your example—dynamic. But it is written as a static initialization. This comes mostly from compatibility with decades of C practice.

One way of resolving such a construct is to compile initialization code for each module that runs before main(), like #pragma startup does in some implementations.

But really, how often does the declaration module not know what the initialization values are?

wallyk
It's not initialized statically: it's an external symbol, therefore it has to be initialized at runtime.
wallyk
Modules allocating storage for a symbol can have initialization data: no runtime effort needed, a tried-and-true feature of all object module languages. If another module declares the symbol `extern` *and* initializes it, something has to happen at runtime, or at the very least rare and tricky object module mechanisms are needed. The OpenVMS object module format has such a mechanism. Implementations using PSECTs with a *common* flag (for example, to implement Fortran `common` sections) could do this for each variable, effectively ignoring `extern`. But this is not a universal OMF feature.
wallyk
A: 

Traditional linkers are not looking at source code or even ASTs, and existing object file formats provide fairly minimal information about exported and external symbols.

dmckee
Stingray, the linker's "minimal information" is that it knows that object file B.o references a symbol named `x`. It doesn't know that the value stored in `x` is used to initialize `y`, and thus it doesn't know that `x` needs to be assigned before `y`.
Rob Kennedy
Well, yes, in your simple example. But the simple examples are rarely the interesting ones. See my answer for more.
Rob Kennedy
In fact, solving only the easy cases is a bad idea. Because the result is that the linker will give up at a rather arbitrary point, where the developer won't understand the cause.
MSalters
+2  A: 

In theory, there's nothing preventing a linker from handling this -- basically do a topological sort among the dependencies to come up with an initialization order. Existing linkers don't do it though, and C++ mostly depends on existing linkers...
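As a rough sketch of what such a topological sort could look like (this is not how any existing linker works; the `DependencyGraph` type and the `initialization_order` function are made up for illustration, and the sketch assumes the linker somehow knows which object files each file's initializers depend on):

```cpp
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: each object file maps to the set of object files whose
// symbols its initializers need. Real object formats do not record this.
using DependencyGraph = std::map<std::string, std::set<std::string>>;

// Kahn's algorithm. On success, fills `order` so that every file appears after
// the files it depends on and returns true. Returns false if there is a cycle.
bool initialization_order(const DependencyGraph& deps,
                          std::vector<std::string>& order)
{
    std::map<std::string, int> unmet;                       // dependencies not yet initialized
    std::map<std::string, std::vector<std::string>> users;  // reverse edges

    for (const auto& entry : deps) {
        unmet.emplace(entry.first, 0);
        for (const auto& dep : entry.second) {
            unmet.emplace(dep, 0);  // a dependency may not have its own entry
            ++unmet[entry.first];
            users[dep].push_back(entry.first);
        }
    }

    std::queue<std::string> ready;
    for (const auto& entry : unmet)
        if (entry.second == 0) ready.push(entry.first);

    order.clear();
    while (!ready.empty()) {
        const std::string file = ready.front();
        ready.pop();
        order.push_back(file);
        for (const auto& user : users[file])
            if (--unmet[user] == 0) ready.push(user);
    }

    // Any file left over is part of (or depends on) a cycle.
    return order.size() == unmet.size();
}
```

For the question's example, a graph in which B.o depends on A.o gives the order A.o, B.o. Two files that depend on each other make the function return false, which is the point where a linker would have to warn or give up.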

Edit: From the viewpoint of the standard, the solution to this problem is utterly trivial: one sentence to require that all objects with static storage duration are initialized prior to main() beginning execution. Unfortunately, about all that would accomplish is raising another area in which virtually nobody conforms with the standard, or (worse) even has a plan to do so. For it to mean anything, the implementers on the committee have to agree that it's sufficiently important that they're going to implement it.

You're right that it's easy to look around and see that people have problems with this. At the same time, I don't know of a single vendor who seems to consider it a real problem. None of them seems to have worked on it yet. None of them has it scheduled for a future release. As far as I can see, it hasn't even made it onto anybody's "it would be nice if we could someday" list.

That brings us back to what I originally said: even though it may look like a serious problem to us as users, it apparently doesn't look that way to most implementers. I can see a number of reasons that might be so. First, of course, is that C++ isn't a key item in anybody's corporate agenda. Microsoft pushes .NET. Sun/Oracle and IBM push Java. Others have their own agendas, but none of them is trying to get you to use C++. It looks to me like most of them consider it a necessary evil, not something to which they really want to devote any effort at all. That being the case, working on completely re-designing the guts of their linker to handle this particular problem would probably only even be open to consideration if they got a lot of complaints about it. That has two problems. First of all, C++ starts out as a fairly small community, so it would take a huge percentage of them before implementers really noticed anything they said. Second, only a fairly small percentage of C++ programmers really run into problems with this anyway. About the only reason they'd bother or care would be if it became an issue for their own, internal development. Unfortunately, most have little reason to care about portability.

Jerry Coffin
And yet C++ expects repeated template definitions appearing in multiple translation units to somehow get resolved... not to mention the fact that C linkers didn't need to allow symbol names to be hundreds of characters long, which routinely happens with decoration and template expansion in C++. Sometimes C++ pushes the linker around, sometimes "linker says no".
Daniel Earwicker
*'"depending on existing linkers" is no excuse'* This is not the first one of these questions that you have asked. If this is such a wonderful idea, implement it: it'll make your reputation in the wider world. Heck, the GNU linker is Free so you don't even have to re-implement all the basic functionality.
dmckee
@STingRaySC: Sure they've been modified -- just not in this way. It would appear that relatively few people find it enough of a problem to bother writing the code or, apparently, even thinking about it much. It hasn't been done, and nobody seems to be promising it for their next version or anything like that either.
Jerry Coffin
"one sentence to require that all objects with static storage duration are initialized prior to main() beginning execution" - how does that solve the static initialization *order* fiasco? The problem isn't that things are initialized after `main()` starts, the problem is that they're initialized before their dependencies. Also, how would dynamic library loading work if statics had to be initialized before the name of the library was even introduced to the program?
Steve Jessop
@Steve: perhaps I over-simplified, but you get the idea -- it would be fairly easy to state the requirement. The problem is in getting people to actually implement it. I thought of mentioning DLLs/SOs, but the standard seems to ignore them anyway. Unless an equivalent of LoadLibrary/dlopen (and company) were standardized, everything using them goes outside the standard anyway...
Jerry Coffin
@Jerry: currently C++ doesn't specify dlopen (or equivalent), but I think it's deliberately designed to permit dlopen, hence statics only have to be initialized before code in their TU is executed. Applying too blunt an instrument to solving the fiasco risks forbidding something which implementations currently have good reasons to do (beyond "it's hard to implement"); dlopen was intended as a particularly important example :-)
Steve Jessop
A: 

While the linker could perhaps do that, most examples of where you would need it to do so are also examples of bad code lacking cohesion and having high coupling (usually through the horror of global variables). Your example being such an exemplar.

So it is hardly a "fiasco"; that is probably too strong a description. It is merely a minor restriction of the way you might code.

Clifford
@Clifford: "Static initialization order fiasco" is the common term for this. http://www.parashift.com/c++-faq-lite/ctors.html#faq-10.13
jamesdlin
+5  A: 

Linkers traditionally just link - i.e. they resolve addresses. You seem to be wanting them to do semantic analysis of the code. But they don't have access to semantic information - only a bunch of object code. Modern linkers at least can handle large symbol names and discard duplicate symbols to make templates more useable, but so long as linkers and compilers are independent, that's about it. Of course if both linker and compiler are developed by the same team, and if that team is a big corporation, more intelligence can be put in the linker, but it's hard to see how a standard for a portable language can mandate such a thing.

If you want to know more about linkers, BTW, take a look at http://www.iecc.com/linker/ - about the only book on an often ignored tool.

anon
A: 

In your simple example, a sufficiently smart linker could indeed work out that the initializations in A.o need to run before those in B.o because B.o refers to symbols that are defined in A.o.

But examples as simple as yours don't really demonstrate much of a problem, certainly not something of the "fiasco" level. Here's a slightly more complicated example.

// externs.h
extern int a;
extern int b;

// A.cpp
#include "externs.h"

int a = 5;
int aa = b;

// B.cpp
#include "externs.h"
int b = 10;
int bb = a;

The standard requires that variables in a single compilation unit be initialized in declaration order, so a must be initialized before aa, and b be initialized before bb, but there aren't any further ordering requirements. Initializations from a compilation unit are allowed to be interleaved with those from other compilation units.

There is at least one initialization order that would ensure all variables are initialized before they get used to initialize anything else, while still obeying the standard:

  1. a
  2. b
  3. bb
  4. aa

The linker has only limited information about this program. It knows that the compiled file A.o defines two symbols, a and aa, and that it refers to an external symbol b. Likewise, it knows that B.o defines b and bb and refers to external symbol a. The two object files are mutually dependent, so the linker cannot use the same technique it could have used from your example. In this example, it needs to know that only a has to be defined in order to initialize B.o. The information recorded in the object files, though, doesn't get that specific. It doesn't contain dependencies between symbols.
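To illustrate the difference, here is a sketch that assumes the object files did record which globals each initializer reads (they don't; the `Global` record and `symbol_level_order` function are hypothetical names invented for this example). With that per-symbol information, an order can be found even though the two files depend on each other:

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-symbol record that real object files do not contain.
struct Global {
    std::string name;
    std::string object_file;      // file that defines the variable
    std::set<std::string> reads;  // globals its initializer reads
};

// `globals` must list each file's variables in their definition order.
// Returns an order that respects both the per-file order and the reads,
// or an empty vector if no such order exists.
std::vector<std::string> symbol_level_order(const std::vector<Global>& globals)
{
    // For each variable, the set of variables that must be initialized first.
    std::map<std::string, std::set<std::string>> before;
    std::map<std::string, std::string> last_in_file;
    for (const Global& g : globals) {
        std::set<std::string>& preds = before[g.name];
        preds.insert(g.reads.begin(), g.reads.end());
        auto it = last_in_file.find(g.object_file);
        if (it != last_in_file.end()) preds.insert(it->second);  // per-file order
        last_in_file[g.object_file] = g.name;
    }

    std::vector<std::string> order;
    std::set<std::string> done;
    bool progressed = true;
    while (progressed) {  // repeat until a full pass initializes nothing new
        progressed = false;
        for (const Global& g : globals) {
            if (done.count(g.name)) continue;
            bool ready = true;
            for (const std::string& p : before[g.name])
                if (!done.count(p)) { ready = false; break; }
            if (ready) {
                order.push_back(g.name);
                done.insert(g.name);
                progressed = true;
            }
        }
    }
    if (order.size() != globals.size()) order.clear();  // genuine cycle
    return order;
}
```

Fed a, aa, b and bb as described above, this yields a, b, bb, aa, the order listed earlier. If a's initializer also read b and b's read a, the function would return an empty vector, because then no valid order exists at all.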

Rob Kennedy
A: 

Any language standard is a compromise among many things. In this case, we're talking about a compromise between ease of implementation and ease of use. If a language is too hard to implement, there will be few or no conforming implementations, and the standard will be useless. If it's too hard to use, nobody will use it, and the standard also will be useless.

Language standard committees will therefore try to limit the demands they place on the implementation, particularly on the more common systems. In modern systems, it's very common to have various different compilers but a shared linker, and therefore a committee will feel much freer to make demands on the compiler writers but go easier on the linkers.

C++ function overloading depended on finding a trick to make it work on linkers ("name mangling"). The C90 standard said that names with external linkage had to be unique within their first six characters, ignoring case. The rationale (for the 1989 ANSI version; IIRC it was dropped for the 1990 ISO standard) said that the committee was very unhappy about keeping that restriction, but felt that dropping it would make it too difficult to implement standard C on too many systems with primitive linkers.
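A small illustration of the mangling trick (the function names are made up for this example, and the mangled names in the comments assume the Itanium C++ ABI used by GCC and Clang on most Unix-like systems; other compilers encode them differently):

```cpp
// Two overloads share one source-level name. The compiler gives each a
// distinct linker-level symbol by encoding the parameter types in the name,
// so the linker itself never needs to understand overloading.
const char* describe(int)    { return "int"; }     // Itanium ABI: _Z8describei
const char* describe(double) { return "double"; }  // Itanium ABI: _Z8described

// extern "C" turns mangling off, so the symbol is emitted as plain
// `plain_c_symbol`. Such functions therefore cannot be overloaded.
extern "C" int plain_c_symbol(int value) { return value + 1; }
```

The linker only ever sees ordinary, if long, symbol names, which is why this worked with existing linkers.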

There is something of a chicken-and-egg situation here, in that language designers are reluctant to put demands on linkers, and therefore there's no great push for linkers to evolve, but that's the way things are currently working.

David Thornley