Possible Duplicate:
Why global and static variables are initialized to their default values?

What is the technical reason this happens? And is it supported by the standard across all platforms? Is it possible that certain implementations may return undefined variables if static variables aren't explicitly initialized?


Mostly because the static variables are grouped together in one block by the linker, so it's really easy to just memset() the whole block to 0 on startup. I do not believe that is required by the C or C++ Standards.

James Curran
It's not `memset` to 0 except on DOS and other [non-]operating systems. On a modern system, it's a copy-on-write reference to the "zero page".

There is discussion about this here:

First of all in ISO C (ANSI C), all static and global variables must be initialized before the program starts. If the programmer didn't do this explicitly, then the compiler must set them to zero. If the compiler doesn't do this, it doesn't follow ISO C. Exactly how the variables are initialized is however unspecified by the standard.

John at CashCommons
+11  A: 

It is required by the standard (§6.7.8/10).
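As a rough illustration of what that clause guarantees, here is a minimal sketch. All object and function names (`counter`, `ratio`, `name`, `node`, `statics_start_zeroed`, `check_static_defaults`) are made up for the example and are not from the standard: arithmetic objects start at zero, pointers start as null pointers, and aggregates get the same treatment member by member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical objects with static storage duration and no initializer. */
static int counter;
static double ratio;
static char *name;
static struct { int id; void *next; float weights[3]; } node;

/* Returns 1 if every object above starts out with a zero value. */
int statics_start_zeroed(void)
{
    static long total;   /* block-scope static: the same rule applies */

    return counter == 0
        && ratio == 0.0
        && name == NULL
        && node.id == 0 && node.next == NULL && node.weights[2] == 0.0f
        && total == 0;
}

/* Aborts if the guarantee does not hold. */
void check_static_defaults(void)
{
    assert(statics_start_zeroed());
}
```

Calling `check_static_defaults()` from `main` should pass silently on a conforming implementation.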

There's no technical reason it would have to be this way, but it's been that way for long enough that the standard committee made it a requirement.

Leaving out this requirement would make working with static variables somewhat more difficult in many (most?) cases. In particular, you often have some one-time initialization to do, and need a dependable starting state so you know whether a particular variable has been initialized yet or not. For example:

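#include <stdlib.h>

int foo(void) {
    static int *ptr;   /* starts out as NULL: static storage, no initializer */

    if (NULL == ptr)
        ptr = calloc(1, sizeof *ptr);   /* one-time initialization */
    if (NULL == ptr)
        return -1;                      /* allocation failed */
    return ++*ptr;                      /* e.g. count the calls */
}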

If ptr could contain an arbitrary value at startup, you'd have to explicitly initialize it to NULL to be able to recognize whether you'd done your one-time initialization yet or not.

Jerry Coffin
+1 for the standards link. The original mandate of the ANSI C committee was to codify existing practice, which is most likely why it was made a requirement.
@paxdiablo: Undoubtedly -- it's explicitly stated in K&R: "[...] automatic and register variables which are not initialized are guaranteed to start off as garbage."
Jerry Coffin

Take a look here: 6.2.4(3) and 6.7.8(10)

+2  A: 

Yes, it's because it's in the standard; but really, it's because it's free. Static variables look just like global variables to the generated object code. They're allocated in .bss and initialized at load time along with all your constants and other globals. Since the section of memory where they live is just copied straight from your executable, they're initialized to a value known at compile-time for free. The value that was chosen is 0.

The .bss section in ELF files takes up no space in the file. It is the loader that is responsible for zero-initializing the block. If this were not required by the standard, it could just leave the block uninitialized.
An uninitialized page at process startup would be all-zero anyway, unless the kernel goes out of its way to write random junk. On any modern implementation, bss is a COW (copy-on-write) reference to the "zero page" that's shared among all processes. Even if it weren't, the kernel would still have to do something to prevent new processes from seeing the contents of random physical memory (which might contain private internal kernel data or data from other users) so they might as well set it to a useful value like 0 when clearing it...

Suppose you were writing a C compiler. You expect that some static variables are going to have initial values, so those values must appear somewhere in the executable file that your compiler is going to create. Now when the output program is run, the entire executable file is loaded into memory. Part of the initialization of the program is to create the static variables, so all those initial values must be copied to their final static variable destinations.

Or do they? Once the program starts, the initial values of the variables are not needed anymore. Can't the variables themselves be located within the executable code itself? Then there is no need to copy the values over. The static variables could live within a block that was in the original executable file, and no initialization at all has to be done for them.

If that is the case, then why would you want to make a special case for uninitialized static variables? Why not just put a bunch of zeros in the executable file to represent the uninitialized static variables? That would trade some space for a little time and a lot less complexity.

I don't know if any C compiler actually behaves in this way, but I suspect the option of doing things this way might have driven the design of the language.

Jeffrey L Whitledge
Some old C compilers might have put initialized variables within the code image and accessed them from there. I know Turbo Pascal did that. I see no reason to put uninitialized variables there. Filling a block of memory with zero is not hard.
@supercat - Filling a block of memory with zero is not hard at all. But segregating static variables into Initialized and Uninitialized categories and allocating separate memory blocks for them (while also not hard) is unnecessary.
Jeffrey L Whitledge
It's "necessary" for saving space, both on disk and in memory. If you call that "unnecessary" then you should stop coding...
@R.. - Oh, yeah? Maybe, *you* should stop coding. :-P
Jeffrey L Whitledge
@R.. - All kidding aside, how exactly does waiting until execution begins to allocate static memory locations save space "in memory"?
Jeffrey L Whitledge
Suppose the program (or more likely a library it uses) has a large (>4kb) static buffer that's zero-initialized but rarely (in the program itself) or never used (in the case of an unused part of the library). If you memset it to 0, it still uses memory. If it's part of bss, it's just a tiny page table entry in the kernel for one or more COW copies of the zero page.
Moreover, even if it is used, it might very well be sparse. Imagine a ~8mb table that maps Unicode characters to 64bit pointers but that's only filled in for characters which have been seen. For most input data, the majority of the table will simply be zero-page "copies" and thus not use any memory (aside from tiny page-table entries).
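A minimal sketch of the kind of sparse table described here, assuming a platform that backs bss with lazily allocated zero pages. The names `glyph_name`, `remember`, `lookup` and the size `TABLE_SLOTS` are hypothetical. Slots that are never written still read as null pointers, and only the pages that actually get written need private memory.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SLOTS 0x110000u   /* one slot per Unicode code point, roughly 8.5 MiB of 8-byte pointers */

/* No initializer, so the table is typically placed in bss. */
static const char *glyph_name[TABLE_SLOTS];

/* Record a name for a code point; only the page holding this slot is written. */
void remember(unsigned long code_point, const char *name)
{
    assert(code_point < TABLE_SLOTS);
    glyph_name[code_point] = name;
}

/* Slots never written still read as null pointers. */
const char *lookup(unsigned long code_point)
{
    return code_point < TABLE_SLOTS ? glyph_name[code_point] : NULL;
}
```

Whether the untouched pages really cost nothing depends on the kernel and loader; the C code only relies on the zero-initialization guarantee.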
@R.. - None of the examples you've given address the point. If the language definition says that the uninitialized static variables must be assigned to zero, then that means it's been written to at some point. Therefore, sparseness isn't an issue, unused memory pages are not an issue. Also, the C language spec was written in a time when computer memory was sometimes measured in kilobytes, and many of those systems (especially PCs, for example) were years away from doing any sort of memory paging. Remember that C is more than a decade older than Unicode.
Jeffrey L Whitledge
@R.. - I wasn't saying that this is the way all C compilers work. I wasn't saying this is how they should work. I'm not even saying that any C compiler actually did work this way. But it does seem reasonable to me that some C compilers might have worked this way back when the spec was written, when memory was at a premium, processors were slow, and the act of combining program loading with static variable initialization might have been a major optimization. To claim that doing it that way was necessarily always a waste of memory back when the C spec was designed is absurd.
Jeffrey L Whitledge
"Assigned" is in the formal specification sense, not the sense of a machine-code store instruction. I agree that your original answer reflects some aspects of history, but as written I took it to be a potential contemporary compiler design thought process, and in a contemporary context it would be very misguided.

Of course there is no arguing that it is in the C standards. So expect a compliant compiler to behave that way.

The technical reason behind why it was done might be rooted in how the C startup code works. There are usually several memory segments the linker has to put compiler output into including a code (text) segment, a block storage segment, and an initialized variable segment.

Non-static function variables don't have physical storage until the scope of the function is created at runtime so the linker doesn't do anything with those.

Program code of course goes in the code (or text) segment, but so do the values used to initialize global and static variables. Initialized variables themselves (i.e. their addresses) go in the initialized memory segment. Uninitialized global and static variables go in that block storage segment, bss (the name historically stands for "block started by symbol").
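To make the split concrete, here is a small sketch with comments noting where each object typically ends up. The placement is up to the toolchain, so treat the comments as the common case rather than a rule; the names `banner`, `limit`, `hits` and `next_hit` are invented for the example.

```c
#include <assert.h>

const char banner[] = "ready";   /* read-only data, usually stored beside the code */
int limit = 42;                  /* initialized data: the value 42 is stored in the image */
int hits;                        /* no initializer: usually bss, starts at 0 */

/* Counts calls; `step` is automatic and only exists while the call is active. */
int next_hit(void)
{
    int step = 1;

    assert(hits < limit);        /* hypothetical cap on the counter */
    hits += step;
    return hits;
}
```

On a typical ELF toolchain, a tool such as `nm` or `objdump` can be used to see which section each symbol landed in.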

When the program is loaded at execution time, a small piece of code creates the C runtime environment. In ROM-based systems it will copy the values of initialized variables from the code (text) segment into their respective actual addresses in RAM. Systems that load the program from disk into RAM can load the initial values directly to their final RAM addresses.

The CRT (C runtime) also zeroes out the bss which contains all the global and static variables that have no initializers. This was probably done as a precaution against uninitialized data. It is a relatively straightforward block fill operation because all the global and static variables have been crammed together into one address segment.
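The startup work described above can be sketched roughly like this. Real startup code gets the region addresses from linker-script symbols and is often written in assembly; the `struct image_layout` and `init_static_storage` names here are hypothetical stand-ins so the idea can be shown in portable C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of the regions a linker would normally export as symbols. */
struct image_layout {
    const unsigned char *data_rom;  /* initial values stored beside the code */
    unsigned char *data_ram;        /* where initialized variables live at run time */
    size_t data_size;
    unsigned char *bss_ram;         /* variables without initializers */
    size_t bss_size;
};

/* Roughly what startup code does before calling main(). */
void init_static_storage(const struct image_layout *img)
{
    assert(img != NULL);
    memcpy(img->data_ram, img->data_rom, img->data_size);  /* copy initial values */
    memset(img->bss_ram, 0, img->bss_size);                /* block-fill bss with zeros */
}
```

Because all the variables without initializers sit in one contiguous region, a single `memset` covers them.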

Of course floats and doubles may require special handling because their 0.0 value may not be all zero bits if the floating format is not IEEE 754.
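A quick way to see the difference between "value zero" and "all zero bits" on a given platform is something like the sketch below. The function and variable names are made up for the example. The first check should hold on any conforming implementation; the second is platform-dependent and is expected to report 1 wherever `double` is an IEEE 754 format.

```c
#include <assert.h>
#include <string.h>

static double zero_by_default;   /* the standard guarantees the value 0.0 */

/* Aborts unless the uninitialized static compares equal to 0.0. */
void check_default_double(void)
{
    assert(zero_by_default == 0.0);
}

/* Returns 1 if 0.0 is stored as all zero bytes on this platform, else 0. */
int zero_double_is_all_zero_bits(void)
{
    static const unsigned char zeros[sizeof(double)] = {0};
    double d = 0.0;

    return memcmp(&d, zeros, sizeof d) == 0;
}
```

If the second function ever returned 0, a plain block fill of bss would not be enough for floating-point statics on that platform.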

Note that since automatic variables don't exist at program load time, they can't be initialized by the runtime startup code.

The IEEE 754 floating point standard is specifically designed so that +0 is all zero bits. Have you seen an implementation of C that uses another representation of floating point values for which this isn't true?
Jeffrey L Whitledge
@Jeffrey: I've used compilers for DSP specific float formats like TI. But now I'm not 100% certain if they also had that property.
@Jeffrey: No, the TI format does not encode 0.0 as all zero bits. It requires an exponent of -128 to uniquely identify 0.0.
@Amardeep - Oh, interesting.
Jeffrey L Whitledge