I want to know exactly why static variables in C, C++ and Java are initialized to zero by default. And why is this not true for local variables?
Paragraph 8.5.6 of the C++ standard states that:
"Every object of static storage duration shall be zero-initialized at program startup"
(The standard also says that a local variable declared without an initializer has an indeterminate value.)
As to why, the standard doesn't say ;) One guess is that it's reasonably easy to implement without any additional downsides.
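To make the difference concrete, here is a minimal C sketch. The variable and function names are made up for illustration. It only shows the guarantee for statics and the usual way to deal with locals, which is to give them a value yourself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical file-scope statics with no initializer. */
static int counter;        /* zero-initialized before main() runs */
static double ratio;       /* becomes 0.0 */
static const char *name;   /* becomes a null pointer */

/* Returns 1 if every static above still holds its default value. */
int statics_start_zeroed(void)
{
    return counter == 0 && ratio == 0.0 && name == NULL;
}

/* A local variable gets no such guarantee, so it is given a value explicitly. */
int local_needs_initializer(void)
{
    int total = 0;  /* without "= 0", reading total would be undefined behavior */
    for (int i = 1; i <= 4; ++i)
        total += i;
    return total;   /* 1 + 2 + 3 + 4 */
}
```

Called from main(), statics_start_zeroed() should return 1 and local_needs_initializer() should return 10.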
Speaking for Java:
Local variables must be initialized before you can access them, because it's a safety gain. The compiler checks for you whether the variable is definitely assigned.
Static or class variables (with an Object type) are initialized with null, because the compiler can't check at compile time whether they are initialized. Instead of letting the program fail when it accesses a non-initialized variable, the variable is implicitly initialized with null.
Variables with a primitive type can't hold a null value, so non-local variables are initialized with 0 or false as a fallback. It's not the best solution, sure, but I don't know a better one. ;-)
This is just a guess, but it might be the way it is for statics since it's easy to implement, and useful.
The compiler can co-allocate all the variables into one contiguous memory area, and then emit code (a single memset() call) to clear it before main() is called. In many cases it can instead rely on features of the operating system's executable file format, if that format supports "bss sections", which are cleared by the loader. This saves space in the executable; you could have
static unsigned char megabyte[1 << 20];
and the executable would not grow by a megabyte.
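A small sketch of how one might check this, reusing the megabyte array from above. The helper function name is hypothetical, and whether the array actually lands in a bss section depends on the toolchain.

```c
#include <assert.h>
#include <stddef.h>

/* One megabyte with no initializer. On typical toolchains this goes into a
   bss section, so the file records only its size, not a million zero bytes. */
static unsigned char megabyte[1 << 20];

/* Counts the bytes that are not zero; expected to be 0 at program startup. */
size_t count_nonzero_bytes(void)
{
    size_t nonzero = 0;
    for (size_t i = 0; i < sizeof megabyte; ++i)
        if (megabyte[i] != 0)
            ++nonzero;
    return nonzero;
}
```

On a typical ELF toolchain, running the size utility on the resulting binary should show the array counted in the bss column rather than in data, but that is an assumption about the platform, not something the language promises.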
For local variables, none of these apply; they are allocated "on the fly" (typically on a stack) and it would be a waste of resources to clear them, since they're typically going to be assigned to very soon anyway.
I have no idea about Java, and I doubt it's different for statics/locals in Java.
As for C and C++, it's about programmers caring about what their code does and liking to be in control. Initializing local variables would imply executing extra code each time the program enters the scope. For frequently called functions that could be a disaster.
So to some extent these are just design decisions on the part of the language designers. But the probable reasons for these decisions in Java are:
- for static/member variables, if you're going to initialise them to something, then zero is a convenient value because (a) it's generally a suitable value to mean "not set to anything else special", and is the value you would have picked anyway in some cases such as counters; and (b) internally, it's likely that zero can be used for "special" values, notably to represent null in the case of an object reference.
- for local variables, giving them no default allows for the rule that forces the programmer to set some value before the variable is read, which can actually be useful in allowing the compiler to spot certain errors.
In the case of local variables, it's also conceivable that a local variable could be declared (which at the bytecode/machine code level essentially means allocating stack space/moving the stack pointer) but then never actually written/read in a particular code path. So not having a default avoids doing unnecessary work of setting a default in those cases.
I repeat, though, these are design decisions to some extent. They're essentially a tradeoff between what's likely to be convenient for JVM implementations and convenient for programmers.
N.B. In C/C++, "static" variables mean a different thing to static variables in Java!
Why are static variables deterministically initialized while local variables aren't?
See how the static variables are implemented. The memory for them is allocated at link time, and the initial value for them is also provided at link time. There is no runtime overhead.
On the other hand, the memory for local variables is allocated at run time. The stack has to grow. You don't know what was there before. If you want, you can clear that memory (zero it), but that would incur a runtime overhead. The C++ philosophy is "you don't pay for things you don't use", so it doesn't zero that memory by default.
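If you do want a zeroed local, you ask for it explicitly and pay for it on every call. A minimal sketch in C, with made-up function names and buffer size, showing two common ways to opt in:

```c
#include <assert.h>
#include <string.h>

enum { BUFFER_SIZE = 256 };  /* arbitrary size chosen for the example */

/* Opting in with an initializer: the whole array is zeroed on every call. */
int sum_with_initializer(void)
{
    unsigned char buffer[BUFFER_SIZE] = {0};
    int sum = 0;
    for (int i = 0; i < BUFFER_SIZE; ++i)
        sum += buffer[i];
    return sum;
}

/* Opting in with memset: same effect, with the cost visible as a call. */
int sum_with_memset(void)
{
    unsigned char buffer[BUFFER_SIZE];
    memset(buffer, 0, sizeof buffer);
    int sum = 0;
    for (int i = 0; i < BUFFER_SIZE; ++i)
        sum += buffer[i];
    return sum;
}
```

Both functions return 0. The point is only that the clearing work happens each time the function runs, which is exactly the overhead the language declines to impose by default.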
OK, but why are static variables initialized to zero, and not some other value?
Well, you generally want to do something with that variable. But then how do you know if it has been initialized? You could create a static boolean variable. But then it also has to be reliably initialized to something (preferably false). How about a pointer? You'd rather want it initialized to NULL than some random garbage. How about a struct/record? It has some other data members inside. It makes sense to initialize all of them to their default values. But for simplicity, if you use the "initialize to 0" strategy, you don't have to inspect the individual members and check their types. You can just initialize the entire memory area to 0.
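Here is a rough C sketch of that reasoning. The struct and function names are hypothetical. A static record starts out with a false flag, a zero counter and a null pointer, so a "has this been set up yet?" check works without any explicit initializer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct connection {          /* hypothetical record type */
    bool        ready;
    int         retries;
    const char *host;
    double      timeout;
};

static struct connection conn;  /* every member is zero-initialized */

/* Returns 1 while conn still holds nothing but its default values. */
int connection_is_pristine(void)
{
    return !conn.ready && conn.retries == 0 &&
           conn.host == NULL && conn.timeout == 0.0;
}

/* Sets conn up on first use; returns 1 if setup ran, 0 if already done. */
int connection_ensure_ready(void)
{
    if (conn.ready)
        return 0;
    conn.host = "localhost";
    conn.retries = 3;
    conn.timeout = 2.5;
    conn.ready = true;
    return 1;
}
```

The first call to connection_ensure_ready() performs the setup and later calls skip it, which only works because conn.ready is reliably false at startup.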
This is not really a technical requirement. The semantics of initialization could still be considered sane if the default value were something other than 0, as long as it was deterministic. But then, what should that value be? You can quite easily explain why 0 is used (although it does sound slightly arbitrary), but justifying -1 or 1024 seems even harder (especially since the variable may not be large enough to hold that value, etc).
And you can always initialize the variable explicitly.
And you always have paragraph 8.5.6 of the C++ standard which says "Every object of static storage duration shall be zero-initialized at program startup".
This has to do with the concept of "only pay for what you use" in C/C++.
For static variables, an initialization can be made without generating code. The object file contains the initial values for the variables in the data segment and when the OS loads the executable it loads and maps this data segment before the program starts executing.
For local variables there's no way to initialize them without code, because they are not initialized once: they would have to be initialized every time you enter their scope. They are also allocated on the stack, and when the allocation occurs the initial value on the stack is, in the general case, simply whatever was there before (except in those rare moments when the stack grows further than it ever has before).
So to implicitly initialize a local variable the compiler would need to generate code without the programmer explicitly commanding it to do so, which is quite against that "philosophy".
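The "once" versus "every time you enter the scope" distinction can be seen in a short C sketch. The function name is made up for illustration.

```c
#include <assert.h>

/* Returns how many times it has been called so far. */
int count_calls(void)
{
    static int calls;  /* zero-initialized once, before main(); no code runs here */
    int step = 1;      /* set by generated code on every single call */
    calls += step;
    return calls;
}
```

Three successive calls return 1, 2 and 3: the static keeps its value between calls, while step has to be given its value again each time, and that assignment is code the programmer asked for.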
About Java, as far as I know, variables are always initialized when the program enters their scope, no matter if they are static or not. The only significant difference between them is that the scope of static variables is the entire program. Given that, the behavior is consistent among all of them.