Well, I've been given an assignment to basically figure out how memory allocation works for whatever language I'll be using. After some research, I have some questions and doubts that I'd like to get some insight into. For example:

I read here that Java specifies exactly how the stack contents are organized. Looking at the JVM spec structure, it basically says the stack contains frames, and that the frames contain whatever is inside the class by properly allocating the variables and functions. Maybe I am missing something here, but I don't understand how this is any different than what C++ does. I ask because the first link says Java's specification of stack contents avoids compiler incompatibilities.

Also, I have yet to find out exactly how the memory segments are organized relative to each other. For example, I know that for C++ the memory is divided into global variables, the call stack, the heap, and code, yet I don't know whether the heap's addresses are higher than the stack's, or whether that depends on the implementation. I also wonder whether a Java program has more segments than that, and how they would be laid out. I imagine there is a standard, since the JVM has to know where everything is in order to use it, though I suppose it could just keep the pointers and leave the rest to the OS. At the very least, I imagine there must be a de facto standard.

Another thing I don't understand is the runtime constant pool. It's supposed to be "a per-class or per-interface runtime representation of the constant_pool table in a class file", but I don't think I understand what it does. It seems to have a tag to indicate what type the structure in question is? Then the name of the structure (given by the programmer or assigned by the underlying system?). Then it seems the rest of it varies with whatever the tag describes (a thread, an array, etc.).

If my interpretation of the runtime constant pool is right, then why are they necessary as well as stack frames? Is it because stack frames only take care of the stack segments, and the runtime constant pool must also have pointers for the heap allocated memory?

+1  A: 

What exactly was the task that you were given?

The main difference between Java and C++ is that Java is garbage collected by the VM, whereas in C++ the program executes directly on the machine and memory is managed by the program itself, through the C++ runtime library and OS services.

Regarding the stack, a frame is just a more "official" and standardized form of what C++ compilers do. C++ compilers simply put things on top of each other on the stack as you move from call to call. In Java the unit is called a frame, and because compiled Java code is supposed to run on any platform, there are very clear standards for how that takes place. In C++, each compiler can treat the stack differently (for example, depending on the platform's word size).
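Java does not let you look inside a frame directly, but you can observe that each active call has its own frame. A minimal sketch (class and method names are hypothetical) uses Thread.getStackTrace(); note that its Javadoc allows a VM to omit frames in some circumstances, so treat the output as typical rather than guaranteed:

```java
import java.util.ArrayList;
import java.util.List;

public class FrameDemo {
    // Collects the method names of this class's frames currently on the
    // thread's stack, innermost (most recent) call first.
    static List<String> framesOfThisClass() {
        List<String> names = new ArrayList<>();
        for (StackTraceElement e : Thread.currentThread().getStackTrace()) {
            // Skip frames from other classes, e.g. Thread.getStackTrace itself.
            if (e.getClassName().equals(FrameDemo.class.getName())) {
                names.add(e.getMethodName());
            }
        }
        return names;
    }

    // Three nested calls: each one pushes a new frame while it is running.
    static List<String> outer()  { return middle(); }
    static List<String> middle() { return inner(); }
    static List<String> inner()  { return framesOfThisClass(); }

    public static void main(String[] args) {
        // Typically prints [framesOfThisClass, inner, middle, outer, main]
        System.out.println(outer());
    }
}
```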

In Java, everything runs within the VM, which manages everything, though it delegates some work to the environment. You have no access to where the JVM puts your data and your code, and your code may never even become a real "code segment". So, for Java, this question can't really be answered. In C++, everything runs directly on the hardware, so you will have stack segments, data segments, etc.; look for information about your particular C++ compiler and platform.

In C++, classes have no representation in memory at runtime; in fact, you can compile C++ into C and then compile the result into assembly. In Java, everything is also represented at runtime, so you can ask an object what class it belongs to and what methods it supports. Hence, each class file has a "constant pool" where the strings representing those things (method names, field names, etc.) appear. The actual class definition refers to the pool. In other words, the constant pool has very little to do with stack frames. Stack frames are where method parameters, local variables, and return values are stored.
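Asking an object at runtime for its class and methods is done through reflection. A small sketch, assuming a hypothetical Greeter class; the names that reflection reports come from the class's metadata, which is built from the class file (including its constant pool):

```java
import java.lang.reflect.Method;
import java.util.Arrays;

public class ReflectDemo {
    // Hypothetical class to inspect at runtime.
    static class Greeter {
        private final String name;
        Greeter(String name) { this.name = name; }
        public String greet() { return "Hello, " + name; }
        public int nameLength() { return name.length(); }
    }

    // Returns the sorted names of the methods declared directly in the object's class.
    static String[] declaredMethodNames(Object o) {
        Method[] methods = o.getClass().getDeclaredMethods();
        String[] names = new String[methods.length];
        for (int i = 0; i < methods.length; i++) {
            names[i] = methods[i].getName();
        }
        Arrays.sort(names); // getDeclaredMethods() returns no particular order
        return names;
    }

    public static void main(String[] args) {
        Object g = new Greeter("World");
        System.out.println(g.getClass().getSimpleName());            // Greeter
        System.out.println(Arrays.toString(declaredMethodNames(g))); // [greet, nameLength]
    }
}
```

A C++ program has no equivalent of getDeclaredMethods(); RTTI (discussed in the comments) gives you a type's identity, not its member list.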

Uri
The last part about C++ classes having no representation in memory is not entirely true. C++ does support RTTI which is necessary for some things (e.g. dynamic_cast, typeid, and exception handling).
Kevin Loney
I agree. However, this was within my answer regarding the constant pool. AFAIK, RTTI in C++ does not preserve the naming of the source materials.
Uri
Not entirely true: consider typeid(x).name(). Even if the name is mangled (which is reversible), it's still the source material.
Kevin Loney
+4  A: 

Looking at the JVM spec structure, it basically says the stack contains frames, and that the frames contain whatever is inside the class by properly allocating the variables and functions. Maybe I am missing something here, but I don't understand how this is any different than what C++ does. I ask because the first link says Java's specification of stack contents avoids compiler incompatibilities.

In practice, C++ compilers follow the same basic strategy. However, it's not considered a language issue by the Standards committee. Instead, C++ compilers follow this system because that's how most CPUs and operating systems are designed. The different platforms disagree on whether data is passed to functions on a stack or via registers (RISC machines), whether the stack grows up or down, whether there are different calling conventions allowing "normal" calls to use the stack and others to use something else (e.g., __fastcall and naked), whether there is such a thing as nested functions, tail call support, etc.

In fact, it is possible for a conforming C++ compiler to compile to something like a Scheme VM where "the stack" is much different because Scheme requires implementations to support both tail calls and continuations. I've never seen anything like that, but it would be legal.

The "compiler incompatibilities" are most obvious if you try to write a garbage collector:

all local variables, both for the current function and all its callers, are in ["the" stack, but consider ucontext.h and Windows Fibers]. For each platform (meaning, OS + CPU + compiler) there's a way to find out where ["the" stack] is. Tamarin does that, then it scans all that memory during GC to see where the locals point to. ...

This magic lives in a macro, MMGC_GET_STACK_EXTENTS, defined in the header MMgc/GC.h. ... [T]here’s a separate implementation for each platform.

At any given moment, some locals might be in CPU registers and not on the stack. To cope with this, the macro uses a few lines of assembly code to dump the contents of all the registers onto the stack. That way MMgc can just scan the stack and it’ll see all local variables.
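Java cannot read its own native stack, so the following is only a simulation of the conservative scan the quote describes: the "stack" is an array of machine words (with register contents assumed to have already been dumped into it) and the heap bounds are invented. Any word that falls inside the heap range is treated as a possible pointer, which keeps the object it might point to alive:

```java
import java.util.ArrayList;
import java.util.List;

public class ConservativeScanSketch {
    // Hypothetical heap region [HEAP_START, HEAP_END) in a simulated address space.
    static final long HEAP_START = 0x1000_0000L;
    static final long HEAP_END   = 0x2000_0000L;

    // Treats every word on the simulated stack as a potential pointer.
    // Words inside the heap range are reported as possible GC roots; the scanner
    // cannot tell a real pointer from an int that happens to look like one.
    static List<Long> findPossibleRoots(long[] stackWords) {
        List<Long> roots = new ArrayList<>();
        for (long w : stackWords) {
            if (w >= HEAP_START && w < HEAP_END) {
                roots.add(w);
            }
        }
        return roots;
    }

    public static void main(String[] args) {
        // Simulated words: a return address, an int local, two heap addresses,
        // and an address outside the heap (e.g. a spilled register).
        long[] stack = {0x0040_1234L, 42L, 0x1000_0040L, 0x1FFF_FFF0L, 0x7FFF_0000L};
        System.out.println(findPossibleRoots(stack).size()); // 2
    }
}
```

The per-platform macro in the quote exists because finding the real stack's extents, and getting registers into it, is exactly the part this sketch assumes away.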


Additionally, objects in Java aren't normally allocated on the stack; references to them are. Local variables of primitive types (int, double, boolean, etc.) do get stored on the stack, in the current method's frame, while primitive fields of an object live on the heap along with the object. In C++ anything can be allocated on the stack, which has its own list of pros and cons.
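A short sketch of the difference, using a hypothetical Counter class. Conceptually, a primitive argument is copied into the callee's frame, while an object argument copies only the reference, so both frames point at the same heap object (the JIT may optimize the actual placement, as a comment below notes):

```java
public class LocalsDemo {
    // Hypothetical mutable holder; instances are allocated on the heap.
    static class Counter {
        int value;
    }

    // 'n' is a copy of the caller's int, stored in this call's own frame.
    static void bumpPrimitive(int n) {
        n++; // changes only the local copy
    }

    // 'c' is a copy of the caller's reference; both refer to the same object.
    static void bumpObject(Counter c) {
        c.value++; // changes the shared heap object
    }

    public static void main(String[] args) {
        int n = 0;                 // primitive local in main's frame
        Counter c = new Counter(); // reference in main's frame, object on the heap
        bumpPrimitive(n);
        bumpObject(c);
        System.out.println(n + " " + c.value); // prints "0 1"
    }
}
```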

Another thing I don't understand is the runtime constant pool. It's supposed to be "a per-class or per-interface runtime representation of the constant_pool table in a class file", but I don't think I understand what it does.

Consider:

String s = "Hello World";
int i = "Hello World".length();
int j = 5;

s, i, and j are all variables and can each be changed at some later point in the program. However, "Hello World" is an object of type String that cannot be changed, 5 is an int that cannot be changed, and "Hello World".length() will always return 11. These constants are valid values, and methods can be called on them (well, at least on the String), so they need to be stored somewhere. But they cannot be changed, ever. Constants that a class uses, such as the "Hello World" literal, are recorded in that class's constant pool in the .class file, alongside symbolic references to the classes, fields, and methods the class uses. (Small int constants such as 5 are usually encoded directly in the bytecode instead, e.g. as iconst_5.) When the JVM loads the class, it builds that class's runtime constant pool from this table; "runtime" here means the in-memory version inside a running JVM instance, as opposed to the table stored in the class file.
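One visible consequence is that identical string literals resolve to the same String object, because the JVM interns the strings it creates from constant pool entries (the Java Language Specification guarantees this for literals). A minimal sketch (class and method names are hypothetical):

```java
public class ConstantPoolDemo {
    // javac compiles this literal to an ldc instruction that refers to a
    // CONSTANT_String entry in this class's constant pool.
    static String literal() {
        return "Hello World";
    }

    public static void main(String[] args) {
        String a = literal();
        String b = "Hello World";             // same literal: the same interned String object
        String c = new String("Hello World"); // explicitly allocates a new String on the heap
        System.out.println(a == b);           // true: string literals are interned
        System.out.println(a == c);           // false: equal contents, different object
        System.out.println(a == c.intern());  // true: intern() returns the canonical instance
        System.out.println(a.length());       // 11
    }
}
```

Running `javap -v ConstantPoolDemo` on the compiled class prints its constant pool, where the literal appears as a String entry that points at a Utf8 entry holding the characters.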

The C++ standard permits a similar technique, but the implementation is left up to the compiler and the binary format (ELF, a.out, COFF, PE, etc.). Because modifying a const object is undefined behavior, implementations are free to keep constants of integral types (bool, int, long, etc.) and C-style string literals in a read-only part of the binary, and they usually do. Other constant data (doubles, floats, class objects) can go there as well, but a const object whose constructor must run at program startup has to live in writable memory, where the compiler simply treats it as unmodifiable.

Generally speaking, the "constant data section" of a binary can be shared when more than one copy of a program is running at a time (because the constant data is identical in each copy of the program). On ELF this section is called the .rodata section.

Max Lybbert
Nice writeup. Small correction: while objects are normally not allocated on the stack, since Java SE 6 the HotSpot JIT compiler (not javac) can do escape analysis, so it may avoid heap-allocating some objects. See http://www.ibm.com/developerworks/java/library/j-jtp09275.html. But that should all be transparent to developers anyway :-).
sleske
Edit applied. Thanks.
Max Lybbert