Hi,

It seems that the JVM uses some fixed amount of memory. At least I have often seen parameters -Xmx (for the maximum size) and -Xms (for the initial size) which suggest that.

I got the feeling that Java applications don't handle memory very well. Some things I have noticed:

  • Even some very small sample demo applications load huge amounts of memory. Maybe this is because of the Java library which is loaded. But why is it needed to load the library for each Java instance? (It seems that way because multiple small applications linearly take more memory. See here for some details where I describe this problem.) Or why is it done that way?

  • Big Java applications like Eclipse often crash with an OutOfMemoryError. This was always strange because there was still plenty of memory available on my system. Often, they consume more and more memory over their runtime. I'm not sure if they have memory leaks or if this is because of fragmentation in the memory pool -- I get the feeling that the latter is the case.

  • The Java library seems to require much more memory than similarly powerful libraries such as Qt. Why is this? (To compare, start some Qt applications and some Java apps and look at their memory usage.)

Why doesn't it just use the underlying system techniques like malloc and free? Or if they don't like the libc implementation, they could use jemalloc (as in FreeBSD and Firefox), which seems to be quite good. I am quite sure that this would perform better than the JVM memory pool. And it would not only perform better but also require less memory, esp. for small applications.


Addition: Has somebody tried that already? I would be very interested in an LLVM-based JIT compiler for Java which just uses malloc/free for memory handling.

Or maybe this also differs from JVM implementation to implementation? I have used mostly the Sun JVM.

(Also note: I'm not directly speaking about the GC here. The GC is only responsible for working out which objects can be deleted and for initiating the freeing of memory; the actual freeing is done by a different subsystem. Afaik, that is a custom memory pool implementation, not just a call to free.)


Edit: A closely related question: Why does the (Sun) JVM have a fixed upper limit for memory usage? Or to put it differently: Why does the JVM handle memory allocations differently from native applications?

+2  A: 

Java does use malloc and free, or at least the implementations of the JVM may. But since Java tracks allocations and garbage collects unreachable objects, they are definitely not enough.

As for the rest of your text, I'm not sure if there's a question there.

Pontus Gagge
Of course at the lowest level they must somehow allocate memory from the operating system, and `malloc`/`free` is the first choice here. But that is not what I meant. They have wrapped everything in an additional layer and they use some memory pool afaik. Or what exactly happens when I do a `new Object()` in Java? I don't think it is a simple `malloc`. But please correct me, because that is exactly what I was thinking and asking.
Albert
@Albert: It can't be as simple as a malloc because the VM has to manage the heap and detect what can be GC'd. Also, most of the things you listed seem to point to a lack of understanding of the purpose and function of a VM. Nothing wrong with that, but you might want to do some googling because you've actually asked a very broad question that requires a decent understanding of the fundamentals of VMs and managed code.
j flemm
@j flemm: I know the purpose of a VM but that is mostly also what I am asking here: Why is such a VM needed at all? Just a simple JIT compiler system would be enough and if the user wishes, he could use some sandbox around that.
Albert
@Albert: But then why would you even use Java? It's a round peg for a square hole when it comes to things like extremely small applications or cases where you don't need/want your heap managed. It'd be like asking why C++ is too dumb to find and free all your dead instances: it's the wrong tool for the job.
j flemm
@j flemm: You are again mixing up the GC and the memory management. :) I like having a GC. I just want it to use the OS memory management instead of its own memory pool. This would probably be faster and I would just feel more comfortable.
Albert
@Albert: Out of curiosity, why would the JVM using OS memory management as opposed to its own memory management make you feel more comfortable?
Poindexter
@Poindexter: Because I think that the OS has better knowledge about the overall existing physical memory and the virtual memory management than the JVM.
Albert
@Albert: So by "memory pool" you mean the virtual memory it already has? And you want it to shuffle it back and forth from the OS to the VM instead of being intelligent with it?
j flemm
@j flemm: The OS is already intelligent with it. Basically, the JVM memory manager tries to do the same thing that the OS is already doing at a lower level. What I mean is that it seems to me that the JVM introduces an additional, redundant layer here.
Albert
@Albert: The OS is as intelligent _as it can be_ about it. But the OS has little idea of optimizations it can make specific to the executing program while the VM does know this since it's refining the executing code constantly.
j flemm
@j flemm: What optimisations does the JVM do on memory allocations? How does it compare to glibc's allocator or to jemalloc? This would finally answer what I was asking for initially. :)
Albert
@Albert: Check out Tim's answer. I think it may answer some of your questions about the optimizations. But, at the end of the day, you should profile it and prove it to yourself if it doesn't jibe with your intuition.
j flemm
+2  A: 

Even some very small sample demo applications load huge amounts of memory. Maybe this is because of the Java library which is loaded. But why is it needed to load the library for each Java instance? (It seems that way because multiple small applications linearly take more memory. See here for some details where I describe this problem.) Or why is it done that way?

That's likely due to the overhead of starting and running the JVM.

Big Java applications like Eclipse often crash with some OutOfMemory exception. This was always strange because there was still plenty of memory available on my system. Often, they consume more and more memory over runtime. I'm not sure if they have some memory leaks or if this is because of fragmentation in the memory pool -- I got the feeling that the latter is the case.

I'm not entirely sure what you mean by "often crash," as I don't think this has happened to me in quite a long time. If it does happen, it's likely due to the "maximum size" setting (-Xmx) you mentioned earlier.

Your main question asking why Java doesn't use malloc and free comes down to a matter of target market. Java was designed to eliminate the headache of memory management from the developer. Java's garbage collector does a reasonably good job of freeing up memory when it can be freed, but Java isn't meant to rival C++ in situations with memory restrictions. Java does what it was intended to do (remove developer level memory management) well, and the JVM picks up the responsibility well enough that it's good enough for most applications.

Dave McClelland
I'm not directly speaking about the GC (which determines at what point which objects can be deleted) but about the memory management (how memory (de)allocations are handled).
Albert
+2  A: 

To answer a portion of your question;

Java at start-up allocates a "heap" of memory, or a fixed size block (the -Xms parameter). It doesn't actually use all this memory right off the bat, but it tells the OS "I want this much memory". Then as you create objects and do work in the Java environment, it puts the created objects into this heap of pre-allocated memory. If that block of memory gets full then it will request a little more memory from the OS, up until the "max heap size" (the -Xmx parameter) is reached.

Once that max size is reached, Java will no longer request more RAM from the OS, even if there is a lot free. If you try to create more objects, there is no heap space left, and you will get an OutOfMemoryError. Now if you are looking at Windows Task Manager or something like that, you'll see "java.exe" using X megs of memory. That sort-of corresponds to the amount of memory that it has requested for the heap, not really the amount of memory inside the heap that's used.

In other words, I could write the application:

class myfirstjavaprog
{  
    public static void main(String args[])
    {
       System.out.println("Hello World!");
    }
}

Which would basically take very little memory. But if I ran it with the command line (JVM options go before the class name, with no space between the flag and its value):

java.exe -Xms1024M myfirstjavaprog

then on startup Java will immediately ask the OS for 1,024 MB of RAM, and that's what will show in Windows Task Manager. In actuality, that RAM isn't being used, but Java has reserved it for later use.

Conversely, if I had an app that tried to create a 10,000-byte array:

class myfirstjavaprog
{  
    public static void main(String args[])
    {
       byte[] myArray = new byte[10000];
    }
}

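but ran it with the command line:

java.exe -Xms100 -Xmx100 myfirstjavaprog

then Java could only allocate up to 100 bytes of memory (a value without a unit suffix is read as bytes). Since a 10,000-byte array won't fit into a 100-byte heap, that would throw an OutOfMemoryError, even though the OS has plenty of RAM. Treat the numbers as illustrative: a real JVM will most likely refuse to start with a heap that small, but the same thing happens at a larger scale whenever an allocation does not fit within -Xmx.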
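The same effect can be tried at a more realistic scale. Below is a minimal sketch, assuming a made-up class name `HeapLimitDemo` and a made-up helper `tryAllocate`; it simply attempts an allocation and reports whether the heap could satisfy it. What it prints depends entirely on the -Xmx value you start it with (for example `java -Xmx16m HeapLimitDemo`).

```java
public class HeapLimitDemo {
    // Tries to allocate a byte array of the given size.
    // Returns false if the JVM heap cannot satisfy the request.
    public static boolean tryAllocate(int bytes) {
        try {
            byte[] block = new byte[bytes];
            return block.length == bytes;
        } catch (OutOfMemoryError e) {
            // Thrown when the request does not fit into the heap,
            // no matter how much RAM the OS still has free.
            return false;
        }
    }

    public static void main(String[] args) {
        // maxMemory() is the most the JVM will try to use, roughly the -Xmx value.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap (bytes): " + maxHeap);
        System.out.println("10,000 bytes fit: " + tryAllocate(10_000));
        System.out.println("64 MB fit: " + tryAllocate(64 * 1024 * 1024));
    }
}
```

With a small -Xmx the second allocation should fail while the first succeeds; with the default heap both will usually succeed.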

I hope that makes sense...


Edit:

Going back to "why Java uses so much memory": why do you think it's using a lot of memory? If you are looking at what the OS reports, then that isn't what it's actually using, it's only what it has reserved for use. If you want to know what Java has actually used, then you can do a heap dump and explore every object in the heap and see how much memory it's using.
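Short of a full heap dump, the JVM can report the difference between reserved and used heap itself. A minimal sketch using the standard `java.lang.management` API; the class name `HeapReport` and the helper `heapUsage` are only made up for this example:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapReport {
    // Snapshot of the heap as the JVM itself sees it.
    public static MemoryUsage heapUsage() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    public static void main(String[] args) {
        MemoryUsage heap = heapUsage();
        // init: initial heap size requested at startup (related to -Xms), -1 if undefined
        System.out.println("init      = " + heap.getInit());
        // used: bytes currently occupied by objects
        System.out.println("used      = " + heap.getUsed());
        // committed: bytes the JVM has obtained from the OS for the heap
        System.out.println("committed = " + heap.getCommitted());
        // max: upper limit (related to -Xmx), -1 if undefined
        System.out.println("max       = " + heap.getMax());
    }
}
```

The gap between "used" and "committed" is the memory that shows up against the process in the OS tools without being occupied by any object.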

To answer "why doesn't it just let the OS handle it?", well I guess that is just a fundamental Java question for those that designed it. The way I look at it: Java runs in the JVM, which is a virtual machine. If you create a VMware instance or just about any other "virtualization" of a system, you usually have to specify how much memory that virtual system will/can consume. I consider the JVM to be similar. Also, this abstracted memory model lets the JVMs for different OSes all act in a similar way. So for example Linux and Windows have different RAM allocation models, but the JVM can abstract that away and follow the same memory usage for the different OSes.

rally25rs
Yea sure I understand the implications of these parameters. :) But that was not really my question. My question basically was: Why not let the OS take care of this? It basically does the same thing already.
Albert
@Albert: Set -Xmx to the system max and the system will (more or less) take care of it by giving it a lot of space to page into that it will probably never use. They could have set it higher by default if that's the question. Nothing stopping you from doing it on your system though.
j flemm
@j flemm: Yes, this is one thing. This workaround would resolve the limit in the JVM. But the question was not really about how to work around this limit but about why it is there. That was one part of my question. And the other part is why Java uses so much memory.
Albert
@Albert: see my added section to the answer above.
rally25rs
+5  A: 

Java runs inside a Virtual Machine, which constrains many parts of its behavior. Note the term "Virtual Machine." It is literally running as though the machine is a separate entity, and the underlying machine/OS are simply resources. The -Xmx value defines the maximum amount of heap memory that the VM will have, while -Xms defines the starting heap memory available to the application.

The VM is a product of the binary being system agnostic - this was a solution used to allow the byte code to execute wherever. This is similar to an emulator - say for old gaming systems. It is emulating the "machine" that the game runs on.

The reason why you run into an OutOfMemoryError is that the Virtual Machine has hit the -Xmx limit - it has literally run out of memory.

As far as smaller programs go, the VM will often account for a larger percentage of their memory. Also, Java has default -Xmx and -Xms values (I don't remember what they are right now) that it will always start with. The overhead of the VM and the libraries becomes much less noticeable when you start to build and run "real" applications.

The memory argument related to Qt and the like is true, but is not the whole story. While Java uses more memory than some of those, they are compiled for specific architectures. It has been a while since I have used Qt or similar libraries, but I remember the memory management not being very robust, and memory leaks are still common today in C/C++ programs. The nice thing about Garbage Collection is that it removes many of the common "gotchas" that cause memory leaks. (Note: Not all of them. It is still very possible to leak memory in Java, just a bit harder).
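To illustrate that last note, here is a minimal sketch of one common way to leak memory despite garbage collection: objects that stay reachable through a long-lived collection can never be reclaimed. The class `LeakyCache` and its methods are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class LeakyCache {
    // Static collection: everything added here stays reachable for the
    // lifetime of the class, so the GC is not allowed to reclaim it.
    private static final List<byte[]> CACHE = new ArrayList<>();

    public static void handleRequest() {
        byte[] buffer = new byte[1024]; // working data for one request
        CACHE.add(buffer);              // remembered "temporarily", never removed
    }

    public static int retainedBuffers() {
        return CACHE.size();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            handleRequest();
        }
        System.out.println("Buffers still reachable: " + retainedBuffers());
    }
}
```

The GC is working correctly here; the program simply never drops its references, so heap usage grows with every call until -Xmx is hit.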

Hope this helps clear up some of the confusion you may have been having.

aperkins
As @rally25rs mentioned, the memory reported for the process is not necessarily all in use inside the VM; it is simply "reserved" for use - i.e. the virtual machine has set it aside from the OS. Also, once the VM gains memory, it very seldom gives it back, so it often holds on to the maximum memory it ever uses during execution (which is one reason for the -Xmx limit: to help reduce the chance that it will take down the entire system by exhausting memory.)
aperkins
What exactly do you mean by VM? About the binary being system agnostic: A simple JIT compiler or interpreter would be enough. This doesn't really explain why it uses its own memory pool and not just the OS methods. For example, I could also compile C/C++ to the LLVM IR (which is system agnostic) and then later on compile it via some JIT compiler to native code. The performance of the end result would be just the same.
Albert
You miss the point of Java, it's "Compile once, run everywhere".
Andrew
@Albert: The VM IS a JIT... sort of. The point of the VM is that you never have to worry about the underlying architecture or hardware - it simply handles that behavior for you. This means that it can be running on a heavy end server class machine, or a lightweight netbook, and you don't have to rewrite the code, or worry about memory differences, or the fact that one is little or big endian... It also allows for easy garbage collection, translation to mobile devices, and the like.
aperkins
@aperkins: The JVM is much more than just a JIT compiler. Basically, that is my original question put in different words: Why all this overhead and not just a JITc + GC?
Albert
Please note that this behaviour and these properties are _not_ common to all JVMs. Just the Sun implementation.
Thorbjørn Ravn Andersen
@Albert: This is a fairly detailed and complex question - I have given you the overview. For more information, it would require a large amount of discussion on differences on computer architectures, RISC vs. CISC architectures, what the bytecode is attempting to do, etc.
aperkins
A: 

Java doesn't use a fixed amount of memory; its heap is always in the range from -Xms to -Xmx.

If Eclipse crashes with OutOfMemoryError, then it required more memory than granted by -Xmx (a configuration issue).

Java cannot simply use malloc/free (for object creation) since its memory handling is very different due to garbage collection (GC). The GC automatically removes unused objects, which is a benefit compared to being responsible for memory management yourself.

For details on this complex topic see Tuning Garbage Collection

stacker
What happens when the GC removes an unused object? (`free`?) What happens on `new Object()`? (`malloc`?) How does the JVM handle both these things?
Albert
@Albert: Yes, simplified, it allocates objects on the heap similarly to malloc, but freeing works differently, as you already know. It is more comparable to C++ smart pointers: the GC performs something like free to release the memory, but not as predictably as in C.
stacker
Yea, but how *exactly* does it handle it? It doesn't seem to just use `malloc` at that level; it seems to use its own memory allocator. That is exactly the thing I am interested in, nothing else.
Albert
+1  A: 

The limits are a deliberate design decision from Sun. I've seen at least two other JVMs which do not have this design - the Microsoft one and the IBM one for their non-PC AS/400 systems. Both grow as needed, using as much memory as needed.

Thorbjørn Ravn Andersen
+5  A: 

You need to keep in mind that the Garbage Collector does a lot more than just collecting unreachable objects. It also optimizes the heap space and keeps track of exactly where there is memory available to allocate for the creation of new objects.

Knowing immediately where there is free memory makes the allocation of new objects into the young generation efficient, and prevents the need to run back and forth to the underlying OS. The JIT compiler also optimizes such allocations away from the JVM layer, according to Sun's Jon Masamitsu:

Fast-path allocation does not call into the JVM to allocate an object. The JIT compilers know how to allocate out of the young generation and code for an allocation is generated in-line for object allocation. The interpreter also knows how to do the allocation without making a call to the VM.
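To get a feeling for why such a fast path can be so cheap, here is a toy "bump pointer" allocator over a pre-reserved region. This is only an illustration of the general idea, not the JVM's actual code; the class `BumpAllocator`, its byte-array backing store and the `reset` step are all assumptions made for this sketch.

```java
public class BumpAllocator {
    private final byte[] region; // stands in for a pre-reserved young generation
    private int top = 0;         // offset of the first free byte

    public BumpAllocator(int capacity) {
        region = new byte[capacity];
    }

    // Returns the offset of the new block, or -1 when the region is full
    // (the point at which a real collector would have to run).
    public int allocate(int size) {
        if (size < 0 || size > region.length - top) {
            return -1;
        }
        int start = top;
        top += size; // the whole allocation is one bounds check and one addition
        return start;
    }

    // Models the state after a collection has moved the survivors elsewhere:
    // the region is simply reused from the start.
    public void reset() {
        top = 0;
    }

    public static void main(String[] args) {
        BumpAllocator young = new BumpAllocator(64);
        System.out.println(young.allocate(16)); // prints 0
        System.out.println(young.allocate(16)); // prints 16
        System.out.println(young.allocate(40)); // prints -1, only 32 bytes left
    }
}
```

There is no free list to search and no call into the OS, which is the kind of work a general-purpose `malloc` cannot always avoid.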

Note that the JVM goes to great lengths to try to get large contiguous memory blocks as well, which likely have their own performance benefits (See "The Cost of Missing the Cache"). I imagine calls to malloc (or the alternatives) have a limited likelihood of providing contiguous memory across calls, but maybe I missed something there.

Additionally, by maintaining the memory itself, the Garbage Collector can make allocation optimizations based on usage and access patterns. Now, I have no idea to what extent it does this, but given that there's a registered Sun patent for this concept, I imagine they've done something with it.

Keeping these memory blocks allocated also provides a safeguard for the Java program. Since the garbage collection is hidden from the programmer, they can't tell the JVM "No, keep that memory; I'm done with these objects, but I'll need the space for new ones." By keeping the memory, the GC doesn't risk giving up memory it won't be able to get back. Naturally, you can always get an OutOfMemoryError either way, but it seems more reasonable not to needlessly give memory back to the operating system every time you're done with an object, since you already went to the trouble to get it for yourself.

All of that aside, I'll try to directly address a few of your comments:

Often, they consume more and more memory over runtime.

Assuming that this isn't just what the program is doing (for whatever reason, maybe it has a leak, maybe it has to keep track of an increasing amount of data), I imagine that it has to do with the free heap space ratio defaults set by the (Sun/Oracle) JVM. The default value for -XX:MinHeapFreeRatio is 40%, while -XX:MaxHeapFreeRatio is 70%. This means that any time less than 40% of the heap space is free after a collection, the heap will be resized by claiming more memory from the operating system (provided that this won't exceed -Xmx). Conversely, it will only* free heap memory back to the operating system if the free space exceeds 70%.

Consider what happens if I run a memory-intensive operation in Eclipse; profiling, for example. My memory consumption will shoot up, resizing the heap (likely multiple times) along the way. Once I'm done, the memory requirement falls back down, but it likely won't drop so far that 70% of the heap is free. That means that there's now a lot of underutilized space allocated that the JVM has no intention of releasing. This is a major drawback, but you might be able to work around it by customizing the percentages to your situation. To get a better picture of this, you really should profile your application so you can see the utilized versus allocated heap space. I personally use YourKit, but there are many good alternatives to choose from.

*I don't know if this is actually the only time and how this is observed from the perspective of the OS, but the documentation says it's the "maximum percentage of heap free after GC to avoid shrinking," which seems to suggest that.
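The rule described above can be written down as a small decision function. This is a simplified model of the documented meaning of the two ratios, not the JVM's actual resizing code (which also has to respect -Xms and -Xmx and differs between collectors); the class `HeapResizeModel` and its names are invented for the sketch.

```java
public class HeapResizeModel {
    public enum Action { GROW, SHRINK, KEEP }

    // freeBytes and capacityBytes describe the heap right after a collection.
    // minFreePercent / maxFreePercent play the role of
    // -XX:MinHeapFreeRatio / -XX:MaxHeapFreeRatio.
    public static Action decide(long freeBytes, long capacityBytes,
                                int minFreePercent, int maxFreePercent) {
        double freePercent = 100.0 * freeBytes / capacityBytes;
        if (freePercent < minFreePercent) {
            return Action.GROW;   // too little headroom: claim more from the OS
        }
        if (freePercent > maxFreePercent) {
            return Action.SHRINK; // lots of slack: memory may be given back
        }
        return Action.KEEP;       // anywhere in between, nothing changes
    }

    public static void main(String[] args) {
        System.out.println(decide(30, 100, 40, 70)); // prints GROW
        System.out.println(decide(50, 100, 40, 70)); // prints KEEP
        System.out.println(decide(80, 100, 40, 70)); // prints SHRINK
    }
}
```

The wide KEEP band between 40% and 70% is what produces the "memory goes up but rarely comes back down" behaviour described in the Eclipse example.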

Even some very small sample demo applications load huge amounts of memory.

I guess this depends on what kind of applications they are. I feel that Java GUI applications run memory-heavy, but I don't have any evidence one way or another. Did you have a specific example that we could look at?

But why is it needed to load the library for each Java instance?

Well, how would you handle loading multiple Java applications if not creating new JVM processes? The isolation of the processes is a good thing, which means independent loading. I don't think that's so uncommon for processes in general, though.

As a final note, the slow start times you asked about in another question likely come from several initial heap reallocations necessary to get to the baseline application memory requirement (due to -Xms and -XX:MinHeapFreeRatio), depending on what the default values are with your JVM.

Tim Stone