We ship a Java application whose memory demand can vary quite a lot depending on the size of the data it is processing. If you don't set the max VM (virtual memory) size, quite often the JVM quits with a GC failure on big data.

What we'd like to see is the JVM requesting more memory whenever GC fails to provide enough, until the total available VM is exhausted. E.g., start with 128Mb, and increase geometrically (or by some other step) whenever the GC fails.

The JVM ("Java") command line allows explicit setting of max VM sizes (various -Xm* commands), and you'd think that would be designed to be adequate. We try to do this in a .cmd file that we ship with the application. But if you pick any specific number, you get one of two bad behaviors: 1) if your number is small enough to work on most target systems (e.g., 1Gb), it isn't big enough for big data, or 2) if you make it very large, the JVM refuses to run on those systems whose actual VM is smaller than specified.

How does one set up Java to use the available VM when needed, without knowing that number in advance, and without grabbing it all on startup?

A: 

There are two options in the virtual machine arguments that can be used: -Xms to set the memory size at startup and -Xmx to set the maximum memory size...

You can set a low startup memory and a big maximum one, so the VM will allocate new memory only if needed.
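To see this growth in practice, here is a minimal sketch (the class name `HeapGrowthDemo` and the 8 MB step are just illustrative choices) that retains more and more memory and prints the committed heap next to the ceiling, using only `Runtime.totalMemory()` and `Runtime.maxMemory()`:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: watch the committed heap grow toward the -Xmx ceiling as data is retained.
final class HeapGrowthDemo {
    static final long MB = 1024L * 1024L;

    /** Returns {committedMb, maxMb} for the running JVM. */
    static long[] heapSnapshot() {
        Runtime rt = Runtime.getRuntime();
        return new long[] { rt.totalMemory() / MB, rt.maxMemory() / MB };
    }

    public static void main(String[] args) {
        List<byte[]> retained = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            retained.add(new byte[(int) (8 * MB)]); // keep 8 MB more alive each round
            long[] s = heapSnapshot();
            System.out.printf("retained=%d MB committed=%d MB max=%d MB%n",
                    retained.size() * 8, s[0], s[1]);
        }
    }
}
```

Running it with something like `java -Xms16m -Xmx256m HeapGrowthDemo` should show the committed figure rising while the max stays fixed. The exact numbers depend on the JVM and collector in use.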

Vinze
I *think* the problem is that if you set -Xmx large enough, then it's greater than the memory available and the JVM exits
Brian Agnew
If you set -Xmx larger than the currently available VM, the JVM chokes on startup and refuses to run. The problem is that the system on which the application is run isn't ours, and we don't know what that target JVM size is.
Ira Baxter
OK, I didn't understand the question... the question is how to create a launch script if you don't know the max available memory.
Vinze
-Xmx only virtually allocates and can be well in excess of physical memory. Why was this response voted down?
Xepoch
+7  A: 

The max VM size options do address that need (-Xmx sets the maximum value, but the VM takes only what it needs, step by step). But if you need several configurations, I don't really see a way besides supplying different "cmd" files (though I'll search a bit more).

[edit] How about using a first program/script (or even another Java program) which would check the available resources on the system, and only then call your program with the appropriate -Xmx, according to what it retrieved from the system? That way it would adapt to machines, even if you don't know them beforehand. Could be an idea...

[second edit] Ok, this has been proposed already by skaffman, my bad.

Gnoupi
Not your bad, if you look at the timestamps your answer arrived sooner than his. However, the discussion seems to be going on around Brian's response, also not different than yours. Thanks for chiming in.
Ira Baxter
+5  A: 

I think you're out of luck :-( The -Xms and -Xmx options don't provide that flexibility.

So I think you will need to wrap your JVM invocation with a script that can determine the maximum amount of memory, and then set -Xmx appropriately (probably a .vbs script using WMI on Windows). Or perhaps it asks the user the first time it's run?

A bit of a pain, I fear.

Brian Agnew
I can't imagine I'm the only guy on the planet with this problem. Are Sun/IBM truly that witless?
Ira Baxter
The trouble with this is that if they are running other stuff on the machine, they will end up swapping to disk, slowing down their applications enormously
oxbow_lakes
@Ira - why are they *witless*? Why would you expect an application to (paraphrasing) "need 1Gb, but if that's not available, then 500Mb will do"? If you can make do with 500Mb, why ask for a Gig?
oxbow_lakes
There are *so* many issues like this it's untrue. No - you're not the only one. I tend to work in environments where each machine is the same (so this isn't an issue) or I have Java-aware users who are able to modify this parameter themselves. I think the complexity of your ultimate solution depends on your audience/clients :-)
Brian Agnew
@oxbow_lakes. But I'd expect a command-line option to say 'give me 2Gb or max'. That would be very useful. Effectively it's saying 'I may need all the memory you have'. I don't know about the implications re. the memory management within the JVM, I confess.
Brian Agnew
Not if the application requests memory only when the GC exhausts. In this case, either the app has "enough" right now, and its VM footprint stays stable, or if it needs more, it demands more. The alternative is that applications are not slow, but simply don't run. I had that capability before I had VM :-(
Ira Baxter
The request I want to make is, "Give a minimum allocation, then take memory as you need it up to the available VM size", because I don't know what the data size demand is in advance. And no, I don't have the luck to have Java-aware customers, so asking them to choose is really not an option.
Ira Baxter
@Ira - agreed. To be clear, the VM size you're talking about is the machine image, not the JVM ?
Brian Agnew
@Brian: The VM size of interest is the OS-defined VM size.
Ira Baxter
What the hell is an "OS-defined VM size"?
Michael Borgwardt
@Michael: What you configure the OS to allow for max VM size. I'm surprised that's unclear.
Ira Baxter
@oxbox: *witless* because they haven't figured out after 10 years of shipping Java/JVM that this is a real problem?
Ira Baxter
@oxbow_lakes: sorry to fumble your handle. It's late :-{
Ira Baxter
A: 

I don't think you can do what you are trying to do; instead you'll have to ship instructions specific to your customers, their systems and their demands of how they can modify your .cmd file to allow for more memory.

Of course, if your product is aimed at very non-technical users, you may wish to hide this behind some more user-friendly config file. E.g.

# HIGH, MEDIUM, LOW - please change as appropriate. The guidelines are:
#                       * HIGH - users who generate 500 items per day
#                       * MEDIUM - 200-500 items etc
memoryUsage=MEDIUM

or possibly deploy different config files depending on which product option a user specifies when they order the product in the first place.
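As a rough sketch of how a launcher could consume such a file, the following reads the `memoryUsage` key with `java.util.Properties` and maps it to an -Xmx argument. The class name `MemoryProfile` and the three heap sizes are assumptions made up for illustration, not recommendations:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.Properties;

// Sketch: translate a user-friendly memoryUsage setting into a JVM heap option.
final class MemoryProfile {
    /** Maps memoryUsage to a -Xmx argument; the sizes are illustrative guesses. */
    static String xmxFor(Properties config) {
        String level = config.getProperty("memoryUsage", "MEDIUM")
                .trim().toUpperCase(Locale.ROOT);
        switch (level) {
            case "HIGH": return "-Xmx2048m";
            case "LOW":  return "-Xmx256m";
            default:     return "-Xmx1024m"; // MEDIUM, and anything unrecognized
        }
    }

    public static void main(String[] args) throws IOException {
        Properties config = new Properties();
        // A real launcher would load the shipped config file instead of this literal.
        config.load(new StringReader("# HIGH, MEDIUM, LOW\nmemoryUsage=MEDIUM\n"));
        System.out.println(xmxFor(config)); // prints -Xmx1024m
    }
}
```

Falling back to the MEDIUM size on an unrecognized value keeps a typo in the config file from stopping the application from starting.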

oxbow_lakes
We added a command line option to our tool to basically pass VM size options through. Most of the customers are clueless about this; they expect their application to work without them getting involved, so this simply results in support phone calls.
Ira Baxter
I don't think it unreasonable for a customer to have to indicate in some way what sort of usage pattern they have to expect: I've modified the answer
oxbow_lakes
@oxbox: You may not think it unreasonable for customer to do this. But the customers do. And if you force them to guess, and they guess wrong, which they will, they just blame you for having a user-hostile tool. I already have this problem.
Ira Baxter
+4  A: 

I think the easiest way to do this would be to launch the JVM via some wrapper application, which would check system resources to determine memory availability, and then launch the JVM with the appropriate -Xmx parameter.

The question then becomes how that wrapper would be written. It may even be possible for the wrapper app to itself be a JVM, although I don't think the API or system properties would expose the necessary information. Maybe a shell script of your choice could get the information.
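For what it's worth, a Java wrapper may be feasible on Sun/Oracle-derived JVMs, which expose physical memory through the vendor-specific `com.sun.management.OperatingSystemMXBean`. The sketch below assumes that interface is present (it falls back to a fixed size otherwise). The class name `AdaptiveLauncher`, the 3/4 fraction, the 512 MB fallback and the 1024 MB cap for 32-bit JVMs are all guesses that would need tuning:

```java
import java.io.File;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: a small JVM that sizes -Xmx from physical RAM, then starts the real app.
final class AdaptiveLauncher {
    static final long MB = 1024L * 1024L;

    /** Total physical RAM in MB, or -1 if this JVM does not expose it. */
    @SuppressWarnings("deprecation") // newer JDKs prefer getTotalMemorySize()
    static long physicalMemoryMb() {
        Object os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os)
                    .getTotalPhysicalMemorySize() / MB;
        }
        return -1;
    }

    /** Picks a heap ceiling; every constant here is an assumption. */
    static long chooseMaxHeapMb(long physicalMb, boolean jvmIs32Bit) {
        if (physicalMb <= 0) return 512;               // unknown machine: fixed fallback
        long heap = Math.max(128, physicalMb * 3 / 4); // leave room for OS and other apps
        return jvmIs32Bit ? Math.min(heap, 1024) : heap; // stay inside 32-bit address space
    }

    static List<String> buildCommand(long maxHeapMb, String jarPath, String[] appArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add(System.getProperty("java.home") + File.separator + "bin" + File.separator + "java");
        cmd.add("-Xms128m");
        cmd.add("-Xmx" + maxHeapMb + "m");
        cmd.add("-jar");
        cmd.add(jarPath);
        cmd.addAll(Arrays.asList(appArgs));
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: AdaptiveLauncher app.jar [app args...]");
            return;
        }
        // sun.arch.data.model is a HotSpot-specific property; absent means "not 32-bit" here.
        boolean is32 = "32".equals(System.getProperty("sun.arch.data.model"));
        long heapMb = chooseMaxHeapMb(physicalMemoryMb(), is32);
        String[] appArgs = Arrays.copyOfRange(args, 1, args.length);
        Process child = new ProcessBuilder(buildCommand(heapMb, args[0], appArgs))
                .inheritIO().start();
        System.exit(child.waitFor());
    }
}
```

Note that this sizes the heap from physical RAM, not from the OS-configured virtual memory limit, so it sidesteps the startup failure without using all the VM the machine could offer.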

skaffman
For me the real question is why isn't this one of the standard JVM -XM* options? The problem was obvious to us about 30 seconds after we ran out of memory on the first application run.
Ira Baxter
I can't say I've ever come across the need for this, although I do appreciate the usefulness in some situations.
skaffman
Based on the discussions, it appears I have to write a wrapper. Probably can't write it in Java. So now I need a non-Java something to run a Java program. That something is probably different under Windows than it is under Linux, etc. So much for "write once, run anywhere"; it's now "write once, configure differently everywhere". Takes some of the appeal out of using Java.
Ira Baxter
This is actually a JVM option: -XX:+AggressiveHeap automatically sets -Xmx and more. See my answer to the question for more info.
Sindri Traustason
@Ira: There are many platform-specific JVM options. No one ever claimed that the JVM itself was cross-platform.
skaffman
@skaffman That's not the point. The mantra for Java is "write once run anywhere". That's what my boss hears. Then he hears we have trouble getting it to run anywhere. [I've been chasing after a VM size cure for several months] What's a reasonable boss to think given that information?
Ira Baxter
It *is* the point. The "write once run anywhere" mantra applies only to the application code, *not* the VM executable.
skaffman
@Skaffman. Only if you don't object to the fine-print game in contracts. (Lest this turn into a flame war, I've said enough).
Ira Baxter
A: 

This discussion is moot if you think that your clients can ask for 2-3GB of RAM on their 32-bit machine. The OS and other apps will be taking their pound of flesh to run as well.

Sounds like your app is reaching the point where it needs a 64-bit operating system and lots more RAM.

duffymo
No. Some users (in fact most) only process small datasets, and are fine with smaller machines. Some have huge datasets. I can't insist that they all have 64-bit systems, and that seems unreasonable if they have only small data.
Ira Baxter
PS: Going 64 bits doesn't change the dumb behavior of the -Xmx option. The JVM still chokes at startup if you don't have a number consistent with the specific machine VM configuration.
Ira Baxter
A: 

I do not think either the Sun or IBM JVM can do this (I know that the AS/400 one can, but that is most likely not relevant to you).

I would suggest using Java WebStart (and before you discard this, note that it was updated in Java 6u10 and is much better suited for launching "local" applications and applets), since it allows you to provide a "small instance", "larger instance", "gigantic instance" as links/icons.

You will most likely want to look into the "inject application in webstart cache" and "offline" options.

Thorbjørn Ravn Andersen
+1  A: 

If you have a lot of time on your hands, you could try the following:

Try to work out how much memory is needed as a function of the input dataset. With this you can split processing into a different set of classes and create a new JVM process to actually process the data. Basically a Manager and a Worker. The Manager would do a basic analysis of the requested dataset and spawn a Worker with the appropriate memory requirements. You could probably also make your Manager aware of the environment, so it can warn the user when they are trying to operate on a dataset their machine cannot handle.
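A bare-bones sketch of the Manager side might look like this. The `Worker` main class is hypothetical, and the estimate (128 MB base plus 10x the file size) is a placeholder that would have to be replaced by numbers measured on the real application:

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Sketch: Manager sizes the heap from the dataset, then spawns a Worker JVM.
final class Manager {
    static final long MB = 1024L * 1024L;

    /** Heap estimate in MB: fixed overhead plus a multiple of the input size (placeholders). */
    static long estimateHeapMb(long inputBytes) {
        long inputMb = (inputBytes + MB - 1) / MB; // round up to whole megabytes
        return 128 + 10 * inputMb;
    }

    /** Command line for the child JVM; "Worker" is a hypothetical main class. */
    static List<String> workerCommand(long heapMb, String classpath, String datasetPath) {
        String java = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        return Arrays.asList(java, "-Xmx" + heapMb + "m", "-cp", classpath, "Worker", datasetPath);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: Manager <dataset file>");
            return;
        }
        File dataset = new File(args[0]);
        long heapMb = estimateHeapMb(dataset.length());
        List<String> cmd = workerCommand(heapMb, System.getProperty("java.class.path"),
                dataset.getPath());
        int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
        if (exit != 0) {
            // The child could not start or failed; one likely cause is too little memory.
            System.err.println("Worker exited with code " + exit
                    + "; the dataset may be too large for this machine.");
        }
    }
}
```

If the child JVM refuses to start with the requested heap, the Manager still runs and can report the problem in the application's own words instead of the user seeing a raw JVM error.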

This is pretty much an extension on the answer provided by skaffman but will happen all within the same app as far as the user is concerned.

Newtopian
+2  A: 

You can also use the option: -XX:+AggressiveHeap

According to the [documentation][1]:

The -XX:+AggressiveHeap option inspects the machine resources (size of memory and number of processors) and attempts to set various parameters to be optimal for long-running, memory allocation-intensive jobs. It was originally intended for machines with large amounts of memory and a large number of CPUs, but in the J2SE platform, version 1.4.1 and later it has shown itself to be useful even on four processor machines. With this option the throughput collector (-XX:+UseParallelGC) is used along with adaptive sizing (-XX:+UseAdaptiveSizePolicy). The physical memory on the machines must be at least 256MB before AggressiveHeap can be used. The size of the initial heap is calculated based on the size of the physical memory and attempts to make maximal use of the physical memory for the heap (i.e., the algorithms attempt to use heaps nearly as large as the total physical memory).

[1]: http://java.sun.com/docs/hotspot/gc1.4.2/#4.2.2. AggressiveHeap|outline

Sindri Traustason
Wow I have never seen these options before... this is great reading, thanks
Newtopian
With this option, the algorithms attempt to use heaps nearly as large as the total physical memory. What happens if the user tries to run some other application? Perhaps the option is useful for server dedicated environments?
zkarthik
Any `-XX` option is liable to change (or disappear) without warning. They are not a *public API* of the JVM and possibly you should avoid (if possible) bundling a public product with them
oxbow_lakes
Indeed -XX:+AggressiveHeap does not seem to be available in 1.5 and later.
Stephen C
@zkarthik I believe this option is really only useful for servers that do pretty much nothing other than run one Java VM. I think physical memory also excludes swap space.
Sindri Traustason
+1  A: 

We have a small C application that we use for launching all of our Java applications via JNI. This allows us to:

  1. Have a meaningful process name (esp important under windows)
  2. Have our own icons (again, important for Windows)
  3. Dynamically build classpaths (we parse out the contents of the /lib file to auto-include all jars)

For our apps, we just hard code the heap limit, but you could easily dynamically configure max heap size based on available memory.

This sort of little app is actually pretty easy to do (it's one of the easiest things to do with JNI). A good starting point would be the source for the JDK (there's a sub-folder for java.exe itself that you can use - that's what we did). Most folks are quite surprised to find that java.exe is a little tiny application (< 200 lines of code) that just invokes JNI and passes command line arguments in (heck, even the use of a method called main() is pretty optional once you start launching things yourself).

If you want, I can post some code (don't have it handy right now) - just add a comment and I'll try to put something together.

Kevin Day
A: 

In comments you say that the amount of memory that your application actually uses depends on the input dataset size provided by the user. This suggests that instead of trying to grab all available virtual memory (which may cause problems for the user's other applications) you should be looking at the input dataset size before you start the JVM and using that to estimate the amount of memory the application will need.

Suppose that the user's machine is configured with modest physical memory and a huge swap space. If you launch the JVM with a huge VM size, it could cause severe "thrashing" as the JVM tries to access data in non-resident pages. By contrast, if you give the JVM something more than the application needs and less than the available physical memory, you should be able to run comfortably without thrashing.

Stephen C
That wasn't what I proposed. What I want is something that allocates a pretty small first amount of VM, and then grows it each time the garbage collector finishes with only a small amount of free space (say, 5%). Then VM demand stays small with small data, and grows to handle big data as required. In fact, AFAIK, it already does this. The failure is that I can't specify the upper bound in a target-machine independent way.
Ira Baxter
I doubt that, for any interesting application, it is easy to make an estimate of VM size vs. input size that works reliably. This is the big Oh problem; you might get the O(formula) sort of right, but you need that tuning constant, and if nothing else, different JVMs will have different constants, and different users will have different JVMs. The demand-memory-as-needed scheme is far safer.
Ira Baxter
+4  A: 

One more option... I work on a launcher called WinRun4J, which allows you to specify a max heap size as a percentage of the available memory on the machine it's running on (i.e. it checks the amount of memory available and sets the -Xmx parameter dynamically on startup).

The INI option is "vm.heapsize.max.percent". There is also another option "vm.heapsize.preferred", which sets the -Xmx parameter as the maximum available memory on the machine up to this amount.
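To make the two policies concrete, here is the arithmetic as described above, written as a small Java sketch. This is not WinRun4J's actual code; the class and method names are made up, and the rounding is an assumption:

```java
// Sketch of the two sizing policies described above (not WinRun4J's implementation).
final class HeapSizing {
    /** Percent policy: -Xmx is a fixed percentage of the memory available. */
    static long maxPercent(long availableMb, int percent) {
        if (percent < 1 || percent > 100) {
            throw new IllegalArgumentException("percent must be 1..100");
        }
        return availableMb * percent / 100;
    }

    /** Preferred policy: use the preferred size, or all available memory if that is less. */
    static long preferred(long availableMb, long preferredMb) {
        return Math.min(availableMb, preferredMb);
    }

    public static void main(String[] args) {
        System.out.println("-Xmx" + maxPercent(4000, 75) + "m");   // prints -Xmx3000m
        System.out.println("-Xmx" + preferred(1500, 2048) + "m");  // prints -Xmx1500m
    }
}
```

The preferred policy is the closer match to "give me 2Gb or max" from the comments above: a big machine gets the full amount, a small one gets whatever it has.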

I believe some of the other launchers (eg. Launch4J, Janel) offer the same functionality.

Peter Smith
http://launch4j.sourceforge.net/ indicates it does. I'm surprised this answer doesn't have more votes; it sounds like a perfect solution, and it's free...
ShuggyCoUk
A: 

I read through the threads but didn't see anything which indicated that the application had undergone some sort of profiling. Normally I'd profile the apps under certain conditions to find hot spots in performance or memory usage. There are probably things that could be improved in most cases.

If you could establish the limits and understand the behavior of the application, you would be in a better position to tell your customers what they can or cannot do with it. That would reduce the number of support calls and give you a better idea of what minimum or maximum heap size to ship the product with.

Maybe you could start with this: http://www.eclipse.org/mat/

pengtuck
Thanks for responding; not much traffic on this question recently. ... While in general I'd agree that (memory) profiling might be useful to ensure that memory is not wasted, we feel this application is pretty well engineered. The real problem is that the amount of data it must read and process for a single user can vary literally by 2-3 orders of magnitude with a relatively heavy bias towards smaller sizes. This means the app works well with a modest allocation most of the time, and inevitably runs out of memory when that modest allocation meets the large input...
Ira Baxter
... and because it works well with small, when the out-of-memory problem occurs, the users are so used to it running well that they never bother to learn about allocating stuff themselves. And frankly, while we can (and do) ask them to do this reallocation themselves, the entire problem is an artifact of the dumb way that the VM allocator works. If it simply took more memory when needed, there wouldn't be a problem. So, our users are victims of a dumb design decision on the part of the JVM implementation, and of course we get support calls as a consequence.
Ira Baxter
Sounds like you've hit on a hot spot here (the reading part) to me :D. Ask the developers how they are doing the reading and data binding :P.
pengtuck
A: 

Have you looked at running jps to give you the PID for your process and then calling jinfo to change the mx option? Not sure if this will work but it may.

[Edit] This would mean that when you think you have a big dataset, you read the total amount of ram somehow (OS dependent I think. See http://forums.sun.com/thread.jspa?messageID=10306570) or you just increase the size until you don't think it is low anymore (if it blows up first, try to capture and display a helpful message such as "your machine is inadequate, time to make a run to Frys").
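For the "capture and display a helpful message" part, a minimal sketch could wrap the processing job and catch `OutOfMemoryError`. The class name `OomGuard` and the message text are invented for illustration, and the approach assumes the JVM is still healthy enough to build a string after the error, which is not guaranteed:

```java
// Sketch: turn an OutOfMemoryError into a readable message instead of a stack trace.
final class OomGuard {
    /** Runs the job; returns "ok" or a user-facing message if the heap ran out. */
    static String runGuarded(Runnable job) {
        try {
            job.run();
            return "ok";
        } catch (OutOfMemoryError e) {
            // Do as little as possible here: report and stop rather than carry on.
            return "Out of memory: this dataset needs more memory than the JVM was given. "
                    + "Re-run with a larger -Xmx or on a machine with more RAM.";
        }
    }

    public static void main(String[] args) {
        // Simulated failure so the demo does not have to exhaust the heap for real.
        String result = runGuarded(() -> { throw new OutOfMemoryError("simulated"); });
        System.out.println(result);
    }
}
```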

David T