I am developing an application in Java that runs on Windows Mobile devices. In order to achieve this we have been using the Esmertec JBed JVM, which is not perfect but we are stuck with it for now. Recently we have been getting complaints from customers about OutOfMemoryErrors. After a lot of playing around with things I discovered that the device has plenty of free memory (approx. 4MB).

The OutOfMemoryErrors always occur at the same point in the code and that is when expanding a StringBuffer in order to append some characters to it. After adding some logging around this area I found that my StringBuffer had about 290000 characters in it with a capacity of about 290500. The expansion strategy of the internal character array is simply to double the size, so it would be attempting to allocate an array of about 580000 characters. I printed out the memory usage around this time too and found that it was using about 3.8MB of about 6.8MB total (although I have seen the total available memory rise to around 12MB at times, so there is plenty of room for expansion). So it is at this point that the application reports an OutOfMemoryError, which doesn't make much sense given how much there is still free.

I started thinking about the operation of the application up to this point. Basically what is happening is I am parsing an XML file using MinML (a small SAX XML parser). One of the fields in the XML has about 300k characters in it. The parser streams the data from disk and by default it loads only 256 characters at a time, so when it reaches the field in question it calls the handler's characters() method over 1000 times, each time creating a new char[] holding 256 characters. The handler simply appends these characters to a StringBuffer. The default initial size of the StringBuffer is only 12, so as the characters are appended the buffer has to grow a number of times (each time creating a new char[]).
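
Roughly, the handler does something like this (a simplified sketch, not the actual application code; I'm assuming a SAX1-style HandlerBase here since MinML predates SAX2, and the class and field names are illustrative):

```java
// Simplified sketch of the handler (not the actual application code).
// MinML calls characters() once per 256-character chunk read from disk,
// and each call appends that chunk to the same StringBuffer.
import org.xml.sax.HandlerBase;
import org.xml.sax.SAXException;

public class FieldHandler extends HandlerBase {
    // Created with the default (tiny) capacity, so the internal char[]
    // has to be reallocated many times as the 300k-character field grows.
    private final StringBuffer fieldValue = new StringBuffer();

    public void characters(char[] ch, int start, int length) throws SAXException {
        fieldValue.append(ch, start, length);
    }

    public String getFieldValue() {
        return fieldValue.toString();
    }
}
```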

My assumption from this was that although there is enough free memory overall (since the previous char[]s can be garbage collected), there may be no contiguous memory block big enough to fit the new array I am trying to allocate. And maybe the JVM is not smart enough to expand the heap in that situation, because it sees plenty of free memory in total and decides there is no need.

So my question is: does anyone have any experience of this JVM who might be able to conclusively confirm or disprove my assumptions about memory allocation? And, assuming my assumptions are correct, does anyone have any ideas about how to improve the allocation of the arrays so that the memory won't become fragmented?

Note: things I've tried already:

  • I increased the initial array size of the StringBuffer and I increased the read size of the parser so that it wouldn't need to create so many arrays.
  • I changed the expansion strategy of the StringBuffer so that once it reached a certain size threshold it would only expand by 25% rather than 100% (a sketch of that growth policy is below this list).
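
For illustration, that second change looks roughly like the following (StringBuffer's growth policy can't be overridden directly, so this uses a small hypothetical buffer class; the names and the threshold value are illustrative, not the exact values from the application):

```java
// Sketch of an append-only character buffer whose growth policy switches
// from doubling to +25% once it passes a threshold, so a single expansion
// never asks for a huge contiguous block. Names/values are illustrative.
public class GrowLimitedBuffer {
    private static final int GROWTH_THRESHOLD = 65536; // switch point, in chars

    private char[] data;
    private int length;

    public GrowLimitedBuffer(int initialCapacity) {
        data = new char[initialCapacity];
    }

    public void append(char[] ch, int start, int len) {
        ensureCapacity(length + len);
        System.arraycopy(ch, start, data, length, len);
        length += len;
    }

    private void ensureCapacity(int needed) {
        if (needed <= data.length) {
            return;
        }
        // Double while small; grow by only 25% once the buffer is already large.
        int newSize = (data.length < GROWTH_THRESHOLD)
                ? data.length * 2
                : data.length + (data.length / 4);
        if (newSize < needed) {
            newSize = needed;
        }
        char[] bigger = new char[newSize];
        System.arraycopy(data, 0, bigger, 0, length);
        data = bigger;
    }

    public String toString() {
        return new String(data, 0, length);
    }
}
```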

Doing both of these things helped a little, but as I increase the size of the XML data going in I still get OutOfMemoryErrors at a fairly small size (approx. 350KB).

Another thing to add: all of this testing was performed on a device using the JVM in question. If I run the same code on the desktop using the Java SE 1.2 JVM I don't have any problems, or at least I don't get the problem until my data reaches about 4MB in size.

EDIT:

Another thing I have just tried, which has helped a bit, is setting Xms to 10M. This gets past the problem of the JVM not expanding the heap when it should, and allows me to process more data before the error occurs.

A: 

I think you have plenty of memory, but are creating a huge number of reference objects. Try this article: http://articles.techrepublic.com.com/5100-10878_11-1049545.html?tag=rbxccnbtr1 for more information.

Greg Smith
Are you sure? That article talks about how to make objects *easier* to garbage-collect.
Dan Breslau
I am not creating any reference objects?? As I said, I don't think I have a problem with objects not getting garbage collected, because the JVM reports plenty of free memory. The question is where that free memory is. Is it fragmented? Is that why the JVM can't allocate my new array?
DaveJohnston
A: 

I'm not sure if these StringBuffers are being allocated inside of MinML -- if so, I assume you have the source for it? If you do, then perhaps as you're scanning a string, if the string reaches a certain length (say, 10000 bytes), you could look ahead to determine the exact length of the string, and re-allocate a buffer to that size. This is ugly, but it would save memory. (It may even be faster than not doing the lookaheads, since you're potentially saving many re-allocations.)

If you don't have access to the MinML source, then I'm not sure what the lifetime of the StringBuffer is relative to the XML document. But this suggestion (though it's even uglier than the last one) might still work: since you're getting the XML from disk, perhaps you could pre-parse it using (say) a SAX parser, solely to get the size of the string fields, and allocate the StringBuffers accordingly?
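
A rough sketch of that two-pass idea, assuming a SAX1-style handler (all class and element names here are hypothetical):

```java
// Rough sketch of the two-pass idea (all names here are hypothetical).
// First pass: a throwaway handler that only measures the length of the
// large text field, so the real handler can pre-size its StringBuffer.
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.SAXException;

public class FieldSizeCounter extends HandlerBase {
    private boolean inLargeField;
    private int size;

    public void startElement(String name, AttributeList attrs) throws SAXException {
        if (name.equals("largeField")) { // hypothetical element name
            inLargeField = true;
        }
    }

    public void endElement(String name) throws SAXException {
        if (name.equals("largeField")) {
            inLargeField = false;
        }
    }

    public void characters(char[] ch, int start, int length) throws SAXException {
        if (inLargeField) {
            size += length;
        }
    }

    public int getSize() {
        return size;
    }
}

// Second pass: the real handler allocates its buffer once, at full size:
//   StringBuffer value = new StringBuffer(sizeCounter.getSize());
```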

Dan Breslau
The StringBuffers are allocated in the handler objects for the SAX parser (which in this case is MinML). So the handler in question allocates a StringBuffer, and then every time the characters() method is called more data is appended to it. I am not scanning a string; it is all streamed from file, so I can't find out the size of the final string in advance unless I do as you said in your second suggestion and parse the file twice. But as you said, that is ugly and time consuming.
DaveJohnston
Ugly, yes. But it may be faster than you'd expect, especially if your current method requires a lot of re-allocations.
Dan Breslau
A: 

Are you able to get a heap dump from the device?

If you get the heap dump and it is in a compatible format, some Java memory analysers give information on the size of contiguous memory blocks. I remember seeing this functionality in the IBM Heap Analyzer http://www.alphaworks.ibm.com/tech/heapanalyzer , but also check the more up-to-date Eclipse Memory Analyzer http://www.eclipse.org/mat/

If you have the possibility of modifying the XML file, that would probably be the quickest way out. XML parsing in Java is always rather memory intensive, and 300K is quite a lot for a single field. Instead you could try to move this field into a separate non-XML file.

dparnas
I very much doubt that I would be able to get a heap dump; the JVM is very limited in what you can do with it, or at least it isn't well documented, so I wouldn't know how to do it. Modifying the XML is a possibility that we are looking at as a last resort, because the XML is a set of search results being returned by a server. Changing it would mean making changes to our server structure purely to work around what seems like a problem with the JVM. If that's what it comes to, fine, but hopefully we will find a way to get the JVM to work properly.
DaveJohnston
+1  A: 

From what I know about JVMs, fragmentation should never be a problem you have to solve. If there's no more room for allocation - whether due to fragmentation or not - the garbage collector should run, and GCs also typically compact the heap to resolve fragmentation issues.

To emphasize - you only get "out of memory" errors after the GC has run and still could not free enough memory.

I would instead try to dig more into the options for the specific JVM you're running. For example, a "copying" garbage collector uses only half of the available memory at a time, so changing your VM to use something else could possibly free half of your memory.

I'm not really suggesting your VM uses a simple copying GC; I'm just suggesting probing this at the VM level.

Oak
Support for the JVM I am using is pretty much non-existent unfortunately (unless someone knows a good place to get support for Esmertec JBed CDC??). Any idea what the standard command line options are for changing GC options?
DaveJohnston
@DaveJohnston: you can check the documentation for popular JVMs and hope yours behave the same; but there is no standard defined by the Java VM specifications (in fact, it explicitly says: "the memory layout of run-time data areas, the garbage-collection algorithm used [...] are left to the discretion of the implementor").
Oak
+1  A: 

Maybe you could try VTD light. It seems more memory efficient than SAX. (I know it's a huge change.)

superfav
A: 

Just for the sake of updating my own question: I have found that the best solution was to set the minimum heap size (I set it to 10M). This means that the JVM never has to decide whether or not to expand the heap, and therefore it never (so far in testing) dies with an OutOfMemoryError while there is still plenty of free space. So far in testing we have been able to triple the amount of data we parse without an error, and we could probably go further if we actually needed to.
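
For reference, on a standard desktop JVM the equivalent launch option would look something like the line below (the class name is a placeholder, and JBed's command-line syntax may well differ, so treat this purely as an illustration):

```
java -Xms10m com.example.MyApplication
```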

This is a bit of a hack for a quick solution to keep existing customers happy, but we are now looking at a different JVM and I'll report back with an update if that JVM handles this scenario any better.

DaveJohnston