views:

1938

answers:

6

Hi,

I am running a Java web application in Tomcat. The application uses the Quartz framework to schedule a cron job at regular intervals. This cron job involves parsing a 4+ MB XML file, which I am doing using the JDOM API. The XML file contains around 3600 nodes to be parsed, and the resulting data is updated in the DB sequentially.
After parsing almost half of the file, my application throws an OutOfMemoryError. The stack trace is:

Exception in thread "ContainerBackgroundProcessor[StandardEngine[Catalina]]" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOfRange(Arrays.java:3210)
        at java.lang.String.<init>(String.java:216)
        at java.lang.StringBuffer.toString(StringBuffer.java:585)
        at org.netbeans.lib.profiler.server.ProfilerRuntimeMemory.traceVMObjectAlloc(ProfilerRuntimeMemory.java:170)
        at java.lang.Throwable.getStackTraceElement(Native Method)
        at java.lang.Throwable.getOurStackTrace(Throwable.java:590)
        at java.lang.Throwable.getStackTrace(Throwable.java:582)
        at org.apache.juli.logging.DirectJDKLog.log(DirectJDKLog.java:155)
        at org.apache.juli.logging.DirectJDKLog.error(DirectJDKLog.java:135)
        at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1603)
        at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1610)
        at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1590)
        at java.lang.Thread.run(Thread.java:619)
Exception in thread "*** JFluid Monitor thread ***" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2760)
        at java.util.Arrays.copyOf(Arrays.java:2734)
        at java.util.Vector.ensureCapacityHelper(Vector.java:226)
        at java.util.Vector.add(Vector.java:728)
        at org.netbeans.lib.profiler.server.Monitors$SurvGenAndThreadsMonitor.updateSurvGenData(Monitors.java:230)
        at org.netbeans.lib.profiler.server.Monitors$SurvGenAndThreadsMonitor.run(Monitors.java:169)
Nov 30, 2009 2:22:05 PM org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor processChildren
SEVERE: Exception invoking periodic operation:
java.lang.OutOfMemoryError: Java heap space
        at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:232)
        at java.lang.StringCoding.encode(StringCoding.java:272)
        at java.lang.String.getBytes(String.java:946)
        at java.io.UnixFileSystem.getLastModifiedTime(Native Method)
        at java.io.File.lastModified(File.java:826)
        at org.apache.catalina.startup.HostConfig.checkResources(HostConfig.java:1175)
        at org.apache.catalina.startup.HostConfig.check(HostConfig.java:1269)
        at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:296)
        at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:118)
        at org.apache.catalina.core.ContainerBase.backgroundProcess(ContainerBase.java:1337)
        at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1601)
        at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1610)
        at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1590)
        at java.lang.Thread.run(Thread.java:619)
ERROR [JobRunShell]: Job updateVendorData.quoteUpdate threw an unhandled Exception:
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOfRange(Arrays.java:3210)
        at java.lang.String.<init>(String.java:216)
        at java.lang.StringBuffer.toString(StringBuffer.java:585)
        at org.apache.commons.dbcp.PoolingConnection$PStmtKey.hashCode(PoolingConnection.java:296)
        at java.util.HashMap.get(HashMap.java:300)
        at org.apache.commons.pool.impl.GenericKeyedObjectPool.decrementActiveCount(GenericKeyedObjectPool.java:1085)
        at org.apache.commons.pool.impl.GenericKeyedObjectPool.returnObject(GenericKeyedObjectPool.java:882)
        at org.apache.commons.dbcp.PoolablePreparedStatement.close(PoolablePreparedStatement.java:80)
        at org.apache.commons.dbcp.DelegatingStatement.close(DelegatingStatement.java:168)
        at com.netcore.smsapps.stock.db.CompanyDaoImpl.updateCompanyQuote(CompanyDaoImpl.java:173)
        at com.netcore.smsapps.stock.vendor.MyirisVendor.readScripQuotes(MyirisVendor.java:159)
        at com.netcore.smsapps.stock.update.StockUpdateData.execute(StockUpdateData.java:38)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:207)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:525)
DEBUG [ExceptionHelper]: Detected JDK support for nested exceptions.
ERROR [ErrorLogger]: Job (updateVendorData.quoteUpdate threw an exception.
org.quartz.SchedulerException: Job threw an unhandled exception. [See nested exception: java.lang.OutOfMemoryError: Java heap space]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:525)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOfRange(Arrays.java:3210)
        at java.lang.String.<init>(String.java:216)
        at java.lang.StringBuffer.toString(StringBuffer.java:585)
        at org.apache.commons.dbcp.PoolingConnection$PStmtKey.hashCode(PoolingConnection.java:296)
        at java.util.HashMap.get(HashMap.java:300)
        at org.apache.commons.pool.impl.GenericKeyedObjectPool.decrementActiveCount(GenericKeyedObjectPool.java:1085)
        at org.apache.commons.pool.impl.GenericKeyedObjectPool.returnObject(GenericKeyedObjectPool.java:882)
        at org.apache.commons.dbcp.PoolablePreparedStatement.close(PoolablePreparedStatement.java:80)
        at org.apache.commons.dbcp.DelegatingStatement.close(DelegatingStatement.java:168)
        at com.netcore.smsapps.stock.db.CompanyDaoImpl.updateCompanyQuote(CompanyDaoImpl.java:173)
        at com.netcore.smsapps.stock.vendor.MyirisVendor.readScripQuotes(MyirisVendor.java:159)
        at com.netcore.smsapps.stock.update.StockUpdateData.execute(StockUpdateData.java:38)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:207)

This even causes my Tomcat to crash. Can you please help me diagnose the problem? I have enabled profiling in NetBeans for this, but it seems that even that crashed. I have kept the default memory allocated to Tomcat. Is there a memory leak taking place? My DB is Postgres and my JDK is 1.6.0_15.

Thanks, Amit

+2  A: 

Every time you use a DOM to parse an XML file, you load the entire file into memory, and the DOM infrastructure uses about the same amount again to handle it, so it will consume roughly twice as much memory as your file size.

You'll need to use SAX, an event-based parser. While this can be hard to understand the first time, it's very memory efficient, as it only keeps the node it is currently parsing in memory.

Java has several streaming parser implementations, such as StAX; I hope this helps.
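To make the difference concrete, here is a minimal SAX sketch. The `<quote>` element and `symbol` attribute names are invented for illustration; they are not taken from your actual feed:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class QuoteSaxDemo {
    // Collects the symbol attribute of each <quote> element. Only the
    // element currently being parsed is held in memory, never the whole
    // document tree.
    static List<String> collectSymbols(String xml) throws Exception {
        final List<String> symbols = new ArrayList<String>();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attrs) {
                        if ("quote".equals(qName)) {
                            // in the real job: update the DB row here,
                            // then let this element be discarded
                            symbols.add(attrs.getValue("symbol"));
                        }
                    }
                });
        return symbols;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<quotes><quote symbol=\"ABC\"/><quote symbol=\"XYZ\"/></quotes>";
        System.out.println(collectSymbols(xml));  // [ABC, XYZ]
    }
}
```

The handler receives one callback per node, so memory use stays flat no matter how large the file grows.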

Rubens Farias
Hi Rubens, I am using JDOM for parsing the large XML, and it uses a SAX parser internally. My parsing code is: `SAXBuilder builder = new SAXBuilder(); Document doc = builder.build(inputResource); Element elem = doc.getRootElement();`
Amit
since your DOM builder uses SAX internally, you should read your XML sequentially and avoid using `..`, `//` and the like
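For truly sequential reading without building a document tree at all, a StAX pull parser is one option. This is a minimal sketch with invented element names, not your actual schema:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class QuoteStaxDemo {
    // Pulls <quote> elements one at a time; nothing before or after the
    // cursor is retained, so memory use stays flat regardless of file size.
    static List<String> readSymbols(String xml) throws Exception {
        List<String> symbols = new ArrayList<String>();
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "quote".equals(reader.getLocalName())) {
                    symbols.add(reader.getAttributeValue(null, "symbol"));
                }
            }
        } finally {
            reader.close();  // release the underlying stream promptly
        }
        return symbols;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readSymbols(
                "<quotes><quote symbol=\"ABC\"/><quote symbol=\"XYZ\"/></quotes>"));
    }
}
```

Note that `SAXBuilder`, despite its name, still builds the whole JDOM `Document` in memory; only the tokenizing step is SAX-based, which is why the heap fills up.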
Rubens Farias
A: 

Are you sure there is no recursive array copying somewhere, left there by mistake? Perhaps in different threads?

lorenzog
Hi lorenzog, I am not using any threads for this purpose, and no array copying. I am using JDOM for parsing the XML file, and I guess it uses ArrayList in its implementation. Can that be an issue? Is there any possibility of a memory leak?
Amit
A: 

I'll second the point about the file and the DOM taking up a great deal of memory. I also get suspicious when I see this:

ERROR [JobRunShell]: Job updateVendorData.quoteUpdate threw an unhandled Exception:  
    java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:3210)

What's that copying doing? I wonder if there's something else bad going on in your code.

If you've gotten this far, it suggests that you've read the file and the DOM successfully and you're starting to write to the database. The file memory should already be reclaimed.

I'd suggest looking at memory using VisualGC so you can see what's going on.

duffymo
That copying is the internals of StringBuffer.
BalusC
Hi duffy, thanks for your reply. Could JDOM be doing this copying internally? Also, I have enabled profiling for the application, and there are 2 classes which still exist even after GC has run: org.postgresql.jdbc4.Jdbc4PrepareStatement and org.postgresql.jdbc4.Jdbc4ResultSet. Can these be the cause of a memory leak in the application?
Amit
Could be. I can't tell without the code, but if you aren't closing these properly you could be leaking resources that will bring you grief.
duffymo
To the best of my ability, I have taken care to close all the connections I have opened, but the problem still exists. Is there any way you can help me out with this?
Amit
Not without more info.
duffymo
Tell me how I can provide you more info.
Amit
+1  A: 

Parsing XML is a fairly expensive task. The average DOM parser already needs at least five times as much memory as the XML document is big. You should take this fact into account as well. To ensure that there is no memory leak somewhere else causing the memory shortage for the XML parser, you really need to run a profiler. Give it all more memory, double the available memory, and profile it. When you've nailed down the cause and fixed the leak, you can fall back to the "default" memory and retest. Or, if there really is no leak, just give it all a bit more memory than the default so that it fits.

You can also consider using a more memory-efficient XML parser instead, for example VTD-XML (homepage here, benchmarks here).

BalusC
A: 

Have you tried setting the max heap size larger to see if the problem still occurs? There may not even be a leak at all. It might just be that the default heap size (64 MB on Windows, I think) is insufficient for this particular process.

I find that I almost always need to give any application I'm running in Tomcat more heap and perm gen space than the defaults, or I'll run into out-of-memory problems. If you need help adjusting the memory settings, take a look at this question.
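For example, assuming Tomcat is started via `catalina.sh`, the memory settings can be raised in a `setenv.sh` file; the values below are illustrative starting points, not recommendations:

```shell
# $CATALINA_HOME/bin/setenv.sh -- picked up automatically by catalina.sh.
# Raise the heap beyond the small default and give PermGen some headroom.
export CATALINA_OPTS="-Xms128m -Xmx512m -XX:MaxPermSize=256m"
```

After changing these, restart Tomcat and watch whether the job completes before deciding anything about a leak.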

Jason Gritman
Hi Jason, thanks for your reply. I increased my heap size and the application worked just fine. I have set up profiling for my application, and after the process completed, I could find 2 kinds of live allocated objects not being deallocated by the garbage collector: org.postgresql.jdbc4.Jdbc4PrepareStatement and org.postgresql.jdbc4.Jdbc4ResultSet. Can these be the cause of a memory leak in the application?
Amit
That might be an indication that you aren't closing your JDBC objects correctly. Are you making the JDBC calls yourself, or are you using a framework (like Spring) to wrap them? If you are calling JDBC directly, make sure you call the close() method on any ResultSet, Statement, PreparedStatement, and Connection objects in a finally block when you are finished using them.
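The close-in-finally pattern can be sketched as below; the DAO method, table, and column names are hypothetical, not taken from the application:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class QuoteDao {
    // Hypothetical sketch: each JDBC resource is closed in reverse order
    // of creation, even if the query throws partway through.
    static double readPrice(Connection conn, String symbol) throws SQLException {
        PreparedStatement ps = null;
        ResultSet rs = null;
        try {
            ps = conn.prepareStatement(
                    "SELECT price FROM company_quote WHERE symbol = ?");
            ps.setString(1, symbol);
            rs = ps.executeQuery();
            return rs.next() ? rs.getDouble(1) : 0.0;
        } finally {
            // closing here guarantees the pooled statement is returned
            // even when the query fails halfway through the job
            if (rs != null) try { rs.close(); } catch (SQLException ignored) { }
            if (ps != null) try { ps.close(); } catch (SQLException ignored) { }
        }
    }
}
```

With a pooled DataSource (like Commons DBCP here), `close()` returns the statement to the pool rather than destroying it, so skipping it pins both the pool entry and the driver-side objects.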
Jason Gritman
To the best of my ability, I have taken care to close all the connections I have opened, but the problem still exists. Is there any way you can help me out with this?
Amit
A: 

You could run your application with -XX:+HeapDumpOnOutOfMemoryError. This will cause the JVM to produce a heap dump when it runs out of memory. You can then use something like MAT or JHAT to see what objects are being held on to. I suggest using the Eclipse Memory Analyzer Tool (MAT) on the generated heap dump, as it is fairly straightforward to use: http://www.eclipse.org/mat/

Of course, you will need to have some idea of what objects may be hanging around in order for this to be useful. DOM objects? Resources from previous loads of XML documents? Database connections? MAT will allow you to trace the references from an object you suspect should have been garbage collected back to a root object.
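For example, assuming Tomcat is launched through `catalina.sh`, the flag could be added like this (the dump path is illustrative):

```shell
# $CATALINA_HOME/bin/setenv.sh -- write an .hprof file on OutOfMemoryError,
# to a directory with enough free space for a dump the size of the heap
export CATALINA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/dumps"
```

The resulting .hprof file can then be opened directly in MAT or inspected with jhat.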

mlaw