I am trying to create an object tree from large number of xmls. However, when I run the following code on about 2000 xml files(ranging from 100KB to 200MB) (note that I have commented out the code that creates object tree), I get a large memory footprint of 8-9GB. I expect memory footprint to be minimum in the following example because the code doesn't doen't hold any references, it justs creates Elem and throws it away. The heap memory stays the same after running full GC.
def addDir(dir: File) {
dir.listFiles.filter(file => file.getName.endsWith("xml.gz")).foreach { gzipFile =>
addGzipFile(gzipFile)
}
}
def addGzipFile(gzipFile: File) {
val is = new BufferedInputStream(new GZIPInputStream(new FileInputStream(gzipFile)))
val xml = XML.load(is)
// parse xml and create object tree
is.close()
}
My JVM options are: -server -d64 -Xmx16G -Xss16M -XX:+DoEscapeAnalysis -XX:+UseCompressedOops
And the output of jmap -histo looks like this
num #instances #bytes class name ---------------------------------------------- 1: 67501390 1620033360 scala.collection.immutable.$colon$colon 2: 37249187 1254400536 [C 3: 37287806 1193209792 java.lang.String 4: 37200976 595215616 scala.xml.Text 5: 18600485 595215520 scala.xml.Elem 6: 3420921 82102104 scala.Tuple2 7: 213938 58213240 [I 8: 1140334 36490688 scala.collection.mutable.ListBuffer 9: 2280468 36487488 scala.runtime.ObjectRef 10: 1140213 36486816 scala.collection.Iterator$$anon$24 11: 1140210 36486720 scala.xml.parsing.FactoryAdapter$$anonfun$startElement$1 12: 1140210 27365040 scala.collection.immutable.Range$$anon$2 ... Total 213412869 5693850736