At work I am parsing large XML files using the DefaultHandler class. Doing that, I noticed that this API allocates many String objects, for element names, attribute names and values, and so on.
From that, I thought about creating an XML parser that only does the absolute minimum of object allocation. Currently I need:
- one StringBuilder for building the element names, attribute names, etc.
- one CharsetDecoder for transforming bytes into chars.
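To make the two items above concrete, here is a minimal sketch of how one CharsetDecoder and one StringBuilder could be reused for every token. The class name ReusableDecoder, the UTF-8 charset, and the buffer sizes are assumptions made for illustration; this is not the XmlParser itself, only the decoding step it would rely on.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: decodes each token into the same StringBuilder. */
final class ReusableDecoder {
    // Assumption: the input is UTF-8; malformed bytes are replaced, not reported.
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
    // Scratch buffer, allocated once and reused for every call.
    private final CharBuffer chars = CharBuffer.allocate(256);
    // The single builder that holds the current name or value.
    private final StringBuilder token = new StringBuilder(64);

    /** Decodes one complete token; the returned builder is overwritten by the next call. */
    StringBuilder decode(ByteBuffer bytes) {
        token.setLength(0);
        decoder.reset();
        CoderResult result;
        do {
            // endOfInput = true: the buffer holds the whole token.
            result = decoder.decode(bytes, chars, true);
            drain();
        } while (result.isOverflow());
        do {
            result = decoder.flush(chars);
            drain();
        } while (result.isOverflow());
        return token;
    }

    /** Moves the decoded chars from the scratch buffer into the builder. */
    private void drain() {
        chars.flip();
        while (chars.hasRemaining()) {
            token.append(chars.get());
        }
        chars.clear();
    }
}
```

With this shape, a name can be checked without creating a String at all, for example with "Track".contentEquals(decoder.decode(bytes)). A String is only allocated when the caller explicitly asks for one via toString().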
My test program, for parsing http://magnatune.com/info/song_info.xml, looks like this:
    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class XmlParserDemo {

        public static void main(String[] args) throws IOException {
            List<Map<String, String>> allSongs = new ArrayList<Map<String, String>>();
            InputStream fis = new FileInputStream("d:/song_info.xml");
            try {
                XmlParser parser = new XmlParser(new BufferedInputStream(fis));
                if (parser.element("AllSongs")) {
                    while (parser.element("Track")) {
                        Map<String, String> track = new LinkedHashMap<String, String>();
                        while (parser.element()) {
                            String name = parser.getElementName();
                            String value = parser.text();
                            track.put(name, value);
                            parser.endElement();
                        }
                        allSongs.add(track);
                        parser.endElement();
                    }
                    parser.endElement();
                }
            } finally {
                fis.close();
            }
        }
    }
This code looks better than my experiments with the XMLEventReader. Now the only missing part is the XmlParser class used in the code above. Do you know if someone has written that code before? It's really just a pet project of mine, but I'm curious how much the old statement "object creation is expensive" still holds.
Yes, I know that LinkedHashMaps use a lot of memory. It's really just the parsing part that I want to be memory-efficient. Everything else is only there to keep the example simple.