At work I parse large XML files using a handler derived from the DefaultHandler class. While doing that, I noticed that this API allocates many Strings: for element names, attribute names and values, and so on.
Because of these allocations, I thought about writing an XML parser that does only the absolute minimum of object allocation. Currently I need:
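To make the observation concrete, here is a minimal sketch of such a SAX handler. The class name `SaxStringDemo` and the counting logic are hypothetical and only for illustration; the sketch merely counts how many String values the callbacks hand over for names and attributes, and it does not measure actual allocations.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original code.
public class SaxStringDemo {

    /** Counts the String values that SAX passes in for element and attribute data. */
    static class CountingHandler extends DefaultHandler {
        int strings;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            strings++; // the element name arrives as a String
            for (int i = 0; i < atts.getLength(); i++) {
                String name = atts.getQName(i);  // attribute name as a String
                String value = atts.getValue(i); // attribute value as a String
                if (name != null && value != null) {
                    strings += 2;
                }
            }
        }
    }

    /** Parses the given XML text and returns the number of Strings seen by the handler. */
    static int countStrings(String xml) throws Exception {
        CountingHandler handler = new CountingHandler();
        byte[] data = xml.getBytes(StandardCharsets.UTF_8);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(data), handler);
        return handler.strings;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countStrings("<a x='1'><b/></a>"));
    }
}
```

Whether each of these Strings is freshly allocated or reused from an internal table depends on the parser implementation, so a count like this is only an upper bound on the String allocations for names.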
- one StringBuilder for building the element names, attribute names, etc.
- one CharsetDecoder for transforming bytes into chars.
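The two long-lived objects listed above could be used roughly as follows. This is only a sketch under assumptions: the class name `ReusableDecoderDemo`, the UTF-8 charset, the 8192-byte buffers and the two extra buffer objects are all made up for illustration, and a real parser would inspect the characters instead of collecting them all.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: every object below is allocated once and then reused.
public class ReusableDecoderDemo {
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
    private final ByteBuffer bytes = ByteBuffer.allocate(8192); // assumed buffer size
    private final CharBuffer chars = CharBuffer.allocate(8192);
    private final StringBuilder sb = new StringBuilder();

    /** Decodes the whole stream into the shared StringBuilder and returns its length. */
    int decodeAll(InputStream in) throws IOException {
        sb.setLength(0);   // reuse the builder instead of creating a new one
        decoder.reset();
        bytes.clear();
        boolean eof = false;
        while (!eof) {
            int n = in.read(bytes.array(), bytes.position(), bytes.remaining());
            if (n < 0) {
                eof = true;
            } else {
                bytes.position(bytes.position() + n);
            }
            bytes.flip();
            CoderResult result;
            do {
                result = decoder.decode(bytes, chars, eof);
                drainChars();
            } while (result.isOverflow());
            bytes.compact(); // keep an incomplete multi-byte sequence for the next round
        }
        CoderResult result;
        do {
            result = decoder.flush(chars);
            drainChars();
        } while (result.isOverflow());
        return sb.length();
    }

    /** Moves the decoded chars into the StringBuilder without creating a String. */
    private void drainChars() {
        chars.flip();
        sb.append(chars.array(), 0, chars.limit());
        chars.clear();
    }

    /** Creates a String; intended for the demo and tests only. */
    String contents() {
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        ReusableDecoderDemo demo = new ReusableDecoderDemo();
        byte[] data = "<Track>h\u00e9llo</Track>".getBytes(StandardCharsets.UTF_8);
        System.out.println(demo.decodeAll(new ByteArrayInputStream(data)));
    }
}
```

The same instance can be used for any number of streams, since `decodeAll` resets the decoder and the builder at the start of each call.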
My test program, for parsing http://magnatune.com/info/song_info.xml, looks like this:
    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class XmlParserDemo {

        public static void main(String[] args) throws IOException {
            List<Map<String, String>> allSongs = new ArrayList<Map<String, String>>();
            InputStream fis = new FileInputStream("d:/song_info.xml");
            try {
                XmlParser parser = new XmlParser(new BufferedInputStream(fis));
                if (parser.element("AllSongs")) {
                    while (parser.element("Track")) {
                        Map<String, String> track = new LinkedHashMap<String, String>();
                        while (parser.element()) {
                            String name = parser.getElementName();
                            String value = parser.text();
                            track.put(name, value);
                            parser.endElement();
                        }
                        allSongs.add(track);
                        parser.endElement();
                    }
                    parser.endElement();
                }
            } finally {
                fis.close();
            }
        }
    }
This code looks better than my experiments with the XMLEventReader. The only missing part is the XmlParser class used in the code above. Do you know whether someone has written that code before? It's really just a pet project of mine, but I'm curious how much the old statement "object creation is expensive" is still worth.
Yes, I know that LinkedHashMaps use a lot of memory. It's only the parsing part that I want to be memory-efficient. Everything else is just there to make a simple example.