views:

79

answers:

2

Hello,

I'm having hard time to decide between the following two implementations. I want to cache the javax.xml.parsers.DocumentBuilder object, per thread. My main concern is runtime performance - Hench I would be happy to avoid as much GC as possible. Memory is not an issue.

I've written two POC implementations, and would be happy to hear from the community PROS/CONS regarding each one.

Thanks for the help guys.

Option #1 - WeakHashMap

import java.io.IOException;
import java.io.StringReader;
import java.util.WeakHashMap;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;


public class DocumentBuilder_WeakHashMap {
    private static final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    private static final WeakHashMap<Thread, DocumentBuilder> CACHE = new WeakHashMap<Thread, DocumentBuilder>();

    public static Document documentFromXMLString(String xml) throws SAXException, IOException, ParserConfigurationException {
        DocumentBuilder builder = CACHE.get(Thread.currentThread());
        if(builder == null) {
            builder = factory.newDocumentBuilder();
            CACHE.put(Thread.currentThread(), builder);
        }

        return builder.parse(new InputSource(new StringReader(xml)));
    }

}

Option #2 - ThreadLocal

import java.io.IOException;
import java.io.StringReader;
import java.lang.ref.WeakReference;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;


public class DocumentBuilder_ThreadLocal {
    private static final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    private static final ThreadLocal<WeakReference<DocumentBuilder>> CACHE = 
        new ThreadLocal<WeakReference<DocumentBuilder>>() {
            @Override 
            protected WeakReference<DocumentBuilder> initialValue() {
                try {
                    return new WeakReference<DocumentBuilder>(factory.newDocumentBuilder());
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
        };

    public static Document documentFromXMLString(String xml) throws ParserConfigurationException, SAXException, IOException {
        WeakReference<DocumentBuilder> builderWeakReference = CACHE.get();
        DocumentBuilder builder = builderWeakReference.get();

        if(builder == null) {
            builder = factory.newDocumentBuilder();
            CACHE.set(new WeakReference<DocumentBuilder>(builder));
        }

        return builder.parse(new InputSource(new StringReader(xml)));
    }
}

They both do the same thing (expose documentFromXMLString() to the outside world) so which one would you use?

Thank you, Maxim.

+3  A: 

Hello,

The ThreadLocal solution is better as long as you don't use the weakreference but rather use directly a ThreadLocal<DocumentBuilder>. Access to the ThreadLocal value is faster because the thread directly references an array containing all ThreadLocal values and it has just to compute the index in this array to do the lookup. Look at the ThreadLocal source to see why the index computation is fast (int index = hash & values.mask;)

pgras
The real performance hit is the necessary synchronization of the Collection-based-solution, not the 2-cpu-cycles faster/slower implementation.
Hardcoded
@Hardcoded: true and the interesting question is that knowing one entry will never be accessed by more than one thread, how should one synchronize access to that map ???
pgras
Just to emphasize, don't use `WeakReference`. Not because of the performance overhead of using it, but it can be cleared immediately. Some programs, such as NetBeans, can absolutely crawl once HotSpot optimises. `SoftReference` would be okay.
Tom Hawtin - tackline
@pgras The Map itself must be synchronized, since it can get corrupted. `Collections.synchronizedMap(new WeakHashMap<Thread, DocumentBuilder>());` should do the trick for this problem. The backing array of the ThreadLocal is only accessed within the same thread, so no synchronization is needed here.
Hardcoded
@Hardcoded another option would be to use guava's `new MapMaker().weakKeys().makeMap()`, which gives a weakly-keyed concurrent map. With `.makeComputingMap()` it can also, in a threadsafe fashion, produce new cachable values on-demand.
Cowan
@Cowan There's also a synchronization overhead, even if it is less than with synchronizedMap. So the best solution is still ThreadLocal.
Hardcoded
+2  A: 

The WeakHashMap alone will fail, because it is not thread safe:
"Like most collection classes, this class is not synchronized."
(3rd paragraph at the JavaDoc)

Since sychronization will take time and Collections.synchronizedMap won't scale very well, you should stick with ThreadLocal.

Hardcoded