I plan to use a dom4j Document as a static cache in an application where multiple threads can query the document. Given that the document itself will never change, is it safe to query it from multiple threads?

I wrote the following code to test it, but I am not sure it actually proves that the operation is safe.

    package test.concurrent_dom;

    import org.dom4j.Document;
    import org.dom4j.DocumentException;
    import org.dom4j.DocumentHelper;
    import org.dom4j.Element;
    import org.dom4j.Node;

    /**
     * Hello world!
     *
     */
    public class App extends Thread
    {
        private static final String xml = 
            "<Session>"
                + "<child1 attribute1=\"attribute1value\" attribute2=\"attribute2value\">"
                + "ChildText1</child1>"
                + "<child2 attribute1=\"attribute1value\" attribute2=\"attribute2value\">"
                + "ChildText2</child2>" 
                + "<child3 attribute1=\"attribute1value\" attribute2=\"attribute2value\">"
                + "ChildText3</child3>"
            + "</Session>";

        private static Document document;

        private static Element root;

        public static void main( String[] args ) throws DocumentException
        {
            document = DocumentHelper.parseText(xml);
            root = document.getRootElement();

            Thread t1 = new Thread(){
                public void run(){
                    while(true){

                        try {
                            sleep(3);
                        } catch (InterruptedException e) {                  
                            e.printStackTrace();
                        }

                        Node n1 = root.selectSingleNode("/Session/child1");                 
                        if(!n1.getText().equals("ChildText1")){                     
                            System.out.println("WRONG!");
                        }
                    }
                }
            };

            Thread t2 = new Thread(){
                public void run(){
                    while(true){

                        try {
                            sleep(3);
                        } catch (InterruptedException e) {                  
                            e.printStackTrace();
                        }

                        Node n1 = root.selectSingleNode("/Session/child2");                 
                        if(!n1.getText().equals("ChildText2")){                     
                            System.out.println("WRONG!");
                        }
                    }
                }
            };

            Thread t3 = new Thread(){
                public void run(){
                    while(true){

                        try {
                            sleep(3);
                        } catch (InterruptedException e) {                  
                            e.printStackTrace();
                        }

                        Node n1 = root.selectSingleNode("/Session/child3");                 
                        if(!n1.getText().equals("ChildText3")){                     
                            System.out.println("WRONG!");
                        }
                    }
                }
            };

            t1.start();
            t2.start();
            t3.start();
            System.out.println( "Hello World!" );
        }    

    }
A: 

I am not actually familiar with dom4j, but if you cannot be sure it handles read-only data properly, I am not sure how good it is.

I will make the operational assumption that the executable part of your runnables (the part after the sleep) takes less than one microsecond, far less than the 3 ms sleep, so in your test run the reads most likely happened consecutively, not concurrently. Thus your test does not really prove anything.

For a more robust test, I

  1. eliminated the 3 millisecond sleep (`sleep(3)` is in milliseconds) - your test code should be busy generating potential conflicts, not sleeping.
  2. increased the thread count - the more threads executing concurrently, the greater the chance of a conflict.
  3. added primitive conflict detection.

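    // Holds the thread that is currently reading, or null if nobody has claimed the document.
    final AtomicReference<Thread> owner = new AtomicReference<Thread>();

    class TestThread extends Thread
    {
        private final String xpath;
        private final String expected;
        TestThread(int index) { xpath = "/Session/child" + index; expected = "ChildText" + index; }
        public String toString() { return expected; }
        public void run()
        {
            while (true)
            {
                // Claim the document; failure means another thread is reading right now.
                boolean own = owner.compareAndSet(null, this);
                Thread other = own ? null : owner.get(); // may be null if the owner just released
                Node n1 = root.selectSingleNode(xpath);
                boolean wrong = !n1.getText().equals(expected);
                owner.compareAndSet(this, null); // releases only if this thread is the owner
                if (!own) { System.out.println(other + " conflicts " + this); }
                if (wrong) { System.out.println(this + " WRONG!"); }
            }
        }
    }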

then

    try {
        while (true) {
            Thread t1 = new TestThread(1);
            t1.start();
            Thread t2 = new TestThread(2);
            t2.start();
            Thread t3 = new TestThread(3);
            t3.start();
        }
    }
    catch (Throwable thr) {
        thr.printStackTrace();
    }

If it works as predicted (this is uncompiled and untested), it will keep generating new threads, and each new thread will try to read the document. A thread reports when its read potentially overlaps in time with another thread's read, and when it reads a wrong value. New threads keep being created until your system runs out of resources, at which point it will crash.
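If you would rather have the test finish than crash, the same idea can be sketched with a fixed number of threads released together by a start gate. This is only a sketch: `ReadStress` and `countFailures` are hypothetical names, and the query is passed in as a function so the snippet compiles without dom4j on the classpath.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;

/** Hypothetical helper: a bounded stress test for concurrent reads. */
final class ReadStress {

    /**
     * Starts `threads` workers at the same moment. Worker t uses key (t % keys) + 1
     * and calls query.apply(key) `iterations` times. Returns the number of results
     * that differed from expected.apply(key); a worker that throws counts as one
     * failure and stops.
     */
    static int countFailures(int threads, int iterations, int keys,
                             IntFunction<String> query,
                             IntFunction<String> expected) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch startGate = new CountDownLatch(1);
        AtomicInteger failures = new AtomicInteger();

        for (int t = 0; t < threads; t++) {
            final int key = t % keys + 1;
            pool.execute(() -> {
                try {
                    startGate.await(); // hold every worker until all are ready
                    for (int n = 0; n < iterations; n++) {
                        if (!expected.apply(key).equals(query.apply(key))) {
                            failures.incrementAndGet();
                        }
                    }
                } catch (Throwable e) {
                    failures.incrementAndGet();
                }
            });
        }

        startGate.countDown(); // release all workers together
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return failures.get();
    }
}
```

With the document from the question, the call would be something like `ReadStress.countFailures(32, 100000, 3, i -> root.selectSingleNode("/Session/child" + i).getText(), i -> "ChildText" + i)`. A result of 0 is only evidence that nothing went wrong on this run, not proof that the reads are thread-safe.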

emory
+1  A: 

The Xerces DOM FAQ (http://xerces.apache.org/xerces2-j/faq-dom.html) says this about whether its DOM implementation is thread-safe:

No. DOM does not require implementations to be thread safe. If you need to access the DOM from multiple threads, you are required to add the appropriate locks to your application code.

Without seeing the implementation, it's impossible to know if selectSingleNode uses any shared state for reading the DOM. I think it's safest to assume that it's not thread-safe.
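If you do assume the DOM is not thread-safe, the "appropriate locks" can be as simple as routing every read through one monitor. Below is a minimal sketch of that idea. It uses the JDK's built-in W3C DOM and `javax.xml.xpath` instead of dom4j so that it is self-contained, and `LockedDomCache` and its `text` method are hypothetical names, not part of any library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical wrapper: every read of the shared DOM goes through one lock. */
final class LockedDomCache {
    private final Document document; // never handed out to callers
    private final Object lock = new Object();

    LockedDomCache(String xml) {
        try {
            document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Evaluates the XPath and returns its string value, one caller at a time. */
    String text(String xpathExpression) {
        synchronized (lock) {
            try {
                // JDK XPath objects are documented as not thread-safe,
                // so a fresh one is created per call, inside the lock.
                XPath xpath = XPathFactory.newInstance().newXPath();
                return xpath.evaluate(xpathExpression, document);
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("bad XPath: " + xpathExpression, e);
            }
        }
    }
}
```

Because the document is private and only strings leave the wrapper, no caller can touch a node outside the lock. The cost is that reads are fully serialized, which may or may not matter for a small cache.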

An alternative is to use your own XPath processor, such as Jaxen, which is thread-safe.

XPath objects are fully reentrant and thread-safe. They contain no internal state for evaluation and thus can be cached easily and shared within an application. Once you have an XPath object, you can apply it against various initial contexts and retrieve results in several different ways.

(Quoted from "Introduction to SAX path and Jaxen".)

The JAXEN Jira has various fixes for thread-safety issues, which is evidence that Jaxen is designed to be thread-safe; I came across one of them by chance. One of the authors has also confirmed that Jaxen is thread-safe.

As well as being thread-safe, Jaxen is model-agnostic - it works with many models (W3C DOM, XOM, Dom4J, JDOM) and custom models can be plugged in by implementing a couple of interfaces.

I would imagine that simple accessors and iterators on the W3C DOM are thread-safe, but this is just a hunch, not a concrete fact. If you want to be 100% sure, use a DOM that is designed for thread safety, for example, dom4j.

Some resources to get started:

  - An example of using Jaxen
  - Jaxen FAQ and homepage

mdma