In an open source project I maintain, we have at least three different ways of reading, processing and writing XML files and I would like to standardise on a single method for ease of maintenance and stability.

Currently all of the project files use XML, from the configuration to the stored data. We're hoping to migrate to a simple database at some point in the future, but we will still need to read and write some form of XML files.

The data is stored as XML, which we then transform into the final HTML files using an XSLT engine (Saxon).

We currently utilise these methods:

- XMLEventReader/XMLOutputFactory (javax.xml.stream)
- DocumentBuilderFactory (javax.xml.parsers)
- JAXBContext (javax.xml.bind)

Are there any obvious pros and cons to each of these? Personally, I like the simplicity of DOM (Document Builder), but I'm willing to convert to one of the others if it makes sense in terms of performance or other factors.

Edited to add: There can be a significant number of files read/written when the project runs, between 100 and 10,000 individual files of around 5 KB each.

A: 

This is a very subjective topic. It primarily depends on how you are going to use the XML and how large it is. If the XML is always small enough to be loaded into memory, you don't have to worry about the memory footprint and can use a DOM parser. If you need to parse through a 150 MB XML file, you may want to consider SAX instead.
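For comparison with DOM, here is a minimal sketch of the SAX style mentioned above: the parser pushes events to a handler as it reads, so no tree is ever held in memory. The class name `SaxElementCounter` and the element names in the sample input are made up for illustration; only the `javax.xml.parsers` and `org.xml.sax` calls are standard JDK API.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: counts elements by name while streaming through the input.
class SaxElementCounter {

    static Map<String, Integer> countElements(InputStream in) throws Exception {
        Map<String, Integer> counts = new HashMap<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // Called once per opening tag; nothing else is retained.
                counts.merge(qName, 1, Integer::sum);
            }
        });
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // Sample input invented for this sketch.
        String xml = "<library><movie/><movie/><show/></library>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countElements(in));
    }
}
```

For files of a few kilobytes this buys very little over DOM; the callback style mostly pays off once a single document is too large to hold comfortably in memory.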

Ck-
No, the XML files are numerous but not especially large. They currently are all read into memory.
Omertron
My guess is that the performance difference should not be much for small XML files. That said, it depends on how small "small" is.
Ck-
+1  A: 

It depends on what you are doing with the data.

If you are simply performing XSLT transforms on XML files to produce HTML files then you may not need to touch a parser directly:

import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class Demo {

    public static void main(String[] args) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();    
        StreamSource xsltTransform = new StreamSource(new File("xslt.xml"));
        Transformer transformer = tf.newTransformer(xsltTransform);

        StreamSource source = new StreamSource(new File("source.xml"));

        StreamResult result = new StreamResult(new File("result.html"));
        transformer.transform(source, result);            
    }

}
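Since the question mentions hundreds to thousands of files per run, it may be worth compiling the stylesheet once and reusing it. The sketch below uses the standard `javax.xml.transform.Templates` interface for that; the class name `BatchTransform` and the in-memory strings are assumptions made to keep the example self-contained. In the real project the sources and results would presumably be `File`-based, as in the example above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: one compiled stylesheet applied to many inputs.
class BatchTransform {

    static List<String> transformAll(String xslt, List<String> inputs) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        // Parse and compile the stylesheet a single time.
        Templates templates = tf.newTemplates(new StreamSource(new StringReader(xslt)));

        List<String> outputs = new ArrayList<>();
        for (String xml : inputs) {
            // Creating a Transformer from Templates skips re-compiling the stylesheet.
            Transformer transformer = templates.newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            outputs.add(out.toString());
        }
        return outputs;
    }

    public static void main(String[] args) throws Exception {
        // Stylesheet and inputs invented for this sketch.
        String xslt =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/\"><xsl:value-of select=\"/item/@name\"/></xsl:template>"
            + "</xsl:stylesheet>";
        List<String> inputs = Arrays.asList("<item name=\"a\"/>", "<item name=\"b\"/>");
        System.out.println(transformAll(xslt, inputs));
    }
}
```

A `Templates` object is documented as safe to share between threads, while each `Transformer` should be used by one thread at a time, so this shape also leaves room for processing files in parallel later.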

If you need to make changes to the input document before you transform it, DOM is a convenient mechanism for doing this:

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

public class Demo {

    public static void main(String[] args) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        StreamSource xsltTransform = new StreamSource(new File("xslt.xml"));
        Transformer transformer = tf.newTransformer(xsltTransform);

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document document = db.parse(new File("source.xml"));
        // modify the document
        DOMSource source = new DOMSource(document);

        StreamResult result = new StreamResult(new File("result.html"));
        transformer.transform(source, result);  
    }

}
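As one possible shape for the "modify the document" step, here is a rough sketch that sorts the child elements of a node by an attribute before the document is handed to the transformer. The class name `DomSort`, the `movie` elements and the `title` attribute are all invented for illustration; only the `org.w3c.dom` and `javax.xml.parsers` calls are standard.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper: reorders child elements of a DOM node in place.
class DomSort {

    static void sortChildrenByAttribute(Element parent, String attribute) {
        // Collect only element children; text and comment nodes are left alone.
        List<Element> children = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                children.add((Element) n);
            }
        }
        children.sort(Comparator.comparing((Element e) -> e.getAttribute(attribute)));
        // appendChild moves a node that is already in the tree, so re-appending
        // in sorted order rearranges the elements without copying them.
        for (Element child : children) {
            parent.appendChild(child);
        }
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Sample data invented for this sketch.
        Document document = parse(
            "<movies><movie title=\"Up\"/><movie title=\"Alien\"/></movies>");
        sortChildrenByAttribute(document.getDocumentElement(), "title");
        Element first = (Element) document.getElementsByTagName("movie").item(0);
        System.out.println(first.getAttribute("title"));
    }
}
```

Note that any whitespace text nodes between the elements stay where they were, so they end up ahead of the re-appended elements. That is harmless for a transform but can make re-serialised output look uneven.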

If you prefer a typed model to make changes to the data then JAXB is a perfect fit:

import java.io.File;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.util.JAXBSource;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
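import javax.xml.transform.stream.StreamSource;
import com.example.model.Model; // placeholder domain class, assumed to live in the context package below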

public class Demo {

    public static void main(String[] args) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        StreamSource xsltTransform = new StreamSource(new File("xslt.xml"));
        Transformer transformer = tf.newTransformer(xsltTransform);

        JAXBContext jc = JAXBContext.newInstance("com.example.model");
        Unmarshaller unmarshaller = jc.createUnmarshaller();
        Model model = (Model) unmarshaller.unmarshal(new File("source.xml"));
        // modify the domain model
        JAXBSource source = new JAXBSource(jc, model);

        StreamResult result = new StreamResult(new File("result.html"));
        transformer.transform(source, result);            
    }

}
Blaise Doughan
We may need to do some work on the data from the XML before it gets written back out, including indexing and sorting. Currently we don't have any DTDs or XSDs for the XML data we write. They would be fairly simple for me to write and maintain, but I'd only look to do this if it gave a clear benefit rather than just being another "documentation overhead".
Omertron
Why do you feel you need the XML schema? With JAXB you can start with POJOs and annotate them.
Blaise Doughan
Perhaps I misunderstood the "If you prefer a typed model" and "JAXBContext.newInstance("com.example.model");" parts of your post. Looks like I have more investigation to do.
Omertron
The typed model refers to representing your XML as domain objects. From your question you are already using JAXB for this. With JAXB you can generate classes from an XML schema, or start with Java classes and map them to XML using annotations. An example can be found here: http://wiki.eclipse.org/EclipseLink/Examples/MOXy/GettingStarted
Blaise Doughan