




I am parsing an XML string in Java using javax.xml.parsers.DocumentBuilder. However, DocumentBuilder has no method that parses the contents of a String directly, so I am doing this instead:

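// Needs: java.io.StringReader, javax.xml.parsers.*, org.w3c.dom.Document, org.xml.sax.InputSource
public static Document parseText(String zText) {
  try {
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    // Wrap the String in a Reader so the parser treats it as XML content, not as a URI.
    Document doc = dBuilder.parse(new InputSource(new StringReader(zText)));
    return doc;
  } catch (Exception e) {
    // Parser configuration or parsing failed; fall through and return null.
  }

  return null;
}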

Is this the best way to do it? I feel like there must be a simpler way... thanks!

+1  A: 

I personally prefer dom4j. Check out their quick start; it is pretty simple.

+4  A: 

To answer your question directly: to my knowledge, there is not a better way. As I understand it, InputSource is used because it is more universal: it can handle input from a file, a String, or across the wire.
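
To illustrate that point, here is a minimal sketch of the same parse call fed from two different kinds of input. The class name InputSourceDemo, the helper parse, and the sample XML are made up for this example and are not from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; only the javax.xml / org.xml.sax calls are real API.
class InputSourceDemo {

    // Parses whatever the given InputSource wraps (a character stream or a byte stream).
    static Document parse(InputSource source) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(source);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<greeting>hello</greeting>"; // sample input, assumed for illustration

        // Character stream: wrap the String in a StringReader.
        Document fromString = parse(new InputSource(new StringReader(xml)));

        // Byte stream: the same entry point accepts an InputStream (file, socket, ...).
        Document fromBytes = parse(new InputSource(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))));

        System.out.println(fromString.getDocumentElement().getTagName());    // prints "greeting"
        System.out.println(fromBytes.getDocumentElement().getTextContent()); // prints "hello"
    }
}
```

Because both calls go through InputSource, the parsing code does not need to know where the XML came from.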

You could also try the SAX XML parser. It is a little more basic and uses an event-driven callback model (similar in spirit to the Visitor pattern), but it gets the job done, and for smallish data sets and simple XML schemas it is pretty easy to use. SAX is also included with the core JRE.
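
As a rough sketch of what the SAX route could look like for a String, the example below collects element names as the parser reports them. The class name SaxStringDemo, the method elementNames, and the sample XML are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; the handler is called back once per start tag.
class SaxStringDemo {

    // Returns the element names in document order by reacting to startElement events.
    static List<String> elementNames(String xml) throws Exception {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                names.add(qName);
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames("<order><item/><item/></order>")); // prints "[order, item, item]"
    }
}
```

No Document is built here; the handler sees each event once, so any state you need has to be kept by your own code.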

+1  A: 

I wouldn't call normalize() if I were in a hurry or did not care about the result. You could normalize just the nodes you need, when you need them.
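
A small sketch of normalizing a single node rather than the whole document follows. It builds an element by hand with two adjacent text nodes so the effect is visible; the class name NormalizeDemo and the element name note are invented for this example, and a freshly parsed document may not need normalizing at all.

```java
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class showing Node.normalize() applied to one element only.
class NormalizeDemo {

    // Returns {child count before normalize, child count after normalize}.
    static int[] textNodeCounts() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element note = doc.createElement("note");
        doc.appendChild(note);

        // Two appended text nodes stay separate until the element is normalized.
        note.appendChild(doc.createTextNode("hello, "));
        note.appendChild(doc.createTextNode("world"));

        int before = note.getChildNodes().getLength();
        note.normalize(); // merges adjacent text nodes under <note> only
        int after = note.getChildNodes().getLength();

        return new int[] { before, after };
    }

    public static void main(String[] args) throws Exception {
        int[] counts = textNodeCounts();
        System.out.println(counts[0] + " -> " + counts[1]); // prints "2 -> 1"
    }
}
```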

+1  A: 

I agree with aperkins. Here is my javax helper:

/**
 * Returns a {@code Document} from the specified XML {@code String}.
 * @param xmlDocumentString a well-formed XML {@code String}
 * @return a {@code org.w3c.dom.Document}
 */
public static Document getDomDocument(String xmlDocumentString) {
    if(StringUtility.isNullOrEmpty(xmlDocumentString)) return null;

    InputStream s = null;

    try {
        // Encode the String as UTF-8 bytes so it can be read as an InputStream.
        s = new ByteArrayInputStream(xmlDocumentString.getBytes("UTF-8"));
    } catch(UnsupportedEncodingException e) {
        throw new RuntimeException("UnsupportedEncodingException: " + e.getMessage(), e);
    }

    return XmlDomUtility.getDomDocument(s);
}

This helper depends on another one:

/**
 * Returns a {@code Document} from the specified {@code InputStream}.
 * @param input the {@code java.io.InputStream}
 * @return a {@code org.w3c.dom.Document}
 */
public static Document getDomDocument(InputStream input) {
    Document document = null;
    try {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        document = builder.parse(input);
    } catch(ParserConfigurationException e) {
        throw new RuntimeException("ParserConfigurationException: " + e.getMessage(), e);
    } catch(SAXException e) {
        throw new RuntimeException("SAXException: " + e.getMessage(), e);
    } catch(IOException e) {
        throw new RuntimeException("IOException: " + e.getMessage(), e);
    }

    return document;
}

Update: these are my imports:

import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;
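
StringUtility and XmlDomUtility are not shown in this answer, so here is a self-contained sketch of the same idea that uses only the standard library. It assumes isNullOrEmpty means "null or zero-length", and it uses StandardCharsets.UTF_8 (Java 7+) so the UnsupportedEncodingException catch is no longer needed. The class name XmlStrings is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical stand-in for the two helper classes, merged into one method.
class XmlStrings {

    // Returns null for null or empty input (assumed behavior of isNullOrEmpty);
    // wraps parser failures in a RuntimeException that keeps the original cause.
    static Document getDomDocument(String xml) {
        if (xml == null || xml.isEmpty()) return null;
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new RuntimeException(e.getClass().getSimpleName() + ": " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        Document doc = getDomDocument("<root><child/></root>");
        System.out.println(doc.getDocumentElement().getTagName()); // prints "root"
    }
}
```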
Rasx: Where are StringUtility and XmlDomUtility imported from?
Jim Ferrans
I am using the standard JavaSE javax libraries; the imports are the ones listed in the update above.