I'm trying to find a way to validate a large XML file against an XSD. I saw the question ...best way to validate an XML... but the answers all pointed to using the Xerces library for validation. The problem is that when I use that library to validate a 180 MB file, I get an OutOfMemoryError.

Are there any other tools, libraries, or strategies for validating a larger-than-normal XML file?

EDIT: The SAX solution worked for Java validation, but the other two suggestions for the libxml tool were very helpful as well for validation outside of Java.

+21  A: 

Instead of using a DOMParser, use a SAXParser. This reads from an input stream or reader so you can keep the XML on disk instead of loading it all into memory.

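import java.io.FileInputStream;
import javax.xml.parsers.*;
import org.xml.sax.*;

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true);

SAXParser parser = factory.newSAXParser();
// Without this JAXP property, setValidating(true) checks against a DTD, not an XSD.
// The schema itself is located through the document's xsi:schemaLocation hints.
parser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
        "http://www.w3.org/2001/XMLSchema");

XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler()); // your own org.xml.sax.ErrorHandler
// Pass a byte stream (not a FileReader) so the parser detects the encoding itself.
reader.parse(new InputSource(new FileInputStream("document.xml")));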
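
If the XML file does not reference its schema, one alternative worth trying is the standard javax.xml.validation API, which takes the XSD explicitly and reads the document from a StreamSource. The following is only a minimal sketch, not the code from this answer: the class name StreamingXsdValidation and the CollectingErrorHandler helper are made up for illustration, and the assumption is that your JDK's built-in validator processes a StreamSource without building a DOM tree.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class StreamingXsdValidation {

    /** Hypothetical helper: records every problem instead of stopping at the first one. */
    static class CollectingErrorHandler implements ErrorHandler {
        final List<String> problems = new ArrayList<>();

        private void record(String level, SAXParseException e) {
            problems.add(level + " at line " + e.getLineNumber() + ": " + e.getMessage());
        }

        @Override public void warning(SAXParseException e) { record("warning", e); }

        @Override public void error(SAXParseException e) { record("error", e); }

        @Override public void fatalError(SAXParseException e) throws SAXException {
            record("fatal", e);
            throw e; // the document is not well-formed, so parsing cannot continue
        }
    }

    /** Validates xml against xsd and returns the list of reported problems (empty if valid). */
    public static List<String> validate(Source xml, Source xsd) throws SAXException, IOException {
        SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = schemaFactory.newSchema(xsd);
        Validator validator = schema.newValidator();

        CollectingErrorHandler handler = new CollectingErrorHandler();
        validator.setErrorHandler(handler);
        try {
            validator.validate(xml);
        } catch (SAXParseException e) {
            // Fatal errors were already recorded by the handler before being rethrown.
        }
        return handler.problems;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java StreamingXsdValidation document.xml schema.xsd (file names are placeholders)
        List<String> problems = validate(new StreamSource(new File(args[0])),
                                         new StreamSource(new File(args[1])));
        problems.forEach(System.out::println);
        System.out.println(problems.isEmpty() ? "valid" : problems.size() + " problem(s) found");
    }
}
```

Collecting the errors rather than throwing on the first one is a design choice: with a very large file you usually want a full report from a single pass instead of re-running the validation after each fix.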
jodonnell
+6  A: 

Use libxml, which performs validation and has a streaming mode.

John Millikin
+1  A: 

Personally, I like to use XMLStarlet, which has a command-line interface and works on streams. It is a set of tools built on libxml2.

dlamblin
A: 

SAX and libxml will help, as already mentioned. You could also try increasing the JVM's maximum heap size with the -Xmx option. For example, to set the maximum heap size to 512 MB: java -Xmx512m com.foo.MyClass
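
To confirm that the -Xmx setting actually reached the JVM, you can print the limit the runtime reports. This is just a small sketch; the class name HeapCheck is made up for illustration, and the reported value can be somewhat lower than the number passed to -Xmx, depending on the JVM and garbage collector.

```java
public class HeapCheck {

    /** Returns the maximum heap the JVM will attempt to use, in MiB. */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Run it with the same option you plan to use for validation, for example: java -Xmx512m HeapCheck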

GaZ
+1  A: 

XML ValidatorBuddy from http://www.xml-tools has its own command for validating huge XML files (multiple GB). It uses the Xerces-C SAX parser for this purpose.

The tool also lets you specify a particular XSD for validation, so you don't need to edit the large XML file to add the schema reference.

xml-tools.com