Which XML validation tools can you recommend that are both fast and accurate? Performance and accuracy are each critical on our system. We have the following requirements:
- It is not xmllint (see below)
- Supports RelaxNG
- Can easily integrate with Perl (this is optional, but it would be nice)
Why not xmllint? (This is background and you can skip it if you like)
We have a large Perl system which uses RelaxNG to validate our XML. We use the compact RelaxNG format and trang to convert it to the standard RelaxNG format. Then we do the actual validation via xmllint.
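For reference, the pipeline is roughly equivalent to the shell sketch below. The function name `validate_rnc` and the file names are placeholders for illustration, and writing the converted schema next to the compact one is an assumption, not necessarily how our system does it.

```shell
# Rough sketch of the trang + xmllint pipeline (names are hypothetical).
validate_rnc() {
    rnc=$1                   # RelaxNG compact schema, e.g. schema.rnc
    xml=$2                   # XML document to validate
    rng="${rnc%.rnc}.rng"    # converted schema in RelaxNG XML syntax

    # trang picks the input and output formats from the file extensions.
    trang "$rnc" "$rng" || return 1

    # --noout suppresses the document dump; diagnostics go to stderr and
    # the exit status is non-zero when validation fails.
    xmllint --noout --relaxng "$rng" "$xml"
}
```

Called as `validate_rnc schema.rnc document.xml`, this returns non-zero on an invalid document; the diagnostics printed along the way are the part we cannot trust.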
That's when the problems kick in. xmllint routinely misreports validation errors. It doesn't give false positives or negatives, but when a document fails to validate, xmllint will often report the wrong element or attribute for a given error. Sometimes the reported error is technically correct ("did not expect to see element 'bar'"), but only because an earlier error was not reported: 'bar' was supposed to follow the required but missing element 'foo', and xmllint doesn't tell us that. This is a long-standing problem with xmllint, and even the latest version has it. We often have huge XML documents, and misreported errors cause much grief for both clients and developers.