I've always found validation against a schema to be an invaluable ward against thinkos and would like to incorporate validation checks as part of a project where I frequently need to hand-write XML files a few hundred lines in length. My text editor has a fairly nice CLI integration feature, so I'm looking for a command-line validator.

When I didn't find any clear winners via Google, I poked around here and found a similar question, but none of the tools suggested there quite fit my needs:

  • libxml (via cygwin) — does not report line numbers; I have no idea where my errors are!
  • msxml — cannot be run from the command line?
  • xerces-c — seems to require a copy of Visual C?
  • xerces2-j — cannot be run from the command line?
  • xmlstarlet — insufficient XSD support*

(*The schema I'm validating against uses substitution groups — inappropriately, but it's external to the project, so I can't change it — which causes xmlstarlet to choke even on valid files.)

Normally, this is the point in solving a problem at which I'd give up on looking for an existing solution and reach for the Python-hammer, but Python's XML support is notoriously… well… actually, let's just leave it at "notorious".

So I'm back to looking for a pre-existing tool. My requirements are pretty simple:

  • runs on Win32 (Windows XP SP3, specifically)
  • command-line; my editor can work with just about any combination of stdin/-out/-err, arguments, temp files, etc.
  • reasonably complete XSD support (particularly namespaces and substitution groups)
  • reports the line number where the error occurred!

Does such a tool exist? I'd prefer not to have to install Visual Studio and friends (too bloated, IMO), but I do already have both Cygwin and Python installed.

+1  A: 

You might try one of the Visual Studio 2008 Express editions. There's much better XML support now, including validation, of course, but also XML IntelliSense, XML snippets, and an XML Schema view.

John Saunders
I doubt that devenv can validate XML files on the command line.
Joey
I didn't suggest it could. I'm suggesting the UI experience may be sufficiently good to change how you work with XML files.
John Saunders
A: 

I would suggest Windows PowerShell with the PowerShell Community eXtensions (PSCX). PSCX has a Test-Xml cmdlet, whose detailed Get-Help description reads:

Tests for well-formedness and optionally validates against XML Schema. It doesn't handle specifying the targetNamespace. To see validation error messages, specify the -Verbose flag.

I don't know whether it reports errors with line numbers, but 3 out of 4 isn't bad.

Bas Bossink
+4  A: 

Your first option, xmllint (libxml2), does give line numbers for errors in the XML (and also in the XSD). You probably just need a different version. I just confirmed both using my copy, which is:

>  xmllint --version
xmllint: using libxml version 20627

Example output:

invalidXml.xml:4: element c: Schemas validity error : Element 'c': This element is not expected. Expected is ( b ).
invalidXml.xml fails to validate

For this XML (invalidXml.xml):
<?xml version="1.0"?>
<invalidXmlEg>
  <a/>
<!--  <b></b> -->
  <c/>
</invalidXmlEg>

Where the xsd is:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="invalidXmlEg">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="a" type="xs:string" />
        <xs:element name="b" type="xs:string" />
        <xs:element name="c" type="xs:string" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
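For wiring this into an editor's command-line integration, a minimal Python sketch (standard library only) might look like the following. It runs xmllint with `--noout --schema` and keeps only the `file:line: message` diagnostics, in the format shown in the example output above. It assumes an xmllint build on PATH that actually prints line numbers; `parse_xmllint_errors` and `validate` are illustrative names, not part of libxml2.

```python
import re
import subprocess
import sys

# Matches xmllint diagnostics of the form "file:line: message".
# The lazy file group also accepts Windows paths such as C:\dir\file.xml,
# because the drive-letter colon is not followed by digits and another colon.
_DIAG = re.compile(r"^(?P<file>.+?):(?P<line>\d+): (?P<msg>.*)$", re.MULTILINE)


def parse_xmllint_errors(output):
    """Return a list of (file, line, message) tuples found in xmllint output."""
    return [
        (m.group("file"), int(m.group("line")), m.group("msg"))
        for m in _DIAG.finditer(output)
    ]


def validate(xml_path, xsd_path):
    """Run xmllint against a schema and return (ok, diagnostics).

    Assumes an xmllint build on PATH that prefixes errors with file:line.
    """
    result = subprocess.run(
        ["xmllint", "--noout", "--schema", xsd_path, xml_path],
        capture_output=True,
        text=True,
    )
    # xmllint writes its validation diagnostics to stderr.
    return result.returncode == 0, parse_xmllint_errors(result.stderr)


if __name__ == "__main__" and len(sys.argv) == 3:
    ok, errors = validate(sys.argv[1], sys.argv[2])
    for path, line, msg in errors:
        # One "file:line: message" per row, a format many editors can jump to.
        print(f"{path}:{line}: {msg}")
    sys.exit(0 if ok else 1)
```

Usage, with a hypothetical script name: `python xsdcheck.py invalidXml.xml invalidXml.xsd`. It exits non-zero when xmllint reports a failure, so the editor can treat that as "errors found".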

NOTE: I have noticed that xmllint will accept element names that it shouldn't (e.g. "<invalidXml.xsd>"), but this doesn't seem to affect your task.

EDIT: adding the "compiled with" part of the version output:

 compiled with: Threads Tree Output Push Reader
 Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy
 C14N Catalog XPath XPointer XInclude Iconv ISO8859X
 Unicode Regexps Automata Expr Schemas Schematron
 Modules Debug Zlib
13ren
Interesting! The version I'm using is 20703, which produces "`Element 'c': This element is not expected. Expected is ( b ).`" (nearly identical, but it lacks the line number). I'll have to see if I can dig up an older version.
Ben Blank
Seems a backward step... I wonder if they added an option controlling whether line numbers are included? It might be worth checking the docs (maybe --verbose). Or maybe it's to do with what has been compiled in? I didn't include that, so I've added it to my answer (I'm also running on Linux, which shouldn't make any difference).
13ren
Looks like the version of `libxml` in Cygwin is compiled with the same options, and there doesn't seem to be any `--verbose`-like option. I ended up grabbing the Win32 binaries for 2.6.27 from the official site and it works just fine. A backwards step, indeed. :-)
Ben Blank
@Ben I'm glad to hear you've got a version that works now!
13ren
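Since this thread found that line numbers appeared with 20627 but not with the Cygwin build of 20703, it may help to check which xmllint your editor actually invokes. A small Python sketch, assuming the `using libxml version NNNNN` banner format shown in the answer above; the function names are illustrative, not part of libxml2:

```python
import re
import subprocess


def libxml_version(output):
    """Parse 'using libxml version NNNNN' into a (major, minor, patch) tuple."""
    m = re.search(r"using libxml version (\d+)", output)
    if m is None:
        return None
    n = int(m.group(1))
    # libxml2 encodes 2.6.27 as 20627: major * 10000 + minor * 100 + patch.
    return n // 10000, (n // 100) % 100, n % 100


def installed_libxml_version(exe="xmllint"):
    """Ask the given xmllint binary for its version (assumed to be on PATH)."""
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Look at both streams, since the version banner may be written to stderr.
    return libxml_version(result.stdout + result.stderr)
```

For example, `installed_libxml_version(r"C:\libxml\bin\xmllint.exe")` (a hypothetical install path) returns a tuple such as `(2, 6, 27)` for a 2.6.27 build.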
A: 

Xerces-J comes with a sample application, jaxp.SourceValidator. You can feed it your XML file and it will validate it.

As for Xerces-C, I haven't used it myself, but I know it does not require the full Visual C++ installation; all it needs are the runtime files, which can be downloaded separately from Microsoft. There seems to be a sample application that does what you need: see StdInParse.

Juris
+2  A: 

As 13ren stated above, libxml's xmllint does report line numbers; perhaps you have a version issue. You might find it useful to grab native (non-Cygwin) versions of the libxml/libxslt tools from http://www.zlatkovic.com/libxml.en.html

You might also want to take a look at MSV from Sun. It isn't a full implementation of XSD, but it might do the job (I generally use it for RELAX NG validation).

Nic Gibson
+1  A: 

You might want to take a look at XML ValidatorBuddy from http://www.xml-tools.com, which comes with the Xerces parser already installed and also has a command-line tool.

xml-tools.com