
I want to test (true or false) whether an arbitrary XML file matches a given schema.

For what it's worth, the schema is the Word 2003 WordML schema, which Microsoft defines using a list of about 7 *.xsd files.

One of these files also imports the W3C xml.xsd file, using the following statement:

    <xsd:import id="xml" namespace="http://www.w3.org/XML/1998/namespace"
        schemaLocation="http://www.w3.org/2001/xml.xsd"></xsd:import>

I am using .NET code like the following to do the validation:

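    using System;
    using System.Xml;
    using System.Xml.Schema;

    public static void validate(string filename)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.Schemas.Add(
            "http://schemas.microsoft.com/office/word/2003/wordml",
            // To get this file I downloaded "Office 2003: XML Reference Schemas", i.e. "Office2003XMLSchema.exe"
            @"C:\Program Files\Microsoft Office 2003 Developer Resources\Microsoft Office 2003 XML Reference Schemas\WordprocessingML Schemas\wordnet.xsd");
        settings.ValidationType = ValidationType.Schema;
        settings.ValidationEventHandler += new ValidationEventHandler(validationEventHandler);
        using (XmlReader xmlReader = XmlReader.Create(filename, settings))
        {
            // Reading to the end of the document is what drives validation.
            while (xmlReader.Read()) { }
        }
    }

    // Placeholder handler (the real one is not shown): reports each validation problem.
    static void validationEventHandler(object sender, ValidationEventArgs e)
    {
        Console.WriteLine("{0}: {1}", e.Severity, e.Message);
    }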
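Since the goal is a true/false answer, one option is to let the handler record failures instead of printing them. This is a minimal sketch, assuming the schemas have already been loaded into an `XmlSchemaSet`; `SchemaCheck` and `IsValid` are hypothetical names, not part of .NET.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Sketch: answers "is this document valid against these schemas?" with a bool.
// SchemaCheck and IsValid are hypothetical names, not part of the .NET API.
public static class SchemaCheck
{
    public static bool IsValid(TextReader input, XmlSchemaSet schemas)
    {
        bool valid = true;
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.Schemas = schemas;
        settings.ValidationType = ValidationType.Schema;
        // With a handler attached, validation errors are reported here instead of being thrown.
        settings.ValidationEventHandler += (sender, e) =>
        {
            if (e.Severity == XmlSeverityType.Error)
                valid = false;
        };
        try
        {
            using (XmlReader reader = XmlReader.Create(input, settings))
            {
                while (reader.Read()) { }
            }
        }
        catch (XmlException)
        {
            // The input is not even well-formed XML.
            valid = false;
        }
        return valid;
    }
}
```

To check a file on disk, pass it in through a `StreamReader`. Note that with the default `ValidationFlags`, warnings (such as "could not find schema information" for an element) are not reported at all, so a document in an unrelated namespace would pass; setting `XmlSchemaValidationFlags.ReportValidationWarnings` and treating warnings as failures is one way to tighten this.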

My problem is that if I run this code on a machine that is not connected to the internet, I get an XmlSchemaValidationException saying, in effect, that it can't find xml.xsd.

To fix this, I downloaded a copy of xml.xsd and added it explicitly using the settings.Schemas.Add method; the validation now works correctly when the machine is not connected to the internet.

However, when the machine is connected to the internet, I now get an error saying "The global attribute 'http://www.w3.org/XML/1998/namespace:lang' has already been declared."

So apparently I either need to add it explicitly, or I don't, depending on whether the machine is able to silently download it from the internet (or even perhaps has previously been able to download it, and has it cached somewhere).

So, it's "damned if I do and damned if I don't". Do I need to try it one way, catch the exception and then try it the other way? Or is there a more elegant solution?

A:

We can't see your code, but in many implementations this is handled with a catalog resolver, which redirects requests for the .xsd to a local copy. The XmlReaderSettings.XmlResolver property can be used for this. See XMLCatalog.net for an Apache-licensed implementation you can use.
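As an illustration of the idea, here is a minimal sketch of such a resolver, assuming you keep a local copy of xml.xsd. It subclasses `XmlUrlResolver` and overrides `GetEntity`; this is one simple way to do it, not how XMLCatalog.net works. `LocalSchemaResolver` and `Map` are hypothetical names, and URIs that have not been mapped fall through to the normal behavior, so they may still go to the network.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Sketch of a minimal "catalog" resolver: remote schema URIs that have been
// mapped are served from local files; anything else falls through to the
// default XmlUrlResolver behavior (which may still use the network).
// LocalSchemaResolver and Map are hypothetical names.
public class LocalSchemaResolver : XmlUrlResolver
{
    private readonly Dictionary<string, string> catalog = new Dictionary<string, string>();

    public void Map(string remoteUri, string localPath)
    {
        // Normalize the key the same way GetEntity will see it.
        catalog[new Uri(remoteUri).AbsoluteUri] = Path.GetFullPath(localPath);
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        string localPath;
        if (absoluteUri != null && catalog.TryGetValue(absoluteUri.AbsoluteUri, out localPath))
            return base.GetEntity(new Uri(localPath), role, ofObjectToReturn);
        return base.GetEntity(absoluteUri, role, ofObjectToReturn);
    }
}
```

Usage would be along the lines of `resolver.Map("http://www.w3.org/2001/xml.xsd", @"C:\schemas\xml.xsd")` (the local path is hypothetical), after which the resolver is assigned to whichever resolver property actually loads the schemas.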

A side effect is that you can keep all your schemas cached locally. This matters because W3C blocks excessive requests to its site, so your code (or worse, your customer's code) may start failing at random.

lavinio
Thank you for the suggestion; I'll experiment to see whether I can fix it by using a subclass of `System.Xml.XmlResolver`.
ChrisW
I've got it working now. It was failing before because I was assigning to the `XmlReaderSettings.XmlResolver` property; because I am using `settings.Schemas.Add`, I needed to assign to the `settings.Schemas.XmlResolver` property instead.
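A self-contained sketch of where the resolver has to go: it serves an imported schema from memory, so nothing is fetched over the network. The class names, the `urn:` namespaces and the `http://example.invalid/types.xsd` location are made up for the example. The key point is that `settings.Schemas.XmlResolver` is assigned before `settings.Schemas.Add`, because the schema set loads imports as each schema is added.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Schema;

// Sketch: serves imported schemas from memory. InMemorySchemaResolver and
// ResolverDemo are hypothetical names; the URIs registered with it are examples.
public class InMemorySchemaResolver : XmlUrlResolver
{
    private readonly Dictionary<string, string> schemas = new Dictionary<string, string>();

    public void Register(string uri, string schemaText)
    {
        schemas[new Uri(uri).AbsoluteUri] = schemaText;
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        string text;
        if (schemas.TryGetValue(absoluteUri.AbsoluteUri, out text))
            return new MemoryStream(Encoding.UTF8.GetBytes(text));
        // Fail loudly instead of silently going to the network for anything unregistered.
        throw new XmlException("No local copy of " + absoluteUri);
    }
}

public static class ResolverDemo
{
    public static XmlReaderSettings CreateSettings(XmlResolver resolver, string mainSchema)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        // Imports are resolved by the schema set while Schemas.Add runs,
        // so the resolver goes on settings.Schemas, and it must be set first.
        settings.Schemas.XmlResolver = resolver;
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(mainSchema)));
        settings.ValidationType = ValidationType.Schema;
        return settings;
    }
}
```

Throwing for unregistered URIs is a design choice: it makes a missing local copy show up immediately, whether or not the machine happens to be online.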
ChrisW