Hi, I have the following (erroneous) XML:

<jobs>
    <job>
        <id>1</id>
        <state><![CDATA[IL]]></state>
    </job>
    <job>
        <id>2</id>
    </job>
</jobs>

Both the id and the state nodes are required. I wrote an XSD for it:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="importvalidator"
    elementFormDefault="qualified"
    targetNamespace="http://foo.org/importvalidator.xsd"
    xmlns="http://foo.org/importvalidator.xsd"
    xmlns:mstns="http://foo.org/importvalidator.xsd"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="jobs">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="job" minOccurs="1" maxOccurs="unbounded">
              <xs:complexType>
                <xs:all>
                  <xs:element name="id" type="xs:string" minOccurs="1"/>
                  <xs:element name="state" type="xs:string" minOccurs="1"/>
                </xs:all>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>

Yet the document still validates as structurally valid XML. What am I missing here?

Update1: the code I'm using is in C#:

        XmlSchemaSet schemas = new XmlSchemaSet();
        schemas.Add("http://foo.org/importvalidator.xsd", "validator.xsd");

        XDocument doc = XDocument.Load(fileName);
        if (doc == null || doc.Root == null)
        {
            throw new ApplicationException("xml error: the referenced stream is not xml.");
        }

        doc.Validate(schemas, (o, e) =>
        {
            throw new ApplicationException("xsd validation error: xml file has structural problems");
        });
A: 

Which parser/language are you using? With Java, it used to be that you had to tell the Xerces parser explicitly that you wanted XSD validation. After checking the latest release docs, I see that validation is built in now, so the version matters. If you're a .NET developer, it's best to check what settings it needs.

Just curious - why the CDATA surrounding the state in your example? There's no need for that, AFAIK. You can even embed a restriction in your XSD to ensure that you only get valid US state codes.

But first things first - get the schema validating.

duffymo
@duffymo: thanks for your comment. 1) It's a .NET project in C#. 2) I know, but I can't modify the structure of the XML. BTW, the state-code validation is not in the scope of the project (yet).
balint
You're most welcome, balint. I'm sorry that I'm not more helpful. I wouldn't consider removing an unnecessary CDATA as modifying structure, but perhaps I don't know all your requirements. It's good to be thinking about the state validation if it becomes important, but perhaps that day isn't today.
duffymo
+1  A: 

Please format your xml so it's easier to read - like this:

<jobs>
  <job>
    <id>1</id>
    <state><![CDATA[IL]]></state>
  </job>
  <job>
    <id>2</id>
  </job>
</jobs>

I think you're not actually validating it. Because of the namespaces, that XML would not validate even with a "<state>" in the second "<job>". Specifically, the XSD has a target namespace of "http://foo.org/importvalidator.xsd", but the XML declares no namespace.

Set up a trivial test case of XSD and XML that you know will definitely fail, and use that to track down why you aren't validating.

Also, your XSD is missing the close tags for element and schema, so it should give an error - or it's just a mis-paste :-)


You can remove the targetNamespace from the schema:

<xs:schema id="importvalidator"
    elementFormDefault="qualified"
    targetNamespace="http://foo.org/importvalidator.xsd"    ← DELETE THIS
    xmlns="http://foo.org/importvalidator.xsd"
    xmlns:mstns="http://foo.org/importvalidator.xsd"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

So it looks like this:

<xs:schema id="importvalidator"
    elementFormDefault="qualified"
    xmlns="http://foo.org/importvalidator.xsd"
    xmlns:mstns="http://foo.org/importvalidator.xsd"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
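
For the C# side, here is a minimal sketch of validating the namespace-less XML against a schema that has no targetNamespace. The class and method names are made up for illustration, the schema and XML are inlined as strings instead of being read from validator.xsd and the input file, and the leftover xmlns and xmlns:mstns declarations are dropped because nothing in the schema uses them. Note that null is passed to XmlSchemaSet.Add so the set takes the namespace (here: none) from the schema itself.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public static class NoNamespaceValidationSketch
{
    // The question's schema without targetNamespace, so its elements
    // belong to no namespace, just like the elements in the XML file.
    public const string Xsd = @"<?xml version=""1.0"" encoding=""utf-8""?>
<xs:schema id=""importvalidator""
    elementFormDefault=""qualified""
    xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:element name=""jobs"">
    <xs:complexType>
      <xs:sequence>
        <xs:element name=""job"" minOccurs=""1"" maxOccurs=""unbounded"">
          <xs:complexType>
            <xs:all>
              <xs:element name=""id"" type=""xs:string"" minOccurs=""1""/>
              <xs:element name=""state"" type=""xs:string"" minOccurs=""1""/>
            </xs:all>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // The question's XML: the second job has no state element.
    public const string Xml = @"<jobs>
  <job>
    <id>1</id>
    <state><![CDATA[IL]]></state>
  </job>
  <job>
    <id>2</id>
  </job>
</jobs>";

    // Returns one message per validation event instead of throwing on the first.
    public static List<string> Validate(string xsd, string xml)
    {
        var schemas = new XmlSchemaSet();
        using (var reader = XmlReader.Create(new StringReader(xsd)))
        {
            // null: use whatever target namespace the schema declares (none here).
            schemas.Add(null, reader);
        }

        var messages = new List<string>();
        XDocument doc = XDocument.Parse(xml);
        doc.Validate(schemas, (sender, e) => messages.Add(e.Severity + ": " + e.Message));
        return messages;
    }

    public static void Main()
    {
        foreach (string message in Validate(Xsd, Xml))
        {
            Console.WriteLine(message);
        }
    }
}
```

With the schema and XML in the same (empty) namespace, the second job should now be reported as incomplete.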

PS: anyone know if/how you can highlight parts of source code with SO's markdown?

13ren
@13ren: I fixed his formatting, so the close tags are visible. You have the right answer: it's not validating because of the namespace.
John Saunders
@13ren: with a Hungarian keyboard layout + Google Chrome, I'm happy if I can just insert any code blocks... :/
balint
@13ren, @John: OK, but how can I do a validation without any namespace declaration?
balint
@13ren: without the namespace declaration, the XML is _not_ valid.
John Saunders
@balint It's just that it makes it easier to help you. I'm adding another solution for you.
13ren
@john thanks for the edit
13ren
Ah, so the targetNamespace is not a mandatory param. Thanks :)
balint
+1  A: 

@13ren has the correct answer. It is not an error if a node does not match any schema. It's only a warning. I can see the warnings in the code below:

private static void ValidateDocument(XmlSchemaSet schemas, string uri)
{
    // ReportValidationWarnings is what surfaces the "no schema information" warnings.
    var settings = new XmlReaderSettings
    {
        Schemas = schemas,
        ValidationType = ValidationType.Schema,
        ValidationFlags =
            XmlSchemaValidationFlags.ProcessIdentityConstraints |
            XmlSchemaValidationFlags.ReportValidationWarnings
    };
    settings.ValidationEventHandler += OnValidationEventHandler;

    // Loading through the validating reader raises the validation events.
    using (var validatingReader = XmlReader.Create(uri, settings))
    {
        XDocument.Load(
            validatingReader,
            LoadOptions.SetBaseUri | LoadOptions.SetLineInfo);
    }
}

This produces the following:

Warning: Could not find schema information for the element 'jobs'.
Warning: Could not find schema information for the element 'job'.
Warning: Could not find schema information for the element 'id'.
Warning: Could not find schema information for the element 'state'.
Warning: Could not find schema information for the element 'job'.
Warning: Could not find schema information for the element 'id'.
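
The OnValidationEventHandler method is not shown in the code above. A minimal sketch, assuming it does nothing more than print the severity and the message (which would match the output format here), might look like the following. The class name, the Messages list, and the tiny inline schema and XML in Main are assumptions added only to make the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class ValidationReporting
{
    // Hypothetical: kept so a caller can inspect the events after reading.
    public static readonly List<string> Messages = new List<string>();

    // Assumed shape of the handler; prints e.g. "Warning: ..." or "Error: ...".
    public static void OnValidationEventHandler(object sender, ValidationEventArgs e)
    {
        string line = e.Severity + ": " + e.Message;
        Messages.Add(line);
        Console.WriteLine(line);
    }

    public static void Main()
    {
        // Schema with a target namespace; the XML below has none, so nothing matches.
        const string xsd =
            @"<xs:schema targetNamespace=""http://foo.org/importvalidator.xsd""
                         xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
                <xs:element name=""jobs""/>
              </xs:schema>";
        const string xml = "<jobs><job><id>2</id></job></jobs>";

        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            ValidationFlags = XmlSchemaValidationFlags.ReportValidationWarnings
        };
        using (var schemaReader = XmlReader.Create(new StringReader(xsd)))
        {
            settings.Schemas.Add(null, schemaReader);
        }
        settings.ValidationEventHandler += OnValidationEventHandler;

        // Reading to the end of the document is what triggers the events.
        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
    }
}
```

Without the ReportValidationWarnings flag, the handler would not be called for these unmatched elements, which is why the original code saw nothing.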

Changing your XML and running again:

<?xml version="1.0" encoding="utf-8" ?>
<jobs xmlns="http://foo.org/importvalidator.xsd">
  <job>
    <id>1</id>
    <state><![CDATA[IL]]></state>
  </job>
  <job>
    <id>2</id>
  </job>
</jobs>

produces the error you expected:

Error: The element 'job' in namespace 'http://foo.org/importvalidator.xsd' has incomplete content. List of possible elements expected: 'state' in namespace 'http://foo.org/importvalidator.xsd'.

John Saunders
John, thanks for your answer! I was looking for a good example of validation with XDocument; this was all I could come up with :) Can't I validate with an XSD without setting a namespace in the XML file? (The XML is huge, and I'm afraid that modifying its structure will affect the load speed.)
balint
I think you're missing my point. If your XML is valid with no namespace in it, then your schema is wrong. If your schema is right, then your XML is not valid without a namespace.
John Saunders
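
If the schema has to keep its targetNamespace and the file on disk cannot be changed, one possible workaround is to move the elements into the schema's namespace in memory after loading and then validate that copy. This is only a sketch under assumptions: the class and method names are hypothetical, the schema and XML are inlined, and what gets validated is the renamed in-memory tree, not the file as written. Renaming every element also costs an extra pass over a large document.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public static class InMemoryNamespaceSketch
{
    static readonly XNamespace Ns = "http://foo.org/importvalidator.xsd";

    // The question's schema, targetNamespace included.
    public const string Xsd = @"<?xml version=""1.0"" encoding=""utf-8""?>
<xs:schema id=""importvalidator""
    elementFormDefault=""qualified""
    targetNamespace=""http://foo.org/importvalidator.xsd""
    xmlns=""http://foo.org/importvalidator.xsd""
    xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:element name=""jobs"">
    <xs:complexType>
      <xs:sequence>
        <xs:element name=""job"" minOccurs=""1"" maxOccurs=""unbounded"">
          <xs:complexType>
            <xs:all>
              <xs:element name=""id"" type=""xs:string"" minOccurs=""1""/>
              <xs:element name=""state"" type=""xs:string"" minOccurs=""1""/>
            </xs:all>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    public static List<string> Validate(string xsd, string xmlWithoutNamespace)
    {
        var schemas = new XmlSchemaSet();
        using (var reader = XmlReader.Create(new StringReader(xsd)))
        {
            schemas.Add(Ns.NamespaceName, reader);
        }

        XDocument doc = XDocument.Parse(xmlWithoutNamespace);

        // Put every element (root included) into the schema's namespace.
        foreach (XElement element in doc.Descendants().ToList())
        {
            element.Name = Ns + element.Name.LocalName;
        }

        var messages = new List<string>();
        doc.Validate(schemas, (sender, e) => messages.Add(e.Severity + ": " + e.Message));
        return messages;
    }

    public static void Main()
    {
        const string xml = "<jobs><job><id>1</id><state>IL</state></job><job><id>2</id></job></jobs>";
        foreach (string message in Validate(Xsd, xml))
        {
            Console.WriteLine(message);
        }
    }
}
```

This does not contradict the point above: the document only validates once its elements are in the schema's namespace, whether that namespace comes from the file or is applied after loading.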