(Note: I cannot change structure of the XML I receive. I am only able to change how I validate it.)

Let's say I can get XML like this:

<Address Field="Street" Value="123 Main"/>
<Address Field="StreetPartTwo" Value="Unit B"/>
<Address Field="State" Value="CO"/>
<Address Field="Zip" Value="80020"/>
<Address Field="SomeOtherCrazyValue" Value="Foo"/>

I need to create an XSD schema that validates that "Street", "State" and "Zip" are all present. I don't care whether "StreetPartTwo" or "SomeOtherCrazyValue" (or both) happen to be present too.

If I knew that only the three I care about could be included (and that each would only be included once), I could do something like this:

<xs:element name="Address" type="addressType" maxOccurs="unbounded" minOccurs="3"/>

<xs:complexType name="addressType">
  <xs:attribute name="Field" use="required">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:enumeration value="Street"/>
        <xs:enumeration value="State"/>
        <xs:enumeration value="Zip"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:attribute>
</xs:complexType>

But this won't work in my case, because I may also receive those other Address elements (which also have "Field" attributes) that I don't care about.

Any ideas how I can ensure the stuff I care about is present but let the other stuff in too?

TIA! Sean

+3  A: 

You cannot do the validation you seek with XML Schema alone.

According to the "XML Schema Part 1: Structures" specification ...

When two or more particles contained directly or indirectly in the {particles} of a model group have identically named element declarations as their {term}, the type definitions of those declarations must be the same.

That's not to say you cannot build a schema that accepts every correct document. What it means is that any such schema will also accept some incorrect documents. And when I say "incorrect", I mean documents that violate the constraints you stated in English.

For example, suppose you have a document that includes three Street elements, like this:

<Address Field="Street" Value="123 Main"/> 
<Address Field="Street" Value="456 Main"/> 
<Address Field="Street" Value="789 Main"/> 
<Address Field="SomeOtherCrazyValue" Value="Foo"/> 

According to your schema, that document is a valid address. It's possible to add an xs:unique constraint to your schema so that it would reject such broken documents. But even with an xs:unique, validating against such a schema would declare that some other incorrect documents are valid - for example a document with three <Address> elements, each of which has a unique Field attribute, but none of which has Field="Zip".
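To make the xs:unique point concrete, here is a minimal sketch of such a schema. The wrapper element name "Addresses" and the constraint name "uniqueField" are assumptions for illustration (the question does not show the parent element), and the Field attribute is loosened to xs:string so that the extra fields are admitted.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- "Addresses" is a hypothetical wrapper; the real parent element is not shown in the question. -->
  <xs:element name="Addresses">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Address" type="addressType" minOccurs="3" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
    <!-- No two Address children may carry the same Field value. -->
    <xs:unique name="uniqueField">
      <xs:selector xpath="Address"/>
      <xs:field xpath="@Field"/>
    </xs:unique>
  </xs:element>

  <!-- Field is any string here, so unknown fields are allowed. -->
  <xs:complexType name="addressType">
    <xs:attribute name="Field" type="xs:string" use="required"/>
    <xs:attribute name="Value" type="xs:string" use="required"/>
  </xs:complexType>

</xs:schema>
```

A schema along these lines rejects the three-Street document, but it still accepts three distinct fields with no Zip among them, which is exactly the gap described above.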

In fact it is not possible to produce a W3C XML Schema that formally codifies your stated constraints. The <xs:all> element almost gets you there, but it applies only to elements, not to attributes. It also cannot be used with an extension, so you can't say, in W3C XML Schema, "all these elements in any order, plus any other ones".


In order to perform the validation you seek, your options are:

  1. rely on something other than XML Schema,
  2. perform validation in multiple steps, using XML Schema for the first step, and something else for the second step.

For the first option, I think you could use Relax NG to do it. The downside is that it's not a W3C standard (it is an OASIS and ISO/IEC specification) and, as far as I can tell, it is neither widely supported nor growing. It would be like learning Gaelic in order to express a thought. There's nothing wrong with Gaelic, but it's sort of a linguistic cul-de-sac, and I think RelaxNG is, too.

For the second option, an approach would be to validate against your schema as the first step, and then, as the second step:

A. apply an XSL transform which would convert <Address> elements into elements named for the value of their Field attribute. The output of that transform would look like this:

<root>
  <Street Value="101 Bellavista Drive"/>
  <State  Value="Confusion"/>
  <Zip    Value="10101"/>
</root>
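A transform for step A could be sketched roughly as follows. It is XSLT 1.0 and assumes the incoming <Address> elements are in no namespace and that every Field value is a legal XML element name (a value containing a space, for instance, would make xsl:element fail).

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Wrap the renamed elements in <root>, matching the output shown above. -->
  <xsl:template match="/">
    <root>
      <xsl:apply-templates select="//Address"/>
    </root>
  </xsl:template>

  <!-- Name each output element after the Field attribute and carry the Value attribute over. -->
  <xsl:template match="Address">
    <xsl:element name="{@Field}">
      <xsl:copy-of select="@Value"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

As written, this emits an element for every <Address>, including the ones you don't care about. Adding a predicate to the select, such as //Address[@Field='Street' or @Field='State' or @Field='Zip'], would limit the output to the three required fields.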

B. validate the output of that transform against a different schema, which looks something like this:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <xs:element name="root">
    <xs:complexType>
      <xs:all>
        <xs:element maxOccurs="1" minOccurs="1" ref="Street" />
        <xs:element maxOccurs="1" minOccurs="1" ref="State" />
        <xs:element maxOccurs="1" minOccurs="1" ref="Zip" />
      </xs:all>
    </xs:complexType>
  </xs:element>

  <xs:element name="Street">
    <xs:complexType>
      <xs:attribute name="Value" use="required" type="xs:string"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="State">
    <xs:complexType>
      <xs:attribute name="Value" use="required" type="xs:string"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="Zip">
    <xs:complexType>
      <xs:attribute name="Value" use="required" type="xs:string"/>
    </xs:complexType>
  </xs:element>

</xs:schema>

You would need to extend that schema to handle other elements like <SomeOtherCrazyValue> in the output of the transform. Or you could structure the xsl transform to just not emit elements that are not one of {State,Street,Zip}.

Just to be clear, I understand that you cannot change the XML that you receive. This approach wouldn't require that. It just uses a funky 2-step validation approach. Once the 2nd validation step completes, you could discard the result of the transform.


EDIT - Actually, Sean, thinking about this again, you could simplify the second step. Suppose your XSL transform only removes from the document the <Address> elements whose Field attribute is not State, Street or Zip. In other words, the output would contain no <Address Field="SomeOtherCrazyValue"...>. The result of that transform could be validated with your schema, using maxOccurs="3", minOccurs="3", and an xs:unique on the Field attribute. With exactly three elements, three permitted Field values and no duplicates, each required field must appear exactly once.
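A filtering transform of that kind might look like the sketch below: an identity transform plus one empty template. It is XSLT 1.0 and again assumes the <Address> elements are in no namespace.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy every node and attribute through unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: drop any Address whose Field is not one of the three required values. -->
  <xsl:template match="Address[not(@Field='Street' or @Field='State' or @Field='Zip')]"/>

</xsl:stylesheet>
```

The second template wins over the identity template for the unwanted elements because a pattern with a predicate has a higher default priority than node(). An <Address> with no Field attribute at all is also dropped.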

Cheeso
Great response, Cheeso. Thank you! What you write is what I suspected. I'll have to think about your suggestion of using an xsl transformation (or RelaxNG/Gaelic ;^). My initial thought was that since I happen to be unmarshalling the xml to Java pojos, I could write a simple routine that inspects them at that point. But of course had it been possible, doing it all in a single xsd would have been preferable. Thanks again! Sean
scrotty
no prob - actually I think the RelaxNG is the "Gaelic" option. Doing the XSL Transformation is an alternative to learning Gaelic - just extra work in a language that you already know.
Cheeso
ps: I know someone is going to stomp on me for comparing RelaxNG to Gaelic.
Cheeso
Sláinte! :^) ....
scrotty
RE "Edit". That is brilliant! I'm going to go that route. Thank you again Cheeso. :)
scrotty