I'm trying to develop a schema that will validate some existing XML files I've inherited. I'd like to have the schema do as much of the validation work as possible. The challenge is that attributes and elements are contingent on the values of other attributes.

The real data is pretty abstract, so I've created some simple examples. Let's say I have the following XML files:

<?xml version="1.0" encoding="UTF-8"?>
<Creature type="human" nationality="British">
    <Address>London</Address>
</Creature>

<?xml version="1.0" encoding="UTF-8"?>
<Creature type="animal" species="Tiger">
    <Habitat>Jungle</Habitat>
</Creature>

If the creature's "type" is "human", I'll have a "nationality" attribute and an "Address" child element. If the creature's "type" is "animal", I'll have a "species" attribute and a "Habitat" child element. For the purposes of this example, a "human" with a "species" or a "Habitat" would be invalid - as would an "animal" with a "nationality" or "Address".

If "Creature" wasn't the root element, I could probably have two different "Creature" choices below the root element, but I don't see how I can make this work when "Creature" is the root element.

Is there any way of creating a schema for these files that would only match valid documents? If so, how would I go about it?

+5  A: 

You can use the xsi:type attribute for this purpose. It has to be the xsi:type attribute from the XMLSchema-instance namespace rather than a type attribute in your own namespace, otherwise it won't work.

In the schema you declare an abstract base type, then create an additional complex type for each subtype (with the elements/attributes specific to that type).

Be aware that while this solution works, it would be better to use a different element name for each type. Using xsi:type goes somewhat against the grain, since the type is then defined by the type attribute in combination with the element name rather than by the element name alone.

For example:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="Creature" type="CreatureType">
</xs:element>

  <xs:complexType name="CreatureType" abstract="true">
    <!-- any common validation goes here -->
  </xs:complexType>

  <xs:complexType name="Human">
    <xs:complexContent>
      <xs:extension base="CreatureType">
        <xs:sequence maxOccurs="1">
          <xs:element name="Address"/>
        </xs:sequence>
        <xs:attribute name="nationality" type="xs:string"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="Animal">
    <xs:complexContent>
      <xs:extension base="CreatureType">
        <xs:sequence maxOccurs="1">
          <xs:element name="Habitat"/>
        </xs:sequence>
        <xs:attribute name="species" type="xs:string"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

</xs:schema>

This schema will validate these two:

<?xml version="1.0" encoding="UTF-8"?>
<Creature xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
          xsi:type="Human" 
          nationality="British">
    <Address>London</Address>
</Creature>

<?xml version="1.0" encoding="UTF-8"?>
<Creature xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
          xsi:type="Animal" 
          species="Tiger">
    <Habitat>Jungle</Habitat>
</Creature>

but not this:

<?xml version="1.0" encoding="UTF-8"?>
<Creature xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
          xsi:type="SomeUnknownThing" 
          something="something">
    <Something>Something</Something>
</Creature>

or this:

<?xml version="1.0" encoding="UTF-8"?>
<Creature xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:type="Human" 
          species="Tiger">
    <Habitat>Jungle</Habitat>
</Creature>
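
For comparison, here is a minimal sketch of the alternative mentioned earlier, with a different element name for each type. It assumes you are free to rename the root element in the existing files, which may not be the case for inherited data. The element names "Human" and "Animal", the xs:string content types, and use="required" on the attributes are illustrative choices rather than anything taken from the real data.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Each kind of creature gets its own globally declared element,
       so the element name alone selects the content model and
       no xsi:type attribute is needed in the instance document. -->
  <xs:element name="Human">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Address" type="xs:string"/>
      </xs:sequence>
      <!-- use="required" is an assumption: it rejects a Human without a nationality -->
      <xs:attribute name="nationality" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>

  <xs:element name="Animal">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Habitat" type="xs:string"/>
      </xs:sequence>
      <!-- use="required" is an assumption: it rejects an Animal without a species -->
      <xs:attribute name="species" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

Under this sketch a document whose root is `<Human nationality="British">` containing an `<Address>` child would be accepted, while a `<Human>` carrying a `species` attribute or a `<Habitat>` child would be rejected, because neither is declared in that element's type.
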
Martin Wickett
Thank you for the very precise answer and excellent example in your solution. You just saved me hours of internet searching.
jW
@Martin - Great example! Could you please explain why `xsi:type` works while `type` doesn't in the XML files?
Praetorian