I'm wary that this may generate discussion rather than answers, but nevertheless ...

I currently have an XML that I designed with the primary goal of making it concise and human-readable ... which for me meant favouring attributes over elements and minimizing the vocabulary:

<?xml version='1.0'?>

<Calculation jobId='XI5-332123' user='wombat' time='2009-04-22T14:04:00Z' version='1'>
    <Model fileName='Simulate_TestModelSpecifications' processName='Simulate_TestModelSpecifications' password='Simulate_TestModelSpecifications'/>
    <Simulation>
        <ModelSpecification name='realParam' type='real' value='5.4864'/>
        <ModelSpecification name='intParam' type='integer' value='7'/>
        <ModelSpecification name='boolParam' type='boolean' value='true'/>
        <ModelSpecification name='realArrayParam' type='real' numElements='2'>
            <ArrayElement value='1.0'/>
            <ArrayElement value='2.0'/>
        </ModelSpecification>
        <ModelSpecification name='intArrayParam' type='integer' numElements='2'>
            <ArrayElement value='5'/>
            <ArrayElement value='6'/>
        </ModelSpecification>
        <ModelSpecification name='boolArrayParam' type='boolean' numElements='2'>
            <ArrayElement value='true'/>
            <ArrayElement value='false'/>
        </ModelSpecification>
        <ModelSpecification name='Var1' type='real' value='20.0'/>
        <ModelSpecification name='Var2' type='real' numElements='2'>
            <ArrayElement value='30.0'/>
            <ArrayElement value='40.0'/>
        </ModelSpecification>
        <ModelSpecification name='scalarSelector' type='string' value='apple'/>
        <ModelSpecification name='arraySelector' type='string' numElements='3'>
            <ArrayElement value='red'/>
            <ArrayElement value='yellow'/>
            <ArrayElement value='blue'/>
        </ModelSpecification>
        <ReportVariable pathName='myUnit.Var1'/>
        <ReportVariable pathName='myUnit.Var2(1)'/>
        <ReportVariable pathName='myUnit.Var2(2)'/>
    </Simulation>
</Calculation>

Now the question has arisen whether it can be completely validated with an XSD and, if it can't, whether it matters that some of the validation will have to be implemented in the SAX parser (which is also under my control).

My experience with XML schema is very (very) limited but as far as I can see there are 3 potentially tricky issues with validating this XML:

  • The type of the 'value' attribute of an <ArrayElement> is controlled by the value of the 'type' attribute of the parent <ModelSpecification>.
  • A <ModelSpecification> must either have a 'value' attribute or a 'numElements' attribute, but not both.
  • Only <ModelSpecification>s that have a 'numElements' attribute are allowed to contain <ArrayElement>s and the number of those elements is restricted to the value of that attribute.

So my questions are:

  1. Am I correct that these 3 cannot be imposed by the XSD?
  2. Does it matter if I instead impose them in the SAX parser?

Thanks,

Tom

+1  A: 

Yes, XSD doesn't support scenarios 1 and 3 as you describe them for that XML grammar. For scenario 2, you could use <xsd:choice> with min/maxOccurs=1, although <xsd:choice> selects between child elements, so 'value' and 'numElements' would have to become elements rather than attributes.

You could also change the element grammar so that it is less type-variant, which would make that validation possible.

If you're parsing your XML with a SAX parser, you'll need to know that structure and validate it there anyway. So I think in this case you can drop the XSD validation.

But if somebody else creates that XML file, the XSD could be very handy, because the file can be validated externally before it arrives in your system.
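To give an idea of what enforcing the three rules in the parser might look like, here is a minimal sketch using Python's standard xml.sax module. The question doesn't say which SAX implementation is in use, so the language, the class name SpecValidator, the validate helper and the error messages are all assumptions made for illustration only.

```python
import xml.sax


def _parses_as(convert):
    """Build a checker that accepts text which `convert` can parse."""
    def check(text):
        try:
            convert(text)
            return True
        except ValueError:
            return False
    return check


# Per-type checks keyed by the 'type' attribute values used in the question.
CHECKS = {
    "real": _parses_as(float),
    "integer": _parses_as(int),
    "boolean": lambda text: text in ("true", "false"),
    "string": lambda text: True,
}


class SpecValidator(xml.sax.ContentHandler):
    """Hypothetical handler: collects rule violations instead of stopping at the first."""

    def __init__(self):
        super().__init__()
        self.errors = []
        self._spec = None  # state of the enclosing <ModelSpecification>, if any

    def startElement(self, name, attrs):
        if name == "ModelSpecification":
            label = attrs.get("name", "?")
            has_value, has_count = "value" in attrs, "numElements" in attrs
            # Rule 2: exactly one of 'value' and 'numElements'.
            if has_value == has_count:
                self.errors.append(f"{label}: need exactly one of value/numElements")
            check = CHECKS.get(attrs.get("type"), lambda text: False)
            if has_value and not check(attrs["value"]):
                self.errors.append(f"{label}: value does not match type")
            expected = None
            if has_count:
                if attrs["numElements"].isdigit():
                    expected = int(attrs["numElements"])
                else:
                    self.errors.append(f"{label}: numElements is not a count")
            self._spec = {"label": label, "check": check,
                          "is_array": has_count, "expected": expected, "seen": 0}
        elif name == "ArrayElement":
            spec = self._spec
            # Rule 3 (first half): only array specifications hold <ArrayElement>s.
            if spec is None or not spec["is_array"]:
                self.errors.append("ArrayElement outside an array ModelSpecification")
                return
            spec["seen"] += 1
            # Rule 1: the parent's 'type' decides how 'value' is checked.
            if not spec["check"](attrs.get("value", "")):
                self.errors.append(f"{spec['label']}: element {spec['seen']} does not match type")

    def endElement(self, name):
        if name == "ModelSpecification":
            spec, self._spec = self._spec, None
            # Rule 3 (second half): the element count must equal numElements.
            if spec["expected"] is not None and spec["seen"] != spec["expected"]:
                self.errors.append(
                    f"{spec['label']}: expected {spec['expected']} elements, saw {spec['seen']}")


def validate(xml_text):
    """Parse `xml_text` and return the list of rule violations found."""
    handler = SpecValidator()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.errors
```

Calling validate on a document returns an empty list when all three rules hold. The only state carried between callbacks is the enclosing <ModelSpecification>, which is what makes these checks easy in a streaming parser even though they are awkward in a schema.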

Rubens Farias
+1  A: 

It's not possible to directly validate that when type="integer", value cannot be "1.23". What you could do is first transform this XML into something that can be validated more strictly, or change the initial schema along those lines.

I think the "numElements" attribute is redundant, by the way, because you can read/deserialize the document and count the elements instead.

<ModelSpecification name="realParam">
    <RealParam>1.234</RealParam>
</ModelSpecification>
<ModelSpecification name='intParam'>
    <IntParam>7</IntParam>
</ModelSpecification>
<ModelSpecification name='realArrayParam'>
    <Array>
        <RealParam>1.2</RealParam>
        <RealParam>2.1</RealParam>
    </Array>
</ModelSpecification>
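
The "transform first" route could be sketched roughly as follows, here in Python with the standard xml.etree.ElementTree module. This is only an illustration: the function name to_typed_form is made up, RealParam, IntParam and StringParam are the names used in this thread, and BoolParam is an assumed name for the boolean case.

```python
import xml.etree.ElementTree as ET

# RealParam, IntParam and StringParam appear in the thread; BoolParam is an assumed name.
TYPED_TAG = {
    "real": "RealParam",
    "integer": "IntParam",
    "boolean": "BoolParam",
    "string": "StringParam",
}


def to_typed_form(spec):
    """Return a new <ModelSpecification> whose values are typed child elements."""
    tag = TYPED_TAG[spec.get("type")]
    out = ET.Element("ModelSpecification", {"name": spec.get("name")})
    if spec.get("value") is not None:
        # Scalar case: a single typed child carries the value.
        ET.SubElement(out, tag).text = spec.get("value")
    else:
        # Array case: numElements is dropped, the child count replaces it.
        array = ET.SubElement(out, "Array")
        for item in spec.findall("ArrayElement"):
            ET.SubElement(array, tag).text = item.get("value")
    return out


source = ET.fromstring(
    "<ModelSpecification name='realArrayParam' type='real' numElements='2'>"
    "<ArrayElement value='1.0'/><ArrayElement value='2.0'/>"
    "</ModelSpecification>"
)
converted = to_typed_form(source)
print(ET.tostring(converted, encoding="unicode"))
```

Once the values sit in elements named after their type, a schema can give each element name its own simple type, and the number of array entries is simply the number of children.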
Sander Rijken
If I follow this route (which in general I'm not keen on, because with XML each level of element hierarchy gives me exponential pain - I'm really not an XML fan), is it possible in the XSD to impose that the content of <Array> is homogeneous, i.e. all <RealParam> or all <StringParam>?
Tom Williams
+3  A: 

It is possible to write a schema to validate the given XML (with minor modifications), but only under certain conditions. Namely, your third bullet point cannot be defined in XML schema the way you described it (using the numElements value itself to restrict the number of ArrayElements). However, if you know there are only certain values that numElements can have, you can create schema elements to correspond with each option. If numElements is potentially large or may grow over time, this is probably not a good option for you.
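
To make the "one schema type per allowed count" idea concrete, here is a rough Python sketch that generates the repetitive definitions for a handful of counts. The helper names, the COUNT_WORDS naming table and the chosen counts are assumptions for illustration; the generated text follows the <Count>Value<Kind>Model and <Kind>ArrayElement naming pattern, and it is only a fragment, since the ModelSpecification base type and the array element types are not included.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
COUNT_WORDS = {2: "Two", 3: "Three", 4: "Four", 5: "Five"}  # assumed naming scheme


def fixed_size_model(count, kind):
    """Return the XSD text for one array model holding exactly `count` elements."""
    name = f"{COUNT_WORDS[count]}Value{kind}Model"
    return (
        f'  <xs:element name="{name}" type="{name}" substitutionGroup="ModelSpecification" />\n'
        f'  <xs:complexType name="{name}">\n'
        f'    <xs:complexContent>\n'
        f'      <xs:extension base="ModelSpecification">\n'
        f'        <xs:sequence>\n'
        f'          <xs:element name="ArrayElement" type="{kind}ArrayElement"'
        f' minOccurs="{count}" maxOccurs="{count}" />\n'
        f'        </xs:sequence>\n'
        f'        <xs:attribute name="numElements" type="xs:integer" use="required" fixed="{count}" />\n'
        f'      </xs:extension>\n'
        f'    </xs:complexContent>\n'
        f'  </xs:complexType>\n'
    )


def fixed_size_models(counts, kinds):
    """Wrap one definition per (kind, count) pair in an <xs:schema> element."""
    body = "".join(fixed_size_model(c, k) for k in kinds for c in counts)
    return f'<xs:schema xmlns:xs="{XS}">\n{body}</xs:schema>'


fragment = fixed_size_models(counts=[2, 3, 4], kinds=["Integer", "String"])
root = ET.fromstring(fragment)  # confirms the generated text is well-formed XML
# One complexType is needed for every (count, kind) combination.
print(len(root.findall(f"{{{XS}}}complexType")))
```

The number of definitions grows as counts times kinds, which is why this only stays manageable when the set of allowed numElements values is small and fixed.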

I also agree that numElements is redundant (type, too). But I left them in for completeness.

I wanted to prove this to myself before I typed this, so I actually ended up writing the whole schema. Since I did it anyway, I may as well pass it along :) There are two ways of writing the XML that can be validated with this schema. You expressed concern about it being human readable, and I don't know which of the options you'd prefer. (I left out the Real and Bool types in the interest of space).

The first option skips the xsi:type and just renames the <ModelSpecification> elements directly (thank substitution groups for that):

<Calculation xmlns="http://stackoverflow-sample" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Model />
  <Simulation>
    <OneValueIntegerModel name='intParam' type='integer' value='7'/>
    <TwoValueIntegerModel name="intArrayParam" type="integer" numElements="2">
      <ArrayElement value="5" />
      <ArrayElement value="6" />
    </TwoValueIntegerModel>
    <OneValueStringModel name='scalarSelector' type='string' value='apple'/>
    <ThreeValueStringModel name='arraySelector' type='string' numElements='3'>
      <ArrayElement value='red'/>
      <ArrayElement value='yellow'/>
      <ArrayElement value='blue'/>
    </ThreeValueStringModel>
    <ReportVariable pathName='myUnit.Var1'/>
    <ReportVariable pathName='myUnit.Var2(1)'/>
    <ReportVariable pathName='myUnit.Var2(2)'/>
  </Simulation>
</Calculation>

The second option keeps the elements named <ModelSpecification> and uses xsi:type to state which type it is:

<Calculation xmlns="http://stackoverflow-sample" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Model />
  <Simulation>
    <ModelSpecification xsi:type="OneValueIntegerModel" name='intParam' type='integer' value='7'/>
    <ModelSpecification xsi:type="TwoValueIntegerModel" name="intArrayParam" type="integer" numElements="2">
      <ArrayElement value="5" />
      <ArrayElement value="6" />
    </ModelSpecification>
    <ModelSpecification xsi:type="OneValueStringModel" name='scalarSelector' type='string' value='apple'/>
    <ModelSpecification xsi:type="ThreeValueStringModel" name='arraySelector' type='string' numElements='3'>
      <ArrayElement value='red'/>
      <ArrayElement value='yellow'/>
      <ArrayElement value='blue'/>
    </ModelSpecification>
    <ReportVariable pathName='myUnit.Var1'/>
    <ReportVariable pathName='myUnit.Var2(1)'/>
    <ReportVariable pathName='myUnit.Var2(2)'/>
  </Simulation>
</Calculation>

The schema that can be used to validate both of those options is as follows:

<xs:schema elementFormDefault="qualified" 
           targetNamespace="http://stackoverflow-sample" 
           xmlns="http://stackoverflow-sample" 
           xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="Calculation">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Model" />
        <xs:element name="Simulation">
          <xs:complexType>
            <xs:sequence>
              <xs:element ref="ModelSpecification" minOccurs="1" maxOccurs="unbounded" />
              <xs:element name="ReportVariable" minOccurs="0" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:attribute name="pathName" type="xs:string" />
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

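  <!-- The element itself is not abstract, so the xsi:type form (second option) is permitted; the abstract type below still requires a derived type. -->
  <xs:element name="ModelSpecification" type="ModelSpecification" />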
  <xs:complexType name="ModelSpecification" abstract="true">
    <xs:attribute name="name" type="xs:token" use="required" />
    <xs:attribute name="type" type="xs:QName" use="required" />
  </xs:complexType>

  <!--Integer Model-->

  <xs:complexType name="IntegerArrayElement">
    <xs:attribute name="value" type="xs:integer" use="required" />
  </xs:complexType>

  <xs:element name="OneValueIntegerModel" type="OneValueIntegerModel" substitutionGroup="ModelSpecification" />
  <xs:complexType name="OneValueIntegerModel">
    <xs:complexContent>
      <xs:extension base="ModelSpecification">
        <xs:attribute name="value" type="xs:integer" use="required" />
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:element name="TwoValueIntegerModel" type="TwoValueIntegerModel" substitutionGroup="ModelSpecification" />
  <xs:complexType name="TwoValueIntegerModel">
    <xs:complexContent>
      <xs:extension base="ModelSpecification">
        <xs:sequence>
          <xs:element name="ArrayElement" type="IntegerArrayElement" minOccurs="2" maxOccurs="2" />
        </xs:sequence>
        <xs:attribute name="numElements" type="xs:integer" use="required" fixed="2" />
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!--String Model-->

  <xs:complexType name="StringArrayElement">
    <xs:attribute name="value" type="xs:string" use="required" />
  </xs:complexType>

  <xs:element name="OneValueStringModel" type="OneValueStringModel" substitutionGroup="ModelSpecification" />
  <xs:complexType name="OneValueStringModel">
    <xs:complexContent>
      <xs:extension base="ModelSpecification">
        <xs:attribute name="value" type="xs:string" use="required" />
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:element name="ThreeValueStringModel" type="ThreeValueStringModel" substitutionGroup="ModelSpecification" />
  <xs:complexType name="ThreeValueStringModel">
    <xs:complexContent>
      <xs:extension base="ModelSpecification">
        <xs:sequence>
          <xs:element name="ArrayElement" type="StringArrayElement" minOccurs="3" maxOccurs="3" />
        </xs:sequence>
        <xs:attribute name="numElements" type="xs:integer" use="required" fixed="3" />
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

</xs:schema>
barkimedes