What is the simplest/clearest style of XSD for this kind of XML? (it's from this answer)

<object name="contact">
  <object name="home">
    <object name="tel">
       <string name="area" value="910"/>
       <string name="num" value="1234 5678"/>
    </object>
  </object>
  <object name="work">
    <object name="tel">
       <string name="area" value="701"/>
       <string name="num" value="8888 8888"/>
    </object>
    <object name="fax">
       <string name="area" value="701"/>
       <string name="num" value="9999 9999"/>
    </object>
  </object>
</object>

EDIT I moved my example XSD and clarification into an answer.

+2  A: 

If your data is not totally free-format, I would make the XML specific to your data model:

<contact>
  <home>
    <tel>
       <area>910</area>
       <num>1234 5678</num>
    </tel>
  </home>
  <work>
    <tel>
       <area>701</area>
       <num>8888 8888</num>
    </tel>
    <fax>
       <area>701</area>
       <num>9999 9999</num>
    </fax>
  </work>
</contact>

However, assuming that you have a reason for doing it the way that you're doing it (for example, assuming that your data truly is totally free-format structured data), you could make the XSD a little bit clearer by doing something like this:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="object">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="string"/>
        <xs:element ref="object"/>
      </xs:choice>
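      <!-- attributes are optional by default in XSD, so a mandatory name needs use="required" -->
      <xs:attribute name="name" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>

  <xs:element name="string">
    <xs:complexType>
      <xs:attribute name="name" type="xs:string" use="required"/>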
      <xs:attribute name="value" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

I prefer schemas where each element is defined in a standalone way -- as much as possible -- and any type that is used in multiple places is also defined separately. In your case, there is no reused type.

When an XSD is deeply nested, it gets harder to read and harder to support and modify.
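
Since the question also asks about using types: here is a rough sketch of the same grammar written with named complex types instead of global elements (sometimes called the Venetian Blind style). The type names objectType and stringType are made up for illustration, and use="required" on the name attributes is my assumption about what you want, since attributes are optional by default in XSD.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- The only global element: the root object -->
  <xs:element name="object" type="objectType"/>

  <!-- An object holds any mix of strings and nested objects -->
  <xs:complexType name="objectType">
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element name="string" type="stringType"/>
      <xs:element name="object" type="objectType"/>
    </xs:choice>
    <!-- assumption: every object must carry a name -->
    <xs:attribute name="name" type="xs:string" use="required"/>
  </xs:complexType>

  <!-- A string is a named leaf value -->
  <xs:complexType name="stringType">
    <xs:attribute name="name" type="xs:string" use="required"/>
    <xs:attribute name="value" type="xs:string"/>
  </xs:complexType>
</xs:schema>
```

For a grammar this small it is about the same length as the element-based version, so it is mostly a matter of taste. Named types tend to pay off once the same structure has to appear under different element names.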

Note: You can make the object name optional by making this change:

  <xs:element name="object">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="string"/>
        <xs:element ref="object"/>
      </xs:choice>
      <xs:attribute name="name" type="xs:string" use="optional"/>
    </xs:complexType>
  </xs:element>

But don't also make the name on the string type optional! (At least from what you've shown us, it doesn't make sense to do that.)
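
To illustrate what the optional name allows, here is a small made-up instance (trimmed from your example) that should validate against the schema with the change above. The root has no name, while the nested object and the strings keep theirs.

```xml
<!-- root object without a name attribute -->
<object>
  <object name="home">
    <string name="area" value="910"/>
    <string name="num" value="1234 5678"/>
  </object>
</object>
```

Bear in mind that the same declaration applies at every level, so a nested object without a name would be accepted as well.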

Eddie
I didn't bother unnesting the string because I wanted to make it match the single production but you're right. It makes a bigger difference than I anticipated, because it unclutters the object, making it more focussed and clearer. But I was interested in if there are different approaches altogether that are simpler - such as using types, groups or something I hadn't thought of. I'll try to clarify the question.
13ren
Making the object name optional doesn't distinguish between when the object is the root, and when it is within another object. The latter always needs a name. This is the special case - I didn't make this explicit in the question; I've now clarified.
13ren
A: 

Here is an XSD - is its grammar the clearest approach?

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="object">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">

        <xs:element name="string">
          <xs:complexType>
            <xs:attribute name="name" type="xs:string"/>
            <xs:attribute name="value" type="xs:string"/>
          </xs:complexType>
        </xs:element>

        <xs:element ref="object"/>

      </xs:choice>
      <xs:attribute name="name" type="xs:string"/>
    </xs:complexType>
  </xs:element>

</xs:schema>

The basic idea of the XML is nested objects:

V --> string | O      // a Value is a string or an Object
O --> (K V)*          // an Object is a list of named values (Key-Value pairs)

But it's slightly different, in that the root is always an Object (not a string), and it itself is named (even though it isn't inside another Object):

O ==> (string K | O)* K

I am open to changing the format itself slightly, to give different XML, if that will make a simpler/clearer XSD. If an Object always has a name, this removes special cases, which makes the grammar and XSD more regular - and simpler. Therefore, it seems simpler for an Object to always have a name.

Clarification: the special case would be that when the Object is the root it is not named, but in all other cases it is named. This requires an extra header section, like this:

O'==> (string K | O)*
O ==> (string K | O)* K

Handling this special case is more complex than the original, even when refactored to minimize that complexity:

F ==> (string K | O)*
O ==> F K
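
For comparison, the refactored special-case grammar could be sketched in XSD roughly as follows, using a named group for F. This is only a sketch under assumptions: the names fields, namedObjectType and stringType are invented here, and marking the attributes use="required" is my reading of "always needs a name" (attributes are optional by default in XSD).

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- F: one item of an object's body, either a string or a nested object.
       The repetition (the * in the grammar) is put on each group reference. -->
  <xs:group name="fields">
    <xs:choice>
      <xs:element name="string" type="stringType"/>
      <xs:element name="object" type="namedObjectType"/>
    </xs:choice>
  </xs:group>

  <!-- O': the root object, body only, no name attribute declared -->
  <xs:element name="object">
    <xs:complexType>
      <xs:group ref="fields" minOccurs="0" maxOccurs="unbounded"/>
    </xs:complexType>
  </xs:element>

  <!-- O ==> F K: a nested object, body plus a required name -->
  <xs:complexType name="namedObjectType">
    <xs:group ref="fields" minOccurs="0" maxOccurs="unbounded"/>
    <xs:attribute name="name" type="xs:string" use="required"/>
  </xs:complexType>

  <!-- string K: a named leaf value -->
  <xs:complexType name="stringType">
    <xs:attribute name="name" type="xs:string" use="required"/>
    <xs:attribute name="value" type="xs:string" use="required"/>
  </xs:complexType>
</xs:schema>
```

Even in this form the schema needs four definitions where the always-named version needs two, which is in line with the point above that the special case costs more than it saves.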
13ren