I have to write a schema for an XML format that is already in use. The XML is generated by a different program, and it sounds like it would be difficult to track down every place that would have to change in order to emit a more regular XML format.

Here's an example similar to our XML structure:

<data>
    <summary>some info</summary>
    <error>error message (only if there was an error)</error>
    <details>more info
        <x>more</x>
        <y>even more</y>
    </details>
    <error>another error message</error>
    <z>some extra info</z>
</data>

Note that the error tag is reused at the same level and comes after certain items but not others, so I can't just set maxOccurs="unbounded". I've tried wrapping the associated pairs of error/other tags in an xsd:sequence, but that doesn't do the trick because I'm still effectively breaking the Unique Particle Attribution rule.

Can this even be done, or do I need to let the other developers know this schema isn't going to validate?

+2  A: 

My reading of the XML Schema standard is that you are probably safe. You simply declare non-consecutive elements with the same name in your schema, mirroring the XML as it is (or can be) generated. As long as the "error" instances are always separated by other elements rather than consecutive, this shouldn't be a problem. For example, something like:

  <xs:element name="data">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="summary" minOccurs="1" maxOccurs="1" type="xs:string" />
        <xs:element ref="error" minOccurs="1" maxOccurs="1" />
        <xs:element name="details" minOccurs="1" maxOccurs="1" type="detailsType" />
        <xs:element ref="error" minOccurs="1" maxOccurs="1" />
        <xs:element name="z" minOccurs="0" maxOccurs="1" type="xs:string" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="error" type="xs:string"/>

  <xs:complexType name="detailsType">
    ....
  </xs:complexType>

If details had minOccurs="0" and the first "error" above had a maxOccurs greater than 1, the schema would violate the Unique Particle Attribution rule: given two error elements in a row, a validator could not tell which "error" declaration in the schema the second one belongs to. However, as long as each "error" declaration can be uniquely identified, through careful use of minOccurs and maxOccurs on the error elements and on the elements between them, you are fine.
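To make the distinction concrete, here is a minimal sketch of an ambiguous sequence next to an unambiguous one. It assumes the global "error" element and the detailsType from the example above; the optional (minOccurs="0") error elements are an assumption based on the question's remark that an error appears only if there was one.

```xml
<!-- Ambiguous: the first error and details are both optional, so when an
     <error> follows <summary> the validator cannot tell which of the two
     "error" particles it matches. This should be rejected under the
     Unique Particle Attribution rule. -->
<xs:sequence>
  <xs:element name="summary" type="xs:string" />
  <xs:element ref="error" minOccurs="0" maxOccurs="unbounded" />
  <xs:element name="details" type="detailsType" minOccurs="0" />
  <xs:element ref="error" minOccurs="0" />
  <xs:element name="z" type="xs:string" minOccurs="0" />
</xs:sequence>

<!-- Unambiguous: details is required, so every <error> before it matches
     the first particle and an <error> after it matches the second. -->
<xs:sequence>
  <xs:element name="summary" type="xs:string" />
  <xs:element ref="error" minOccurs="0" maxOccurs="unbounded" />
  <xs:element name="details" type="detailsType" />
  <xs:element ref="error" minOccurs="0" />
  <xs:element name="z" type="xs:string" minOccurs="0" />
</xs:sequence>
```

The only difference between the two is the minOccurs on details: the required element acts as a separator that tells the validator which "error" particle it is in.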

You can even have consecutive "error" declarations, as long as the schema validator can always work out unambiguously which one applies, for example from their minOccurs and maxOccurs values.
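As an illustration of consecutive declarations that stay unambiguous, the following sketch (again assuming the global "error" element from the example above) puts a required error directly before an optional one:

```xml
<!-- The first <error> always matches the required particle; a second
     <error>, if present, can only match the optional particle. -->
<xs:sequence>
  <xs:element ref="error" />
  <xs:element ref="error" minOccurs="0" />
</xs:sequence>
```

Swapping the two, so that the optional particle comes first, would be ambiguous: a single error could then match either declaration.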

Think about XHTML, in which many elements may occur in any order, with arbitrary repetition.
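If the ordering turns out to be too irregular to pin down, one fallback in that spirit is a repeated choice. This is only a sketch, reusing the element names and the detailsType from the example above, and it is deliberately looser than the original: it no longer enforces order or how often each child appears.

```xml
<xs:element name="data">
  <xs:complexType>
    <!-- Any of the listed children, in any order, any number of times. -->
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element name="summary" type="xs:string" />
      <xs:element ref="error" />
      <xs:element name="details" type="detailsType" />
      <xs:element name="z" type="xs:string" />
    </xs:choice>
  </xs:complexType>
</xs:element>
```

Each name appears only once inside the choice, so there is never a question of which particle an element matches.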

EDIT: Updated to reflect edits in original question.

Eddie