When upgrading from libxml2 2.6 to 2.7, some schema validation behavior changed for me. I've located the bug report on their site that covers this change: https://bugzilla.gnome.org/show_bug.cgi?id=571271 .

Interestingly, the report says "and I guess we misinterpreted the expected behaviour of this options (though I'm still not 100% sure)". They weren't sure they were reading the spec correctly, yet they committed the fix.

I think the previous behavior is correct, so I wanted to see if anyone here has knowledge in either direction.

Basically, does <xs:all>elem1, elem2, ..</xs:all> mean that "all or none of elem1, elem2, .. must be present", or that "any of elem1, elem2, .. may be present"? Even though it reads like the former, two sources don't make this clear:

http://www.w3.org/TR/xmlschema-0/#ref18 - "All the elements in the group may appear once or not at all, and they may appear in any order."

http://www.w3schools.com/Schema/el_all.asp - "The example above indicates that the "firstname" and the "lastname" elements can appear in any order and each element CAN appear zero or one time!"

The script below, using lxml, reports success for both documents with libxml2 2.6, but the second document fails validation on 2.7. Can someone confirm whether 2.7 is doing the right or the wrong thing here?

from lxml import etree
from StringIO import StringIO

schema = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xs:element type="parent-type" name="parent"/>
 <xs:complexType name="parent-type">
   <xs:all maxOccurs="1" minOccurs="0">
     <xs:element type="xs:int" name="int-attr"/>
     <xs:element type="xs:string" name="str-attr"/>
   </xs:all>
 </xs:complexType>
</xs:schema>
"""

xmlschema = etree.XMLSchema(etree.parse(StringIO(schema)))

# passes
doc1 = """
<parent xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://www.example.com/xml/schemas">
 <str-attr>some value</str-attr>
 <int-attr>12</int-attr>
</parent>
"""

# fails.  it wants both "int-attr" and "str-attr" to be present.
# didn't think this was how "xs:all" worked ?
doc2 = """
<parent xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://www.example.com/xml/schemas">
 <int-attr>12</int-attr>
</parent>
"""

for i, doc in enumerate((doc1, doc2, )):
   doc = etree.parse(StringIO(doc))
   try:
       xmlschema.assertValid(doc)
       print "document %d is valid." % i
   except Exception, e:
       print "document %d is not valid." % i
       print e

output:

document 0 is valid.
document 1 is not valid.
Element 'parent': Missing child element(s). Expected is ( str-attr )., line 2
A: 

Interesting question. I think the libxml2 behaviour is correct in this case. Note that the quotation from the XSD spec

All the elements in the group may appear once or not at all, and they may appear in any order.

is followed by an example where one of the contained elements has a minOccurs="0" attribute, so I think this is what is meant by "not at all". The spec could definitely be clearer. This would also mean that the second example on the w3schools page is wrong.

Jörn Horstmann
If that's the case, then the next question is, how do you define a tag that has zero or one of each child tag ?
zzzeek
+1  A: 

User Jörn Horstmann has already answered your question correctly, but the formatting could make the answer seem a bit unclear. I hope these examples help those who were left puzzled.

What do minOccurs and maxOccurs mean on <xs:all> element

Remember that on both <xs:all> and <xs:element>, minOccurs and maxOccurs default to "1". Therefore

<xs:all>
  <xs:element type="xs:int" name="int-attr"/>
  <xs:element type="xs:string" name="str-attr"/>
</xs:all>

is in fact the same as

<xs:all minOccurs="1" maxOccurs="1">
  <xs:element type="xs:int" name="int-attr" minOccurs="1" maxOccurs="1"/>
  <xs:element type="xs:string" name="str-attr" minOccurs="1" maxOccurs="1"/>
</xs:all>

This means that the whole <xs:all> group is mandatory, as are both of the elements defined in it. Only the order is free. Thus the XML document

<parent xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="http://www.example.com/xml/schemas">
  <int-attr>12</int-attr>
</parent>

would be invalid. Putting minOccurs="0" on <xs:all> means that the whole group is optional, so in this case it would also allow an empty <parent/> element. I believe this is what the spec really means by "All the elements in the group may appear once or not at all". I am not a native English speaker, but I would also say that the second example on the w3schools page is incorrect. It should read "both elements CAN appear zero or one time" instead of "each element CAN appear zero or one time".

The maxOccurs attribute of <xs:all> is fixed to the value "1".
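The rule described above can be sketched as a small pure-Python model. This is only an illustration of the XSD 1.0 semantics as explained in this answer, not how libxml2 implements validation. The function name all_group_accepts and its parameters are made up for this example.

```python
from collections import Counter


def all_group_accepts(children, particles, group_min_occurs=1):
    """Hypothetical model of an <xs:all> content check (XSD 1.0 rules).

    children         -- child element names found in the instance, in any order
    particles        -- dict: element name -> minOccurs (0 or 1); maxOccurs is 1
    group_min_occurs -- minOccurs on the <xs:all> itself (0 or 1)
    """
    counts = Counter(children)
    # Unknown elements and repeated elements are never allowed.
    if any(name not in particles or n > 1 for name, n in counts.items()):
        return False
    # An optional group may be skipped as a whole (empty content).
    if not children and group_min_occurs == 0:
        return True
    # Otherwise every element must meet its own minOccurs.
    return all(counts[name] >= min_occurs for name, min_occurs in particles.items())


# The schema from the question: optional group, both elements required.
required_both = {"int-attr": 1, "str-attr": 1}
print(all_group_accepts(["str-attr", "int-attr"], required_both, 0))  # True
print(all_group_accepts(["int-attr"], required_both, 0))              # False
print(all_group_accepts([], required_both, 0))                        # True
```

Under this model the question's second document is rejected, which matches the libxml2 2.7 result.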

How to define a tag that has zero or one of each child tag

This is what you asked in your comment and what you tried to validate in the first place. Elements inside an <xs:all> group are made optional by adding minOccurs="0" to those elements, as in the example below:

<xs:all minOccurs="1" maxOccurs="1">
  <xs:element type="xs:int" name="int-attr" minOccurs="0" maxOccurs="1"/>
  <xs:element type="xs:string" name="str-attr" minOccurs="0" maxOccurs="1"/>
</xs:all>

This schema would validate the XML document

<parent xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="http://www.example.com/xml/schemas">
  <int-attr>12</int-attr>
</parent>

Since both of the elements are optional (because they have minOccurs="0"), this definition also allows an empty <parent/> element. Although the cardinality restriction on the elements in a way "overrides" the one set on <xs:all>, the spec also says: "no element in the content model may appear more than once, i.e. the permissible values of minOccurs and maxOccurs are 0 and 1". So you can't have a group that contains the same element multiple times in arbitrary order, or at least you can't use <xs:all> to create such a type.
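To make the consequence concrete, here is a minimal Python sketch that enumerates every child sequence such a group would accept, assuming the XSD 1.0 rules quoted above and minOccurs="1" on the group itself. The helper name allowed_child_sequences is hypothetical and exists only for this illustration.

```python
from itertools import permutations


def allowed_child_sequences(particles):
    """Enumerate the child sequences an <xs:all> group accepts (illustrative model).

    particles -- dict: element name -> minOccurs (0 or 1); maxOccurs is always 1
    """
    required = [name for name, m in particles.items() if m == 1]
    optional = [name for name, m in particles.items() if m == 0]
    sequences = set()
    # Pick every subset of the optional elements, then allow any order.
    for mask in range(2 ** len(optional)):
        chosen = required + [n for i, n in enumerate(optional) if mask >> i & 1]
        sequences.update(permutations(chosen))
    return sorted(sequences)


# Both elements optional, as in the schema above.
for seq in allowed_child_sequences({"int-attr": 0, "str-attr": 0}):
    print(seq)
```

With both elements optional the model yields five sequences, including the empty one. No sequence ever repeats an element, which is the restriction the quoted sentence from the spec describes.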

jasso