Maybe it's me, but it appears that if you have an XSD

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="User">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="GivenName" />
                <xs:element name="SurName" />
            </xs:sequence>
            <xs:attribute name="ID" type="xs:unsignedByte" use="required" />
        </xs:complexType>
    </xs:element>
</xs:schema>

that defines the schema for this document

<?xml version="1.0" encoding="utf-8" ?>
<User ID="1">
    <GivenName></GivenName>
    <SurName></SurName>
</User>

It fails to validate if you add another element, say EmailAddress, and mix up the order:

<?xml version="1.0" encoding="utf-8" ?>
<User ID="1">
    <SurName></SurName>
    <EmailAddress></EmailAddress>
    <GivenName></GivenName>
</User>

I don't want to add EmailAddress to the document and have it be marked optional.

I just want an XSD that validates the bare minimum requirements that the document must meet.

Is there a way to do this?

EDIT:

marc_s pointed out below that you can use xs:any inside of xs:sequence to allow more elements. Unfortunately, you still have to maintain the order of the declared elements.

Alternatively, I can use xs:all which doesn't enforce the order of elements, but alas, doesn't allow me to place xs:any inside of it.
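
For comparison, XML Schema 1.1 relaxes this restriction and permits xs:any inside xs:all. The sketch below assumes a validator that implements XSD 1.1; a processor that only supports XSD 1.0 will reject the schema itself. It reuses the element and attribute names from the schema above and is untested against any particular processor.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch only: requires an XSD 1.1 processor (assumption). -->
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="User">
        <xs:complexType>
            <xs:all>
                <!-- Required children, accepted in any order -->
                <xs:element name="GivenName" />
                <xs:element name="SurName" />
                <!-- Any number of additional elements, anywhere among the children -->
                <xs:any minOccurs="0" maxOccurs="unbounded" processContents="lax" />
            </xs:all>
            <xs:attribute name="ID" type="xs:unsignedByte" use="required" />
        </xs:complexType>
    </xs:element>
</xs:schema>
```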

+4  A: 

You should be able to extend your schema with the <xs:any> element for extensibility - see W3Schools for details.

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="User">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="GivenName" />
                <xs:element name="SurName" />
                <xs:any minOccurs="0" maxOccurs="unbounded" processContents="lax" />
            </xs:sequence>
            <xs:attribute name="ID" type="xs:unsignedByte" use="required" />
        </xs:complexType>
    </xs:element>
</xs:schema>

Once you add processContents="lax", the .NET XML validation should succeed on it.

See MSDN docs on xs:any for more details.

Update: if you require more flexibility and less stringent validation, you might want to look at other methods of defining schemas for your XML - something like RelaxNG. XML Schema is - on purpose - rather strict about its rules, so maybe it's just the wrong tool for the job at hand.
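
To make the RelaxNG suggestion a bit more concrete, here is a rough sketch of the same minimum requirements in RELAX NG's XML syntax. The interleave pattern lets the children appear in any order. The definition name anyContent is made up for this example, and the wildcard excludes GivenName and SurName so that the interleaved branches do not overlap. Treat it as an untested starting point rather than a finished schema.

```xml
<?xml version="1.0" encoding="utf-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="User">
      <attribute name="ID"><data type="unsignedByte"/></attribute>
      <interleave>
        <!-- Required children, in any order -->
        <element name="GivenName"><text/></element>
        <element name="SurName"><text/></element>
        <!-- Any other element, any number of times, mixed in anywhere -->
        <zeroOrMore>
          <element>
            <anyName>
              <except>
                <name>GivenName</name>
                <name>SurName</name>
              </except>
            </anyName>
            <ref name="anyContent"/>
          </element>
        </zeroOrMore>
      </interleave>
    </element>
  </start>

  <!-- Hypothetical helper: arbitrary attributes, text and nested elements -->
  <define name="anyContent">
    <zeroOrMore>
      <choice>
        <attribute><anyName/></attribute>
        <text/>
        <element><anyName/><ref name="anyContent"/></element>
      </choice>
    </zeroOrMore>
  </define>
</grammar>
```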

marc_s
@marc_s, while that almost works, the order becomes important. GivenName and SurName must be the first and second elements respectively...at least I think that's the effect of xs:sequence. I could change it to xs:all. But then I can't use xs:any...
Chad
@marc_s, which tells me that XSD is inherently broken. XML is supposed to be *eXtensible*. Order of elements isn't supposed to matter. So first, why can you define a sequence? Second, why can't we write extensible XSDs?
Chad
@marc_s, I realize *'a sequence is a sequence'*, but I think it's bad that it exists, as it violates the basic principle of XML, extensibility. Aside from that, why can't I define an XSD that is truly extensible? Using `xs:sequence` and `xs:any` sort of works, but I have to ensure that the *sequenced* elements are first and appear in order... which shouldn't matter given the nature of XML.
Chad
@Chad: Maybe, if you really need this kind of flexibility, XML schema is just the wrong tool for the job. Have you ever looked at RelaxNG ?? http://www.relaxng.org/
marc_s
@marc_s, I need to define a minimum level of conformance. Nowhere did I say "anything, anywhere, anytime". I need to define a minimum document, and allow it to be extensible. If you need to define a "sequence" of elements in your XML, I would say you should define it IN the element with a sequence attribute. Relying on the order of the elements in the document feels like an affront to the essence of XML.
Chad
+3  A: 

After reading marc_s's answer and your discussion in the comments, I decided to add a little.

It seems to me there is no perfect solution to your problem, Chad. There are some approaches to implementing an extensible content model in XSD, but all the implementations known to me have some restrictions. Because you didn't describe the environment where you plan to use the extensible XSD, I can only recommend some links that will probably help you choose an approach that can be implemented in your environment:

  1. http://www.xfront.com/ExtensibleContentModels.html (or http://www.xfront.com/ExtensibleContentModels.pdf) and http://www.xfront.com/VariableContentContainers.html
  2. http://www.xml.com/lpt/a/993 (or http://www.xml.com/pub/a/2002/07/03/schema_design.html)
  3. http://msdn.microsoft.com/en-us/library/ms950793.aspx
Oleg
+1 for linking to the xfront website, still a classic treatment of the subject.
Abel
+1  A: 

Well, you can always use DTD :-) except that DTD also prescribes ordering. Validation with "unordered" grammar is terribly expensive. You could play with xsd:choice and min and max occurs but it's probably going to balk as well. You could also write XSD extensions / derived schemas.

The way you posed the problem, it looks like you don't really want XSD at all. You can just load the document and then validate whatever minimum you want with XPaths. But just protesting against XSD, years after it became an omnipresent standard, is really, really not going to get you anywhere.

ZXX
The problem with loading, then validating with XPaths is that it ends up in code and is tough to change. I'm not protesting against XSD, I use them a fair bit, but it never really occurred to me until this problem that they missed the mark. IMHO, if you have a data format whose best selling feature is its extensibility, but don't allow for truly extensible definitions in structure...you failed. Or maybe I'm just crazy.
Chad
@Chad / @zb_z: if it helps: Tim Bray and other really well-known names at W3C also consider that they *missed the mark*. I've tried to explain this issue, which is called the *Unique Particle Attribution* constraint and also applies to DTD by the way, see above (or below) ;-)
Abel
ZXX
@ZXX: those are indeed the arguments used *against* non-deterministic schemas. But meanwhile, and in fact already at the time, it has been shown that it's not as difficult as it seems, and the performance drop proved negligible. Both Schematron and Relax NG have shown that non-determinism is not a problem. Whether it's good design for your schema is a whole other story, of course.
Abel
Any perf results and implementations without Haskell or backtracking of some other kind? XSD validation is expected to be 0-lookahead, which has some important perf, stability and streaming consequences, such as being trusted for automatic code generation and serialization. One needs guaranteed worst-case complexity for that. If my memory serves me, Relax NG doesn't have cardinality constraints, which XSD does (much more expensive to validate with interleaving), and is by and large not 0-lookahead. Let's not forget that with 1 lookahead one can parse C code, and still no NFA.
ZXX
+6  A: 
Abel
Very good analysis. Thank you.
John Saunders
@John, you're welcome. Now I only hope it also helps Chad with his problem ;-)
Abel
Wow...good answer. It definitely explains the reasoning behind it! The other idea I had was a combination of XSLT and XSD, whereby I run the input XML through an XSLT that keeps only the elements I wish to validate (the minimum requirements), then validate the result against the XSD. If it passes, allow the original XML through. But I don't think it will be *performant* enough.
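
A rough sketch of that filtering step, assuming XSLT 1.0 and the User, GivenName and SurName names from the question (nothing here has been measured for performance):

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <!-- Rebuild User with only the parts the minimal XSD knows about -->
    <xsl:template match="/User">
        <User>
            <xsl:copy-of select="@ID"/>
            <!-- Emit the required children in schema order, whatever order they arrived in -->
            <xsl:copy-of select="GivenName"/>
            <xsl:copy-of select="SurName"/>
        </User>
    </xsl:template>
</xsl:stylesheet>
```

The output always lists GivenName before SurName and drops everything else, so it can be checked against the original xs:sequence schema. A missing or repeated required element survives the transform unchanged and should still fail validation.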
Chad
@Chad: if you consider using XSLT + XPath, consider switching from W3C XML Schema to [Schematron](http://www.schematron.com/overview.html). It's also an ISO-standard XML Schema Language and is made to work well with XSLT, XPath and a bit of regular expressions. Schematron works the other way around: it is rule based: you define rules for tree patterns as opposed to a grammar with XSD. If you have some experience with XSLT, it should be [easy to adopt its only 6 (!) elements](http://www.schematron.com/elements.html).
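
As an illustration of the rule-based style, a minimal ISO Schematron sketch of the question's minimum requirements might look like the following. The assertion messages are invented for this example, and the numeric test is only an XPath 1.0 approximation of xs:unsignedByte (an assumption, not an exact equivalent).

```xml
<?xml version="1.0" encoding="utf-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
    <pattern>
        <rule context="/User">
            <!-- Required attribute, roughly limited to an integer in 0..255 -->
            <assert test="@ID">User must have an ID attribute.</assert>
            <assert test="number(@ID) = floor(number(@ID)) and number(@ID) >= 0 and 255 >= number(@ID)">ID must be an integer between 0 and 255.</assert>
            <!-- Required children; position among siblings is not checked -->
            <assert test="count(GivenName) = 1">User must contain exactly one GivenName.</assert>
            <assert test="count(SurName) = 1">User must contain exactly one SurName.</assert>
        </rule>
    </pattern>
</schema>
```

Because the rules only assert what must be present, extra elements such as EmailAddress are ignored and element order does not matter.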
Abel