Essentially I want to embed some XHTML in an XML document that must validate against a custom schema.

Full background:

I have a webservice that consumes an XML document. This XML document is validated against a custom schema. The data in the XML is parsed and stored in a database, and displayed in a useful format on a website.

The customer who fires the XML at my webservice has its own internal "IT / programmer guy". He wants to be able to display some custom XHTML in some placeholders on some of the website's pages.

We have agreed that he can extend the XML that he fires at my webservice to include 3 new elements that will contain the XHTML, and I will adjust my schema accordingly. I'll also do the processing to get his XHTML out of the XML doc and onto the web pages.

I don't want to use CDATA as that could be quite insecure (I think!), so I was trying to use an <xs:any> in the schema:

<xs:element name="SomeXhtmlStuff" minOccurs="0">
  <xs:complexType>
    <xs:sequence>
      <xs:any minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

I thought this would mean that any well-formed XML would be OK inside the element, e.g. all XHTML tags would be fine. However, I tried this:

<SomeXhtmlStuff>
  <p>This is a test HTML output for Job Details</p>
</SomeXhtmlStuff>

and the XML won't validate against it. Edit: Visual Studio 2008, in its automatic validator, gives the error "the 'p' element is not declared".

I haven't got much experience with XML/schema and I inherited this project, so any suggestions would be more than welcome!

Thanks in advance!

A: 

I believe using a CDATA section is better. In the world of (X)HTML there are plenty of documents with invalid markup, and browsers still manage to display something useful. If the markup is embedded as real XML, you are likely to lose submissions from time to time: whenever the other end sends invalid XHTML, the whole XML document is no longer well-formed or valid, and processing stops.

By the way, Atom and RSS publishers use CDATA sections for inserting XHTML/HTML markup.
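
As a rough sketch of what the CDATA approach could look like (assuming the element is simply declared as xs:string, which is my assumption and not something from the question), the schema and instance might be:

```xml
<!-- Schema fragment: the element holds plain text as far as the validator is concerned -->
<xs:element name="SomeXhtmlStuff" type="xs:string" minOccurs="0"/>

<!-- Instance fragment: the markup travels as character data and is not parsed as XML -->
<SomeXhtmlStuff><![CDATA[<p>This is a test HTML output for Job Details</p>]]></SomeXhtmlStuff>
```

The validator never looks inside the CDATA section, so any checking or sanitising of the markup has to happen in the code that writes it to the page.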

Ionuț G. Stan
A: 

What was the validation error you received?

I believe that <xs:any/> means "any XML that will validate". What will <p/> validate against?
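
To illustrate, an <xs:any> with no other attributes picks up the schema defaults. Written out explicitly, the wildcard from the question is equivalent to this sketch:

```xml
<!-- Same as <xs:any minOccurs="0"/> with the default attribute values spelled out -->
<xs:any namespace="##any"
        processContents="strict"
        minOccurs="0"
        maxOccurs="1"/>
```

With processContents="strict" the validator must find a declaration for whatever element appears there, and maxOccurs="1" allows only a single child element.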

John Saunders
"the 'p' element is not declared" is the error that Visual Studio 2008 is giving me.
bplus
And, did you declare the p element? Did anyone?
John Saunders
A: 

After some more googling I found this schema snippet which seems to work:

<xs:element name="SomeXhtmlStuff" minOccurs="0">
  <xs:complexType>
    <xs:complexContent mixed="true">
      <xs:restriction base="xs:anyType">
        <xs:sequence>
          <xs:any processContents="skip"
                  minOccurs="0"
                  maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:element>
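
As far as I can tell, processContents="skip" means the validator only checks that the children are well-formed, and mixed="true" allows text between them. So a fragment like this sketch (the extra text and the div are made up for illustration) should be accepted:

```xml
<SomeXhtmlStuff>
  Loose text is allowed because the content model is mixed.
  <p>This is a test HTML output for Job Details</p>
  <div><b>Any other well-formed element passes too</b></div>
</SomeXhtmlStuff>
```

The flip side is that nothing inside the element is checked against any schema, so misspelled or unexpected tags will pass as well.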
bplus
A: 

You should probably consider adding the XHTML namespace to that xs:any element. You might also want to change the processContents attribute to 'lax'. The value 'lax' tells the validator to validate the content only where it can locate a declaration for it, and to let it through otherwise. So, a better element model might be:

<xs:element name="SomeXhtmlStuff" minOccurs="0">
  <xs:complexType>
    <xs:complexContent mixed="true">
      <xs:restriction base="xs:anyType">
        <xs:sequence>
          <xs:any processContents="lax"
                  namespace="http://www.w3.org/1999/xhtml"
                  minOccurs="0"
                  maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:element>

Of course, you might also want to change that mixed content type if he is just inserting elements into your xml.
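
One consequence of the namespace attribute above is that the wildcard only matches elements in the XHTML namespace, so the sender has to declare it. A sketch of an instance that should match (reusing the paragraph from the question):

```xml
<SomeXhtmlStuff>
  <!-- The default namespace declaration puts p and its descendants in the XHTML namespace -->
  <p xmlns="http://www.w3.org/1999/xhtml">This is a test HTML output for Job Details</p>
</SomeXhtmlStuff>
```

A bare <p> with no namespace declaration would not match the wildcard and should be rejected.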

Nic Gibson
Thanks for that! What should I change "mixed" to then?
bplus
Apologies; late reply caused by hospitalised child. I should have been clearer. I meant that you should have mixed='false' unless you do want a mixed content model (text and elements) inside the SomeXhtmlStuff element. I suspect that you don't. False is the default for mixed, so you could omit it anyway.
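
With mixed left out, the complexContent/restriction wrapper is no longer needed either, since a plain sequence inside xs:complexType is shorthand for a restriction of xs:anyType. A sketch of the simplified declaration, assuming you only want elements and no loose text:

```xml
<xs:element name="SomeXhtmlStuff" minOccurs="0">
  <xs:complexType>
    <xs:sequence>
      <!-- Element-only content: any number of XHTML-namespace elements, validated laxly -->
      <xs:any processContents="lax"
              namespace="http://www.w3.org/1999/xhtml"
              minOccurs="0"
              maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```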
Nic Gibson