We have an XSD schema that validates a data file. It declares the root element of the document, followed by the complexTypes that describe its structure. The schema has an empty target namespace, so document nodes are not supposed to be qualified with a namespace.

Recently someone mistakenly sent an XSL template in place of an XML data file. That XSL passed validation without any problem and was therefore passed on to the XSLT processor. The result was basically the free-form text found in the validated XSL.

We then sent all sorts of other XML documents to the validator (various XSD schemas and XSL templates, for example), and they all passed validation.

We tried two different ways of validating (XPathNavigator.CheckValidity and XmlDocument.Validate), and it made no difference.

What is happening here? Is our validation schema happy to pass any document whose root node is qualified with a namespace different from the one the schema describes? How do we prevent that?

EDIT

Validation code (version 1):

Dim data As XPathDocument
....
If Not data.CreateNavigator.CheckValidity(ValidationSchemaSet, AddressOf vh.ValidationHandler) Then
    result = "Validation failed." & ControlChars.NewLine & String.Join(ControlChars.NewLine, vh.Messages.ToArray)
    Return False
End If

, where vh is:

Private Class VHandler
    Public Messages As New List(Of String)

    Public Sub ValidationHandler(ByVal sender As Object, ByVal e As ValidationEventArgs)
        If e.Severity = XmlSeverityType.Error Then
            Messages.Add(e.Message)
        End If
    End Sub
End Class

XSD schema:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:include schemaLocation="CarrierLabel_Type_1.xsd" />
  <xs:include schemaLocation="CarrierLabel_Type_2.xsd" />
  <xs:include schemaLocation="CarrierLabel_Type_3.xsd" />

  <!-- Schema definition -->
  <xs:element name="PrintJob" type="printJobType" />


  <!-- Types declaration -->
  <xs:simpleType name="nonEmptyString">
    <xs:restriction base="xs:string">
      <xs:minLength value="1"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:complexType name="printJobType">
    <xs:sequence minOccurs="1" maxOccurs="unbounded">
      <xs:choice>
        <xs:element name="CarrierLabel_type_1" type="CarrierLabel_type_1" />
        <xs:element name="CarrierLabel_type_2" type="CarrierLabel_type_2" />
        <xs:element name="CarrierLabel_type_3" type="CarrierLabel_type_3" />
      </xs:choice>
    </xs:sequence>

    <xs:attribute name="printer" type="nonEmptyString" use="required" />
    <xs:attribute name="res" type="xs:positiveInteger" use="required" />
  </xs:complexType>

</xs:schema>

Should (and will) pass:

<?xml version='1.0' encoding='utf-8'?>
<PrintJob printer="printer_1" res="200">
  <CarrierLabel_type_1>
    <print_job_id>123456</print_job_id>
    <notes></notes>
    <labels_count>1</labels_count>
    <cases_indicator>2xCASE</cases_indicator>
  </CarrierLabel_type_1>
  <CarrierLabel_type_2>
    <next_location>Go there now!</next_location>
  </CarrierLabel_type_2>
</PrintJob>

Should not pass, but WILL PASS AS VALID DATA:

<?xml version='1.0' encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="WrongLabel">
    <xsl:param name="context"/>
    <xsl:param name="res"/>
    WRONG LABEL
  </xsl:template>

</xsl:stylesheet>
A: 

Without having seen any code, I'm going to take a stab and suggest that it just may be because your validation is setting the ValidationType on the XmlReaderSettings object, but you're either not wiring up the ValidationEventHandler to check for validation errors or simply not doing anything with these validation events.

Even with XmlDocument.Validate, you need to wire up this ValidationEventHandler.

See MSDN here.

Wim Hollebrandse
But if an invalid XML data document is passed, validation will fail and we will see the reasons. So the handlers are working.
GSerg
OK, that didn't seem clear from your original question.
Wim Hollebrandse
A: 

My understanding is that XML Schema (XSD) does not give any way of requiring that the root node of a document be a certain element -- the only way to do that is to restrict the elements defined at "global level" to just one element. Is it possible that your validation code is importing the schema for XSLT, so that an XSLT document validates because the XSLT elements have been defined at global level?

Vincent Marchetti
No, the schemas do not declare the XSLT namespace. Each of the `include`d schemas contains exactly one complexType and no elements at all, so they can't be used as stand-alone schemas; they are effectively data type vocabularies. My understanding of XSD was that you define the outermost xs:element and it describes the root. That's not to say you are wrong, only that my understanding was different. What is the proper way, then?
GSerg
We're not disagreeing. The form of XSD file you describe (with only one element defined at the outermost level) will only validate documents with that element as the root node (with the complications you describe in your solution to your original problem). However, writing XSD files this way, with only one outermost element, really restricts how you can use them: it makes it impossible to modularize these files and import elements from other XSD files. After dealing with this, I came to the conclusion that XSD was never intended to enforce what root element a document can have.
Vincent Marchetti
A: 

Right.

It turned out that validation has three possible results, not two -- valid, invalid and unknown. So the Boolean return value of the CheckValidity function is somewhat surprising.

If the root node of the document is not described by the schema, the document passes validation without errors and no validation error events occur, but the root node receives "unknown" status (XmlSchemaValidity.NotKnown). This, for our purpose, is a fail. So we also need to check the XmlNode.SchemaInfo.Validity member of the root node.
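
A minimal VB.NET sketch of that extra check, assuming the document has been loaded into an XmlDocument and the schema set is already built. The names RootValidityCheck, ErrorCollector and IsValidAgainstSchema are hypothetical; only XmlDocument.Validate, XmlNode.SchemaInfo and XmlSchemaValidity come from System.Xml.

```vbnet
Imports System.Collections.Generic
Imports System.Xml
Imports System.Xml.Schema

Module RootValidityCheck

    ' Hypothetical helper: collects error messages raised during validation.
    Private Class ErrorCollector
        Public Messages As New List(Of String)

        Public Sub OnValidation(ByVal sender As Object, ByVal e As ValidationEventArgs)
            If e.Severity = XmlSeverityType.Error Then
                Messages.Add(e.Message)
            End If
        End Sub
    End Class

    ' Returns True only if no errors were reported AND the root element
    ' was actually assessed as valid (not merely left as "not known").
    Public Function IsValidAgainstSchema(ByVal doc As XmlDocument, ByVal schemas As XmlSchemaSet) As Boolean
        Dim root As XmlElement = doc.DocumentElement
        If root Is Nothing Then Return False

        Dim collector As New ErrorCollector()
        doc.Schemas = schemas
        doc.Validate(AddressOf collector.OnValidation)

        If collector.Messages.Count > 0 Then Return False

        ' A root element the schema does not declare is expected to stay
        ' at XmlSchemaValidity.NotKnown, which is treated as a failure here.
        Return root.SchemaInfo.Validity = XmlSchemaValidity.Valid
    End Function

End Module
```

With this, the XSL stylesheet from the question should be rejected because its root never reaches the Valid state, while a well-formed PrintJob document still passes.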

I wish the documentation for the Validate() method were a bit clearer on that.

GSerg
A: 

XML schemas really validate elements within a namespace, not documents. There's no XML Schema rule that says that the top-level element of the instance document must be within a specific namespace. This fits in with the general idea that a namespace is its own little world, and it prevents me from writing a schema in my namespace that will invalidate documents in yours. If an element's not in my namespace, it's none of my business.

This means that when validating instance documents, you have to check that the top-level element of the document you're validating is in a namespace that your application accepts. In your application, that is simply the empty namespace, since the schema has no target namespace.
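
As a rough illustration of that pre-check, here is a short VB.NET sketch that inspects the root element before the document is handed to the validator. The module and function names are hypothetical, and the expected root name "PrintJob" and the empty namespace are taken from the schema in the question.

```vbnet
Imports System.Xml

Module RootElementCheck

    ' Returns True only when the document's root is an unqualified
    ' <PrintJob> element, i.e. the one the schema declares globally.
    Public Function HasExpectedRoot(ByVal doc As XmlDocument) As Boolean
        Dim root As XmlElement = doc.DocumentElement
        If root Is Nothing Then Return False

        ' LocalName ignores any prefix; NamespaceURI is empty for
        ' elements that are not in any namespace.
        Return root.LocalName = "PrintJob" AndAlso root.NamespaceURI = String.Empty
    End Function

End Module
```

For the stylesheet in the question, the root's LocalName is "stylesheet" and its NamespaceURI is the XSLT namespace, so the check returns False before schema validation is even attempted.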

Robert Rossney
That was insightful.
GSerg