I use the following XmlSchema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://www.test.com/XmlValidation"
  elementFormDefault="qualified"
  attributeFormDefault="unqualified"
  xmlns:m="http://www.test.com/XmlValidation">

  <xs:element name="test">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="testElement" type="m:requiredStringType"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:simpleType name="requiredStringType">
    <xs:restriction base="xs:string">
      <xs:minLength value="1"/>
      <xs:whiteSpace value="collapse"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

It defines a requiredStringType that must be at least one character long and whose whiteSpace facet is set to collapse.

When I validate the following XML document, the validation succeeds:

<?xml version="1.0" encoding="UTF-8"?>
<test xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.text.com/XmlValidation">
    <testElement>     </testElement>
</test>

The W3C defines whitespace collapse as follows:

"After the processing implied by replace, contiguous sequences of #x20's are collapsed to a single #x20, and leading and trailing #x20's are removed."

Does this mean that a run of spaces like this one is collapsed to a single space or to nothing at all? In XmlSpy the validation fails; in .NET it succeeds.

A:

The definition says that leading and trailing whitespace is removed, so a string that contains only whitespace is collapsed to an empty string. XmlSpy is validating correctly, and .NET is being lenient (or is simply wrong).

This is specified in the section "White Space Normalization during Validation" of XML Schema Part 1: Structures Second Edition:

preserve
No normalization is done, the value is the ·normalized value·
replace
All occurrences of #x9 (tab), #xA (line feed) and #xD (carriage return) are replaced with #x20 (space).
collapse
Subsequent to the replacements specified above under replace, contiguous sequences of #x20s are collapsed to a single #x20, and initial and/or final #x20s are deleted.
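The three whiteSpace values quoted above can be sketched as a small Python function. This is an illustrative reimplementation of the spec text, not code taken from XmlSpy or .NET; the function name `normalize_whitespace` is hypothetical, and it deliberately handles only the four characters the spec names (#x9, #xA, #xD, #x20).

```python
import re

def normalize_whitespace(value: str, mode: str) -> str:
    """Apply an XML Schema whiteSpace facet value: 'preserve', 'replace' or 'collapse'."""
    if mode == "preserve":
        # No normalization is done; the value is returned unchanged.
        return value
    # replace: every tab, line feed and carriage return becomes a space (#x20).
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if mode == "replace":
        return replaced
    if mode == "collapse":
        # Reduce each run of spaces to one space, then delete initial/final spaces.
        # strip(" ") removes only #x20, unlike str.strip(), which strips all Unicode whitespace.
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError(f"unknown whiteSpace mode: {mode!r}")

print(repr(normalize_whitespace("  a \t\n b  ", "collapse")))  # 'a b'
print(repr(normalize_whitespace("     ", "collapse")))          # ''
```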

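Thus, first every tab, line feed and carriage return is replaced by a space (#x20); second, each contiguous run of spaces is reduced to a single space; third and last, any initial and final space is deleted. After the second step a whitespace-only value is a single space, which is both the initial and the final #x20, so the third step deletes it. A string containing only whitespace is therefore normalized to the empty string during validation, and that empty string fails the minLength value="1" facet.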
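To make the ordering concrete, here is a Python sketch that traces whitespace-only content, like the `<testElement>` in the question, through the three steps and then applies the minLength facet from the schema. It is a hand-written illustration of the spec text, not the actual behavior of any validator; the helper name `trace_collapse` is hypothetical.

```python
import re

def trace_collapse(value: str) -> list[str]:
    """Return the value after each step of whiteSpace="collapse" normalization."""
    step1 = re.sub(r"[\t\n\r]", " ", value)  # replace tab, LF, CR with #x20
    step2 = re.sub(r" +", " ", step1)       # collapse each run of #x20 to one #x20
    step3 = step2.strip(" ")                # delete initial and final #x20
    return [step1, step2, step3]

steps = trace_collapse("     ")  # whitespace-only element content
for label, s in zip(["replace", "collapse runs", "trim"], steps):
    print(f"{label:>13}: {s!r}")

min_length = 1  # from <xs:minLength value="1"/>
print("valid:", len(steps[-1]) >= min_length)  # valid: False
```

The intermediate single space in the second line of output is exactly the "remaining whitespace" the comment below asks about; the spec's third step removes it.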

Eddie
I'm not sure whether the remaining space (the one left after collapsing) is treated as leading whitespace. Maybe the behavior of .NET is correct. I couldn't find an XML validator on the w3.org site that could prove me wrong.
crauscher