I'm running into real difficulties validating xml with xsd. I should preface all of this by stating up front that I'm new to xsd and validation, so I'm not sure if it's a code issue or an xml issue. I've been to xml api hell and back with the bajillion different options, and I think I've found what would be the ideal strategy for validating xml with xsd. Note, my xml and xsd are coming from a database, so I don't need to read anything from disk.

I've narrowed my problem down to a simple sample Windows Forms app. It has a textbox for xsd (txtXsd), a textbox for xml (txtXml), a textbox for the result (txtResult), and a button to start the validation (btnValidate).

I'm using a sample xsd file from Microsoft.

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="urn:bookstore-schema" elementFormDefault="qualified" targetNamespace="urn:bookstore-schema">
 <xsd:element name="title" type="xsd:string" />
 <xsd:element name="comment" type="xsd:string" />
 <xsd:element name="author" type="authorName"/>
 <xsd:complexType name="authorName">
  <xsd:sequence>
   <xsd:element name="first-name" type="xsd:string" />
   <xsd:element name="last-name" type="xsd:string" />
  </xsd:sequence>
 </xsd:complexType>
</xsd:schema>

I'm using the following code in my app.

private void btnValidate_Click (object sender, EventArgs e)
{
 try
 {
  XmlTextReader reader = new XmlTextReader(txtXsd.Text, XmlNodeType.Document, new XmlParserContext(null, null, String.Empty, XmlSpace.None));
  XmlSchema schema = XmlSchema.Read(reader, null);
  XmlSchemaSet schemas = new XmlSchemaSet();
  schemas.Add(schema);

  XDocument doc = XDocument.Parse(txtXml.Text);
  doc.Validate(schemas, ValidateSchema);
 }
 catch (Exception exception)
 {
  txtResult.Text += exception.Message + Environment.NewLine;
 }
}

private void ValidateSchema (Object sender, ValidationEventArgs e)
{
 txtResult.Text += e.Message + Environment.NewLine;
}

As a test, I put in xml that is well-formed but that I think should not conform to the xsd above.

<xml>
 <bogusNode>blah</bogusNode>
</xml>

The result is nothing, no validation errors whatsoever. Any ideas?

+2  A: 

Well, for one - your XSD defines an XML namespace (xmlns="urn:bookstore-schema", which is also its targetNamespace) that is not present in your XML test file - therefore, nothing in your XML test file will be validated.

If you remove those attributes from your schema:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
        <xsd:element name="title" type="xsd:string" />

then it will properly validate your XML test file and complain about the wrong elements.

Also, using an element named <xml> might not be a great idea - names beginning with "xml" (in any letter case) are reserved by the XML specification, and <?xml ......?> is the pre-defined XML declaration, so "xml" should not appear as a tag name elsewhere in your document.

Marc

marc_s
I had a sneaking suspicion that's what was going on. Going to pick up an xml book.
Joshua Belden
In the larger picture, I'm trying to make sure that xml submitted conforms to the xsd. Is there a way to validate if they're not in the same namespace? Do I have to actually do that in code by looking at both namespaces?
Joshua Belden
Joshua: the XML you're validating must be the same namespace(s) as the XSD defines. That's the whole point of XML namespaces - being able to keep identifiers / tags that might be generally used (like '<Address>') apart by putting them into their own namespace (like a .NET namespace).
marc_s
I guess what I'm trying to do is use an XSD like a C# interface. I don't want the user to submit xml that's not defined in the xsd. I'm starting to think that's not really possible. Is it true that I could have an xml document and only a portion of it would be validatable by the xsd, perhaps the one portion that is in the namespace? Or does a whole xml doc have to be in the namespace? If that's the case, then I can check for the same namespace and determine it's not valid. I know I'm talking a bit outside of reality. It's one of those things: I don't know enough to ask the right questions.
Joshua Belden
Not 100% sure what would happen if you have an XSD defined in a namespace, and only part of your document uses that namespace. What is definitely possible is to validate only part of your XML document against the schema - by just validating a node and its subtree (its children and grand-children and so on).
marc_s
Another option you might look into is turning your XSD into a C# class, and then just basically trying to deserialize your incoming XML into an instance of that class. If it works, the incoming XML is valid - if it's not, the deserialization process will fail.
marc_s
Hey Joshua - you might want to check out this "Linq-to-XSD" thing on Codeplex: http://linqtoxsd.codeplex.com/
marc_s
Another option is to add a namespace to the XML: <xml xmlns="urn:bookstore-schema">. I agree it would be sensible if there was a way for XSD to reject a document regardless of its namespace (maybe there is a way).
13ren
The API lets you define multiple schemas to validate one document. When the document is loaded, nodes in namespace X will be validated against the schema with targetNamespace = X, nodes in namespace Y will be validated against the schema with targetNamespace = Y, and nodes in namespace Z will be ignored if you haven't set a schema for targetNamespace = Z.
Christian Hayter
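To make the namespace matching described in the comments above concrete, here is a minimal sketch that validates two in-memory documents against the bookstore schema from the question. The class name NamespaceDemo and the method name Errors are hypothetical, and the schema is inlined with single-quoted attributes only to keep the string literal readable. Both test documents put their elements in urn:bookstore-schema, so the schema actually applies to them.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

// Hypothetical helper, not part of the original question's code.
public static class NamespaceDemo
{
    // The sample schema from the question, with single-quoted attributes.
    const string Xsd = @"
<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'
            xmlns='urn:bookstore-schema'
            elementFormDefault='qualified'
            targetNamespace='urn:bookstore-schema'>
  <xsd:element name='author' type='authorName'/>
  <xsd:complexType name='authorName'>
    <xsd:sequence>
      <xsd:element name='first-name' type='xsd:string' />
      <xsd:element name='last-name' type='xsd:string' />
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>";

    // Returns the validation messages raised for the given XML text.
    public static List<string> Errors(string xml)
    {
        var schemas = new XmlSchemaSet();
        using (var xsdReader = XmlReader.Create(new StringReader(Xsd)))
        {
            // Passing null uses the targetNamespace declared in the schema itself.
            schemas.Add(null, xsdReader);
        }

        var errors = new List<string>();
        XDocument.Parse(xml).Validate(schemas, (sender, e) => errors.Add(e.Message));
        return errors;
    }

    public static void Main()
    {
        // Root is in the schema's namespace, but the child is not declared there.
        string bad = "<author xmlns='urn:bookstore-schema'><bogusNode>blah</bogusNode></author>";
        // Root and children match the authorName type.
        string good = "<author xmlns='urn:bookstore-schema'>"
                    + "<first-name>Jane</first-name><last-name>Doe</last-name></author>";

        Console.WriteLine("bad:  " + Errors(bad).Count + " message(s)");
        Console.WriteLine("good: " + Errors(good).Count + " message(s)");
    }
}
```

The un-namespaced test document from the question is deliberately left out here: as the question reports, validating it this way produces no messages, because none of its elements belong to a namespace the schema set knows about.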
A: 

You may also try XmlValidatingReader for XML validation.

Scoregraphic
As far as I know, the XmlValidatingReader is deprecated with .NET 2.0 and should no longer be used. MSDN says: "The XmlValidatingReader class is obsolete in Microsoft .NET Framework version 2.0. You can create a validating XmlReader instance by using the XmlReaderSettings class and the Create method. For more information, see Validating XML Data with XmlReader."
marc_s
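The replacement that the MSDN note points to can be sketched as follows for the asker's situation, where both the schema and the document arrive as strings rather than files. The class name StringValidator and the method name Validate are hypothetical. The sketch also turns on XmlSchemaValidationFlags.ReportValidationWarnings, on the assumption that the asker wants to hear about elements for which no schema information exists; those are reported as warnings rather than errors.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical helper, not part of the original thread.
public static class StringValidator
{
    // Validates xmlText against xsdText and returns every message raised,
    // prefixed with its severity ("Error" or "Warning").
    public static List<string> Validate(string xsdText, string xmlText)
    {
        var messages = new List<string>();

        var schemas = new XmlSchemaSet();
        using (var xsdReader = XmlReader.Create(new StringReader(xsdText)))
        {
            // null = take the target namespace from the schema itself.
            schemas.Add(null, xsdReader);
        }

        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            Schemas = schemas
        };
        // Also report warnings, e.g. elements with no matching schema.
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        settings.ValidationEventHandler +=
            (sender, e) => messages.Add(e.Severity + ": " + e.Message);

        using (var reader = XmlReader.Create(new StringReader(xmlText), settings))
        {
            // Validation happens as the reader advances through the document.
            while (reader.Read()) { }
        }

        return messages;
    }
}
```

In the sample form from the question, this could be called as StringValidator.Validate(txtXsd.Text, txtXml.Text), with each returned message appended to txtResult.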
A: 

"I don't want the user to submit xml that's not defined in the xsd."

Why do you care? Your schema validates the XML nodes that are in your namespace. Your processing logic processes the XML nodes that are in your namespace. Nodes that aren't in your namespace aren't relevant to either your schema or your logic.

If it's truly essential to restrict all nodes in the XML document to a specific namespace, you can accomplish that by extending the basic XmlReader validating logic found here.

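    using System;
    using System.Xml;
    using System.Xml.Schema;

    public static class Program
    {
        public static void Main()
        {
            const string myNamespaceURN = "urn:my-namespace";

            // Register the schema under the namespace it is expected to define.
            XmlSchemaSet sc = new XmlSchemaSet();
            sc.Add(myNamespaceURN, "mySchema.xsd");

            XmlReaderSettings settings = new XmlReaderSettings();
            settings.ValidationType = ValidationType.Schema;
            settings.Schemas = sc;
            settings.ValidationEventHandler += ValidationCallBack;

            using (XmlReader reader = XmlReader.Create("myDocument.xml", settings))
            {
                // Schema validation runs as a side effect of reading.
                while (reader.Read())
                {
                    // Read() never stops on an attribute node, so only elements
                    // are checked here. Unprefixed attributes have an empty
                    // namespace anyway and would always fail this test.
                    if (reader.NodeType == XmlNodeType.Element
                        && reader.NamespaceURI != myNamespaceURN)
                    {
                        LogError("'" + reader.NamespaceURI + "' is not a valid namespace.");
                    }
                }
            }
        }

        // Called by the reader for every schema validation error or warning.
        private static void ValidationCallBack(object sender, ValidationEventArgs e)
        {
            LogError(e.Message);
        }

        private static void LogError(string msg)
        {
            Console.WriteLine(msg);
        }
    }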
Robert Rossney