tags:

views:

726

answers:

2

The description is bit on the longer side please bear with me. I would like to process and validate a huge XML file and log the node which triggered the validation error and continue with processing the next node. A simplified version of the XML file is shown below.

What I would like to perform is on encountering any validation error processing node 'A' or its children (both XMLException and XmlSchemaValidationException) I would like to stop processing current node log the error and XML for node 'A' and move on to the next node 'A'.

<Root>
  <A id="A1">
     <B Name="B1">
        <C>
          <D Name="ID" >
            <E>Test Text 1</E>
          </D>
        <D Name="text" >
          <E>Test Text 1</E>
        </D>        
      </C>
    </B>
  </A>
  <A id="A2">
    <B Name="B2">
      <C>
        <D Name="id" >
          <E>Test Text 3</E>
        </D>
        <D Name="tab1_id"  >
          <E>Test Text 3</E>
        </D>
        <D Name="text" >
          <E>Test Text 3</E>
        </D>
      </C>
    </B>
</Root>

I am currently able to recover from the XmlSchemaValidationException by using a ValidationEventHandler with XMLReader which throws a Exception that I handle in the XML Processing code. However for some cases XMLException is being triggered which leads to termination of the process.

The following snippets of the code illustrate the current structure I am using; it is messy and code improvement suggestions are also welcome.

    // Setting up the XMLReader
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.ConformanceLevel = ConformanceLevel.Auto;
    settings.IgnoreWhitespace = true;
    settings.CloseInput = true;
    settings.IgnoreComments = true;
    settings.ValidationType = ValidationType.Schema;
    settings.Schemas.Add(null, "schema.xsd");
    settings.ValidationEventHandler += new ValidationEventHandler(ValidationCallBack);
    XmlReader reader = XmlReader.Create("Sample.xml", settings);   
    // Processing XML
    while (reader.Read())
    if (reader.NodeType == XmlNodeType.Element)
       if (reader.Name.Equals("A"))
         processA(reader.ReadSubtree());         
    reader.Close(); 
   // Process Node A
   private static void processA(XmlReader A){
    try{
       // Perform some book-keeping 
       // Process Node B by calling processB(A.ReadSubTree())            
    }   
    catch (InvalidOperationException ex){

    }
    catch (XmlException xmlEx){

    } 
    catch (ImportException impEx){

    }
    finally{ if (A != null) A.Close(); }            
  }
  // All the lower level process node functions propagate the exception to caller.
  private static void processB(XmlReader B){
   try{
     // Book-keeping and call processC
   }
   catch (Exception ex){
    throw ex;
    }
   finally{ if (B != null) B.Close();}    
  } 
  // Validation event handler
  private static void ValidationCallBack(object sender, ValidationEventArgs e){
    String msg =  "Validation Error: " + e.Message +" at line " + e.Exception.LineNumber+
     " position number "+e.Exception.LinePosition;
    throw new ImportException(msg);
  }

When a XMLSchemaValidationException is encountered the finally block will invoke close() and the original XMLReader is being positioned on the EndElement of the subtree and hence the finally block in processA will lead to processing of the next node A.

However when a XMlException is encountered invoking the close method is not positioning the original reader on the EndElement node of the subtree and an InvalidOperationException is being throw.

I tried to use methods like skip, ReadToXYZ() methods but these are invariably leading to XMLExcpetion of InvalidOperationException when invoked on any node that triggered an exception.

The following is a excerpt from MSDN regarding the ReadSubTree method.

When the new XmlReader has been closed, the original XmlReader will be positioned on the EndElement node of the sub-tree. Thus, if you called the ReadSubtree method on the start tag of the book element, after the sub-tree has been read and the new XmlReader has been closed, the original XmlReader is positioned on the end tag of the book element.

Note: I cannot use .Net 3.5 for this, however .Net 3.5 suggestions are welcome.

+3  A: 

See this question:
http://stackoverflow.com/questions/32505/xml-parser-validation-report

You need to distinguish between well-formed xml (it follows the rules required to be real xml) and valid xml (follows additional rules given by a specific xml schema). From the spec:

Once a fatal error is detected, however, the processor must not continue normal processing (i.e., it must not continue to pass character data and information about the document's logical structure to the application in the normal way).

For better or worse, the xml tools included with Visual Studio need to follow that spec very closely, and therefore will not continue processing if there is a well-formedness error. The link I provided might give you some alternatives.

Joel Coehoorn
+1  A: 

I have done this almost like you have, except for the exception eating and questionable use XmlReader.Close():

XmlReaderSettings settings = new XmlReaderSettings();
settings.ConformanceLevel = ConformanceLevel.Auto;
settings.IgnoreWhitespace = true;
settings.CloseInput = true;
settings.IgnoreComments = true;
settings.ValidationType = ValidationType.Schema;
settings.Schemas.Add(null, "schema.xsd");
settings.ValidationEventHandler += ValidationCallBack;
using (XmlReader reader = XmlReader.Create("Sample.xml", settings))
{
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element &&
            reader.Name.Equals("A"))
        {
            using (
                XmlReader subtree = reader.ReadSubtree())
            {
                try {
                    processA(subtree);
                }
                catch (ImportException ex) { // log it at least }
                catch (XmlException ex) {// log this too }
                // I would not catch InvalidOperationException - too general
            }
        }
    }
}

private static void processA(XmlReader A)
{
    using (XmlReader subtree = A.ReadSubtree())
    {
        processB(subtree);
    }
}

// All the lower level process node functions propagate the exception to caller.  
private static void processB(XmlReader B)
{
}

You don't want to eat the exceptions, except for those from processA. Let the caller of processA decide wihch exceptions to ignore - processA should not be aware of that. Same with the using blocks - put them on the outside, rather than the inside.

John Saunders
I was handling the exception removed the code to keep it concise. I need to perform few other actions in the finally block so I saw no advantage with using "using" block.
welllifeisunfair