This description is a bit on the long side, so please bear with me. I would like to process and validate a huge XML file, log any node that triggers a validation error, and continue processing with the next node. A simplified version of the XML file is shown below.
What I would like to do: on encountering any error while processing node 'A' or its children (either an XmlException or an XmlSchemaValidationException), stop processing the current node, log the error together with the XML for that node 'A', and move on to the next node 'A'.
<Root>
<A id="A1">
<B Name="B1">
<C>
<D Name="ID" >
<E>Test Text 1</E>
</D>
<D Name="text" >
<E>Test Text 1</E>
</D>
</C>
</B>
</A>
<A id="A2">
<B Name="B2">
<C>
<D Name="id" >
<E>Test Text 3</E>
</D>
<D Name="tab1_id" >
<E>Test Text 3</E>
</D>
<D Name="text" >
<E>Test Text 3</E>
</D>
</C>
</B>
</Root>
I can currently recover from an XmlSchemaValidationException by using a ValidationEventHandler with the XmlReader; the handler throws an exception that I handle in the XML processing code. However, in some cases an XmlException is thrown instead, which terminates the process.
The following snippets of the code illustrate the current structure I am using; it is messy and code improvement suggestions are also welcome.
// Setting up the XMLReader
XmlReaderSettings settings = new XmlReaderSettings();
settings.ConformanceLevel = ConformanceLevel.Auto;
settings.IgnoreWhitespace = true;
settings.CloseInput = true;
settings.IgnoreComments = true;
settings.ValidationType = ValidationType.Schema;
settings.Schemas.Add(null, "schema.xsd");
settings.ValidationEventHandler += new ValidationEventHandler(ValidationCallBack);
XmlReader reader = XmlReader.Create("Sample.xml", settings);
// Processing XML
while (reader.Read())
if (reader.NodeType == XmlNodeType.Element)
if (reader.Name.Equals("A"))
processA(reader.ReadSubtree());
reader.Close();
// Process Node A
private static void processA(XmlReader A){
try{
// Perform some book-keeping
// Process Node B by calling processB(A.ReadSubtree())
}
catch (InvalidOperationException ex){
}
catch (XmlException xmlEx){
}
catch (ImportException impEx){
}
finally{ if (A != null) A.Close(); }
}
// All the lower level process node functions propagate the exception to caller.
private static void processB(XmlReader B){
try{
// Book-keeping and call processC
}
catch (Exception){
throw; // rethrow to the caller, preserving the original stack trace
}
finally{ if (B != null) B.Close();}
}
// Validation event handler
private static void ValidationCallBack(object sender, ValidationEventArgs e){
String msg = "Validation Error: " + e.Message +" at line " + e.Exception.LineNumber+
" position number "+e.Exception.LinePosition;
throw new ImportException(msg);
}
When an XmlSchemaValidationException is encountered, the finally block invokes Close(), the original XmlReader is positioned on the EndElement of the subtree, and so after the finally block in processA runs, processing continues with the next node A.
However, when an XmlException is encountered, invoking Close() does not position the original reader on the EndElement node of the subtree, and an InvalidOperationException is thrown.
I tried using Skip() and the ReadToXYZ() methods, but these invariably lead to an XmlException or an InvalidOperationException when invoked on any node that triggered an exception.
The following is an excerpt from MSDN regarding the ReadSubtree method.
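To make the failure mode concrete, here is a minimal sketch that only inspects the reader after an XmlException. The input string and class name are hypothetical and not taken from my real file; the point is to check ReadState before attempting Skip() or a ReadToXYZ() call, since the reader's state after an XmlException may not be usable.

```csharp
using System;
using System.IO;
using System.Xml;

class MalformedInputDemo
{
    static void Main()
    {
        // Hypothetical malformed input: the <A> element is never closed.
        string xml = "<Root><A id=\"A2\"><B/></Root>";

        XmlReader reader = XmlReader.Create(new StringReader(xml));
        try
        {
            // Read to the end; the mismatched end tag should raise XmlException.
            while (reader.Read()) { }
        }
        catch (XmlException ex)
        {
            Console.WriteLine("XmlException: " + ex.Message);
            // Inspect the reader before trying Skip() or ReadToFollowing().
            Console.WriteLine("ReadState after failure: " + reader.ReadState);
        }
        finally
        {
            reader.Close();
        }
    }
}
```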
When the new XmlReader has been closed, the original XmlReader will be positioned on the EndElement node of the sub-tree. Thus, if you called the ReadSubtree method on the start tag of the book element, after the sub-tree has been read and the new XmlReader has been closed, the original XmlReader is positioned on the end tag of the book element.
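As a sanity check of the excerpt above, the following minimal sketch exercises ReadSubtree on well-formed input only. The XML string and class name are hypothetical; it closes each subtree reader early and then prints where the original reader ends up.

```csharp
using System;
using System.IO;
using System.Xml;

class ReadSubtreeDemo
{
    static void Main()
    {
        // Hypothetical well-formed input, not the file from the question.
        string xml = "<Root><A id=\"A1\"><B/></A><A id=\"A2\"><B/></A></Root>";

        XmlReader reader = XmlReader.Create(new StringReader(xml));
        try
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "A")
                {
                    string id = reader.GetAttribute("id");

                    XmlReader subtree = reader.ReadSubtree();
                    subtree.Read();   // move onto the <A> start tag
                    subtree.Close();  // abandon the rest of the subtree

                    // Per the excerpt, the original reader should now be on </A>.
                    Console.WriteLine(id + ": " + reader.NodeType + " " + reader.Name);
                }
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

This covers only the case where no exception occurs; it does not show what happens when the subtree itself is malformed, which is the case I am stuck on.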
Note: I cannot use .NET 3.5 for this; however, .NET 3.5 suggestions are welcome.