In .NET and C#, I have an XSD-generated class whose type and root attributes are set to:

[System.Xml.Serialization.XmlTypeAttribute(Namespace="http://example.com")]
[System.Xml.Serialization.XmlRootAttribute("myRootElement", Namespace="http://example.com", IsNullable=false)]

Furthermore, I have files containing XML data from different sources. Some have a proper xmlns attribute:

<?xml version="1.0" encoding="utf-8"?>
<myRootElement xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://example.com" xmlns:xsd="http://www.w3.org/2001/XMLSchema">

Others are missing the xmlns= attribute. I have to read both kinds, i.e. when the xmlns attribute is missing, I need to assume a namespace and try to deserialize anyway. I know that different namespaces actually mean different types to the serializer, and that, from its point of view, identical XML content in two files with and without the xmlns attribute is mere coincidence. But I still want to deserialize both cases into the same XSD-generated class.

This topic has been discussed elsewhere, e.g. here.

The problem now comes with some nasty side conditions:

  1. The XML files might contain large amounts of data, so performance is important.
  2. The XML data is read from a ZipInputStream (coming from ICSharpCode.SharpZipLib.Zip)

I had the following ideas so far:

  • Read the stream into an XmlTextReader, read only the root node and inspect its attributes. If the xmlns attribute is missing, fix it and then call XmlSerializer.Deserialize(Stream myStream). However, a ZipInputStream cannot seek, so I cannot reset or modify the stream after inspecting the root node's xmlns attribute.
  • Read the stream into an XmlTextReader, fix the root node's xmlns attribute and call XmlSerializer.Deserialize(XmlTextReader myXmlTextReader). However, I fear that this is not the best option in terms of performance, because I would have to read the entire XML document into the XmlTextReader first and then pass it to the XmlSerializer. I guess deserializing the stream directly would be better.
  • Wrap the ZipInputStream in some other stream, e.g. a MemoryStream. I don't know how to do that, and I don't know how such operations perform.
  • Use something like the XmlAttributeOverrides class for the serializer. I would have to set this up based on the information found in the root node, and again I run into the problem that a ZipInputStream cannot seek.

The last option would look something like this:

XmlRootAttribute newRoot = new XmlRootAttribute();
newRoot.ElementName = "myRootElement";
newRoot.Namespace = "http://example.com/";

XmlAttributes myAttributes = new XmlAttributes();
myAttributes.XmlRoot = newRoot;

XmlAttributeOverrides myOverrides = new XmlAttributeOverrides();
myOverrides.Add(typeof(myRootType), myAttributes);

XmlSerializer ser = new XmlSerializer(typeof(myRootType), myOverrides);
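
For comparison, the second idea (fixing the namespace inside the reader) might be sketched roughly as follows. This is only a sketch under assumptions, not a tested solution: MyRootType, its Name property, AssumedNamespaceReader and Demo are hypothetical names made up for illustration, and a MemoryStream stands in for the ZipInputStream. XmlTextReader is a forward-only, streaming reader, so the subclass below never needs to seek; it merely reports the assumed namespace for elements that carry none.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the XSD-generated class.
[XmlType(Namespace = "http://example.com")]
[XmlRoot("myRootElement", Namespace = "http://example.com", IsNullable = false)]
public class MyRootType
{
    public string Name { get; set; }
}

// Hypothetical helper: reports an assumed namespace for every element
// that has no namespace in the underlying document.
public class AssumedNamespaceReader : XmlTextReader
{
    private readonly string assumedNamespace;

    public AssumedNamespaceReader(Stream input, string assumedNamespace)
        : base(input)
    {
        // Atomize the string in the reader's name table, because the
        // serializer may compare names by reference rather than by value.
        this.assumedNamespace = NameTable.Add(assumedNamespace);
    }

    public override string NamespaceURI
    {
        get
        {
            string ns = base.NamespaceURI;
            // Leave attributes and other node types untouched.
            bool isElement = NodeType == XmlNodeType.Element
                          || NodeType == XmlNodeType.EndElement;
            return isElement && ns.Length == 0 ? assumedNamespace : ns;
        }
    }
}

public static class Demo
{
    // One serializer instance, reused for both kinds of input.
    private static readonly XmlSerializer Serializer =
        new XmlSerializer(typeof(MyRootType));

    // The reader is deliberately not disposed here: closing it would also
    // close the underlying stream, which the caller owns.
    public static MyRootType Read(Stream input)
    {
        var reader = new AssumedNamespaceReader(input, "http://example.com");
        return (MyRootType)Serializer.Deserialize(reader);
    }

    public static void Main()
    {
        string[] documents =
        {
            "<myRootElement xmlns=\"http://example.com\"><Name>with</Name></myRootElement>",
            "<myRootElement><Name>without</Name></myRootElement>"
        };

        foreach (string xml in documents)
        {
            using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
            {
                Console.WriteLine(Read(stream).Name);
            }
        }
    }
}
```

With this approach the same serializer instance handles both kinds of file, and no XmlAttributeOverrides are needed. Whether it is actually faster than the alternatives for large files would still have to be measured.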

So my question is: what would be the smartest way (performance-wise) to preprocess the ZipInputStream and fix the xmlns attribute before calling XmlSerializer.Deserialize()?