I've been given some "XML" files that don't quite have a proper schema (I think that's the problem) and the medical device that generates them cannot be changed to generate easy-to-parse XML. (With such a tantalizingly small modification (extra wrapping Images tags around the Image entries) it would be trivial to read these files---isn't that what XML is about?)

Basically I'm stuck here. The XML looks like this:

<Series>
   <Metadata1>foo</Metadata1>
   <Metadata2>bar</Metadata2>
   ...
   <Image>...</Image>
   <Image>...</Image>
   ...
</Series>

(there can be any number of images but the possible Metadata tags are all known). My code looks like this:

public class Image { ... }

public class Series : List<Image>
{
 public Series() { }
 public string Metadata1;
 public string Metadata2;
 ...
}

When I run this like so:

      XmlSerializer xs = new XmlSerializer(typeof(Series));
      StreamReader sr = new StreamReader(path);
      Series series = (Series)xs.Deserialize(sr);
      sr.Close();

the List of Image objects reads properly into the series object but no Metadata1/2/etc fields are read (in fact, browsing the object in the debugger shows all of the metadata fields inside of a "Raw View" sort of field).

When I change the code:

public class Series    // removed ": List<Image>"
{
 public Series() { }
 public string Metadata1;
 public string Metadata2;
 ...
}

and run the reader on the file, I get a series object with Metadata1/2/etc. filled in perfectly but no Image data being read (obviously).

How do I parse both Metadata1/2/etc. and the series of Images with the least amount of painful ad hoc code?

Do I have to write some custom (painful? easy?) ReadXml method to implement IXmlSerializable?

I don't care too much how the objects are laid out since my software that consumes these C# classes is totally flexible:

List<Image> Images;
for the images would be fine, or perhaps the metadata is wrapped in some object, whatever...

+2  A: 

Why are you trying to use an XML serializer to do this? Serialization is generally about being able to save the "state" of an object in some well-known format (text or binary) so that it can be recreated at a later point in time. That doesn't sound like what you are trying to do here. The problem here is that the XML data doesn't really match your object hierarchy.

You have a hardware device that somehow generates XML data that you want to consume. To me, this would be easiest using a simple XmlDocument or XmlReader class rather than trying to go through the serializer.

You could probably do this with code like this:

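using System;
using System.Collections.Generic;
using System.Xml;

// Placeholder: stores the element text; the real Image members are not shown in the question.
public class Image
{
   public string Data;
   public Image(string data) { Data = data; }
}

public class Series
{
   public string Metadata1;
   public string Metadata2;
   public List<Image> Images = new List<Image>();

   // Loads the series from the XML file at the given path.
   public void Load(string path)
   {
      XmlDocument doc = new XmlDocument();
      doc.Load(path);

      // The XPath is absolute because <Series> is the document element.
      XmlNodeList images = doc.SelectNodes("/Series/Image");
      foreach (XmlNode image in images)
      {
         Images.Add(new Image(image.InnerText));
      }

      Metadata1 = GetMetadataValue(doc, "Metadata1");
      Metadata2 = GetMetadataValue(doc, "Metadata2");
   }

   // Returns the text of /Series/<nodeName>, or an empty string if it is absent.
   private string GetMetadataValue(XmlDocument document, string nodeName)
   {
      string value = String.Empty;
      XmlNode metadataNode = document.SelectSingleNode("/Series/" + nodeName);
      if (metadataNode != null)
      {
         value = metadataNode.InnerText;
      }

      return value;
   }
}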

*This is untested/unverified code, but it should get the idea across.

Scott Dorman
Wow that would turn 160 lines of code into 1600 lines of code. I guess the problem is that XmlSerializer gets me rather close, just not all the way there. Is what you posted compatible with XMLSerializer? This is part of a larger set of 16 classes that otherwise parse correctly with XmlSerializer.
Jared Updike
+2  A: 

Your classes are missing the attributes that allow XML serialization to work. I believe the following should suffice.

[XmlElement]
public class Image { ... }

[XmlRoot(ElementName="Series")]
public class Series
{
        public Series() { }

        [XmlElement]
        public string Metadata1;

        [XmlElement]
        public string Metadata2;

        [XmlElement(ElementName="Image")]
        public Image[] Images;
}

I'm not sure if you can use a generic type in place of the image array, but the above referenced link should give you more information on how to apply the serialization attributes for your specific situation.
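
On the generic-type question: as far as I know, a List<Image> field can take the same [XmlElement] attribute as the array. Below is a minimal sketch, not a definitive implementation. It assumes a placeholder Image whose only member is its text content (the real Image members are elided in the question), and SeriesReader.Parse is a hypothetical helper name used only for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in: the real Image members are elided in the question.
public class Image
{
    // Assumption: the image payload is the element's text content.
    [XmlText]
    public string Data;
}

[XmlRoot("Series")]
public class Series
{
    // Public fields map to child elements of the same name by default.
    public string Metadata1;
    public string Metadata2;

    // XmlElement (rather than XmlArray) tells the serializer that the
    // <Image> elements sit directly under <Series>, with no wrapper element.
    [XmlElement("Image")]
    public List<Image> Images = new List<Image>();
}

// Hypothetical helper, named for illustration only.
public static class SeriesReader
{
    // Deserializes a <Series> document held in a string.
    public static Series Parse(string xml)
    {
        var serializer = new XmlSerializer(typeof(Series));
        using (var reader = new StringReader(xml))
        {
            return (Series)serializer.Deserialize(reader);
        }
    }
}
```

Calling SeriesReader.Parse on a document shaped like the one in the question should fill in both the metadata fields and the Images list. To read from a file instead, pass a StreamReader to Deserialize as in the original code.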

EDIT: Another option is to hand-craft an XML schema that will validate the documents produced by the application, then use XSD.exe to generate the object model. The resulting classes will demonstrate how you should tweak your object model to work with the serializer.

Steve Guidi
The only part I needed was: [XmlElement(ElementName="Image")] public Image[] Images; and it worked beautifully. The [XmlElement] attributes on public class Image { ... } were unnecessary (and indeed do not compile: "Attribute 'XmlElement' is not valid on this declaration type. It is valid on 'property, indexer, field, param, return' declarations only.").
Jared Updike
A: 

I think Steve's answer should work. I just want to add that you can only read a fixed, known set of Metadata elements with this technique, because each one has a different name and needs its own field. What you could do instead is read them into a collection of XmlElements that you can parse later:

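using System.Linq;
using System.Xml;
using System.Xml.Serialization;

[XmlRoot(ElementName="Series")]
public class Series
{
    public Series() { }

    // Receives every child element not mapped to another member.
    // Must be public: XmlSerializer ignores non-public members.
    [XmlAnyElement]
    public XmlElement[] UnknownElements;

    private string[] _metadata;

    // Lazily extracts the text of each captured Metadata* element.
    [XmlIgnore]
    public string[] Metadata
    {
        get
        {
            if (_metadata == null && UnknownElements != null)
            {
                _metadata = UnknownElements
                            .Where(e => e.Name.StartsWith("Metadata"))
                            .Select(e => e.InnerText)
                            .ToArray();
            }
            return _metadata;
        }
    }

    [XmlElement(ElementName="Image")]
    public Image[] Images;
}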
Thomas Levesque
I over-simplified my XML example and introduced a non-requirement with Metadata*. I should have just kept the original classes/fields that I was trying to parse! Sorry about that. By Metadata1/2/etc. I just meant a collection of fields and objects that I am able to parse correctly (Foo, Bar, Bat, Baz, you name it). There was nothing to the numbers. The gist of the question (sorry if I didn't communicate it) was how to parse, within a Series object, those "normal" fields/objects (what I called Metadata1, Metadata2) AND parse a sequence of Image objects into a List<Image>. Clever solution, tho!
Jared Updike