views:

481

answers:

2

My current task involves writing a class library for processing HL7 CDA files.
These HL7 CDA files are XML files with a defined XML schema, so I used xsd.exe to generate .NET classes for XML serialization and deserialization.

The XML Schema contains various types which contain the mixed="true" attribute, specifying that an XML node of this type may contain normal text mixed with other XML nodes.
The relevant part of the XML schema for one of these types looks like this:

<xs:complexType name="StrucDoc.Paragraph" mixed="true">
    <xs:sequence>
        <xs:element name="caption" type="StrucDoc.Caption" minOccurs="0"/>
        <xs:choice minOccurs="0" maxOccurs="unbounded">
            <xs:element name="br" type="StrucDoc.Br"/>
            <xs:element name="sub" type="StrucDoc.Sub"/>
            <xs:element name="sup" type="StrucDoc.Sup"/>
            <!-- ...other possible nodes... -->
        </xs:choice>
    </xs:sequence>
    <xs:attribute name="ID" type="xs:ID"/>
    <!-- ...other attributes... -->
</xs:complexType>

The generated code for this type looks like this:

/// <remarks/>
[System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.3038")]
[System.SerializableAttribute()]
[System.Diagnostics.DebuggerStepThroughAttribute()]
[System.ComponentModel.DesignerCategoryAttribute("code")]
[System.Xml.Serialization.XmlTypeAttribute(TypeName="StrucDoc.Paragraph", Namespace="urn:hl7-org:v3")]
public partial class StrucDocParagraph {

    private StrucDocCaption captionField;

    private object[] itemsField;

    private string[] textField;

    private string idField;

    // ...fields for other attributes...

    /// <remarks/>
    public StrucDocCaption caption {
        get {
            return this.captionField;
        }
        set {
            this.captionField = value;
        }
    }

    /// <remarks/>
    [System.Xml.Serialization.XmlElementAttribute("br", typeof(StrucDocBr))]
    [System.Xml.Serialization.XmlElementAttribute("sub", typeof(StrucDocSub))]
    [System.Xml.Serialization.XmlElementAttribute("sup", typeof(StrucDocSup))]
    // ...other possible nodes...
    public object[] Items {
        get {
            return this.itemsField;
        }
        set {
            this.itemsField = value;
        }
    }

    /// <remarks/>
    [System.Xml.Serialization.XmlTextAttribute()]
    public string[] Text {
        get {
            return this.textField;
        }
        set {
            this.textField = value;
        }
    }

    /// <remarks/>
    [System.Xml.Serialization.XmlAttributeAttribute(DataType="ID")]
    public string ID {
        get {
            return this.idField;
        }
        set {
            this.idField = value;
        }
    }

    // ...properties for other attributes...
}

If I deserialize an XML element where the paragraph node looks like this:

<paragraph>first line<br /><br />third line</paragraph>

The result is that the item and text arrays are read like this:

itemsField = new object[]
{
    new StrucDocBr(),
    new StrucDocBr(),
};
textField = new string[]
{
    "first line",
    "third line",
};

From this there is no possible way to determine the exact order of the text and the other nodes.
If I serialize this again, the result looks exactly like this:

<paragraph>
    <br />
    <br />first linethird line
</paragraph>

The default serializer just serializes the items first and then the text.

I tried implementing IXmlSerializable on the StrucDocParagraph class so that I could control the deserialization and serialization of the content, but it's rather complex since there are so many classes involved and I didn't come to a solution yet because I don't know if the effort pays off.

Is there some kind of easy workaround to this problem, or is it even possible by doing custom serialization via IXmlSerializable? Or should I just use XmlDocument or XmlReader/XmlWriter to process these documents?

A: 

What about

itemsField = new object[] 
{ 
    "first line", 
    new StrucDocBr(), 
    new StrucDocBr(), 
    "third line", 
};

?

Guillaume
This results in an InvalidOperationException when I try to serialize the object (because of the strings inside the itemsField, the itemsField array may only contain Objects of those types which are specified by the [XmlElement] attributes for the public property 'Items').
Stefan
You may find help here :http://msdn.microsoft.com/en-us/library/kz8z99ds.aspxAny schema validation warning ?
Guillaume
I already found that page during my searches, it's about another problem. The schema of my xml documents is correct, I validate it both before deserialization and after serialization. But I did find the answer to my problem just now, your suggestion with the itemsField array was already close, it just needed some further modifications in the generated code. I will post it in a few minutes.
Stefan
+4  A: 

To solve this problem I had to modify the generated classes:

  1. Move the XmlTextAttribute from the Text property to the Items property and add the parameter Type = typeof(string)
  2. Remove the Text property
  3. Remove the textField field

As a result the generated code (modified) looks like this:

/// <remarks/>
[System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.3038")]
[System.SerializableAttribute()]
[System.Diagnostics.DebuggerStepThroughAttribute()]
[System.ComponentModel.DesignerCategoryAttribute("code")]
[System.Xml.Serialization.XmlTypeAttribute(TypeName="StrucDoc.Paragraph", Namespace="urn:hl7-org:v3")]
public partial class StrucDocParagraph {

    private StrucDocCaption captionField;

    private object[] itemsField;

    private string idField;

    // ...fields for other attributes...

    /// <remarks/>
    public StrucDocCaption caption {
        get {
            return this.captionField;
        }
        set {
            this.captionField = value;
        }
    }

    /// <remarks/>
    [System.Xml.Serialization.XmlElementAttribute("br", typeof(StrucDocBr))]
    [System.Xml.Serialization.XmlElementAttribute("sub", typeof(StrucDocSub))]
    [System.Xml.Serialization.XmlElementAttribute("sup", typeof(StrucDocSup))]
    // ...other possible nodes...
    [System.Xml.Serialization.XmlTextAttribute(typeof(string))]
    public object[] Items {
        get {
            return this.itemsField;
        }
        set {
            this.itemsField = value;
        }
    }

    /// <remarks/>
    [System.Xml.Serialization.XmlAttributeAttribute(DataType="ID")]
    public string ID {
        get {
            return this.idField;
        }
        set {
            this.idField = value;
        }
    }

    // ...properties for other attributes...
}

Now if I deserialize an XML element where the paragraph node looks like this:

<paragraph>first line<br /><br />third line</paragraph>

The result is that the item array is read like this:

itemsField = new object[]
{
    "first line",
    new StrucDocBr(),
    new StrucDocBr(),
    "third line",
};

This is exactly what I need, the order of the items and their content is correct.
And if I serialize this again, the result is again correct:

<paragraph>first line<br /><br />third line</paragraph>

What pointed me in the right direction was the answer by Guillaume, I also thought that it must be possible like this. And then there was this in the MSDN documentation to XmlTextAttribute:

You can apply the XmlTextAttribute to a field or property that returns an array of strings. You can also apply the attribute to an array of type Object but you must set the Type property to string. In that case, any strings inserted into the array are serialized as XML text.

So the serialization and deserialization work correct now, but I don't know if there are any other side effects. Maybe it's not possible to generate a schema from these classes with xsd.exe anymore, but I don't need that anyway.

Stefan