I am trying to figure out a way to parse an XML element whose content is wrapped in a CDATA section for some inputs, but not for all.

For example, the following is sample content I receive when the data is wrapped in CDATA tags. In other scenarios, however, the CDATA tags are omitted.

    <Data><![CDATA[ <h1>CHAPTER 2<br/> EDUCATION</h1>
                    <P>  Analysis paragraph  </P> ]]></Data>

Is there an elegant way to detect this and implement a ReadXml method that can parse both types of input (with or without CDATA)? My ReadXml() implementation so far is as follows, but I get parsing errors when the CDATA tag is omitted.

    public void ReadXml(XmlReader reader)
    {
        bool isEmpty = reader.IsEmptyElement;
        reader.ReadStartElement();
        if (isEmpty)
        {
            _data = string.Empty;
        }
        else
        {                
            switch (reader.MoveToContent())
            {
                case XmlNodeType.Text:
                case XmlNodeType.CDATA:
                    _data = reader.ReadContentAsString();
                    break;
                default:
                    _data = string.Empty;
                    break;
            }
            reader.ReadEndElement();
        }                         
    }
+1  A: 

The code below is tested on the following samples:

    <Data><h1>CHAPTER 2<br/> EDUCATION</h1><P>  Analysis paragraph  </P></Data>
    <Data>test<h1>CHAPTER 2<br/> EDUCATION</h1><P>  Analysis paragraph  </P></Data>
    <Data><![CDATA[ <h1>CHAPTER 2<br/> EDUCATION</h1><P>  Analysis paragraph  </P> ]]></Data>
    <Data></Data>

I use an XPathNavigator instead as it allows backtracking.

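    // Requires: using System.Xml; using System.Xml.XPath;
    public void ReadXml(XmlReader reader)
    {
        XmlDocument doc = new XmlDocument {PreserveWhitespace = false};
        doc.Load(reader);

        var navigator = doc.CreateNavigator();
        // Move from the document root to the <Data> element.
        navigator.MoveToChild(XPathNodeType.Element);
        // Text and CDATA content comes back escaped from InnerXml, so Value is used for the raw string;
        // real child elements are returned as markup by InnerXml.
        _data = navigator.InnerXml.Trim().StartsWith("&lt;") ? navigator.Value : navigator.InnerXml;
    }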
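
One caveat: as far as I understand the documented behaviour, when the reader is already positioned on a node, doc.Load(reader) loads that node together with its following siblings, which can throw if the element sits inside a larger document. A minimal sketch that scopes the load to the current element with ReadSubtree is shown here. SubtreeDataReader and ReadData are hypothetical names used only for illustration, and the method returns the string instead of assigning _data.

```csharp
using System.Xml;
using System.Xml.XPath;

public static class SubtreeDataReader
{
    // Hypothetical helper: expects the reader to be positioned on the <Data> start tag.
    public static string ReadData(XmlReader reader)
    {
        var doc = new XmlDocument { PreserveWhitespace = false };

        // ReadSubtree exposes only the current element and its descendants,
        // so Load never sees the siblings that follow <Data>.
        using (XmlReader subtree = reader.ReadSubtree())
        {
            doc.Load(subtree);
        }

        // Once the subtree reader is closed, the outer reader sits on the end tag
        // (or on the empty element itself); one Read() moves past it.
        reader.Read();

        XPathNavigator navigator = doc.CreateNavigator();
        navigator.MoveToChild(XPathNodeType.Element);

        // Same check as above: escaped markup means the content was text or CDATA.
        string inner = navigator.InnerXml.Trim();
        return inner.StartsWith("&lt;") ? navigator.Value : navigator.InnerXml;
    }
}
```

Inside an IXmlSerializable.ReadXml implementation this would be called as _data = SubtreeDataReader.ReadData(reader); and the reader is then left on the node after the closing tag, so the surrounding deserialization can continue.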
Mikael Svenson
That does do the trick. I ended up using XNode instead of XmlDocument, and then its CreateNavigator method to get an XPathNavigator to use to retrieve the InnerXml.
jvtech
Using an XmlNode is probably better, and glad it worked. Feel free to mark the answer as accepted as well :)
Mikael Svenson
Using XmlDocument.Load and then getting the XmlNode does not work for me. The sample XML in the example I gave is just one of the nodes in the actual input data (the actual input has quite a complex XML structure). So if I try to do XmlDocument.Load when parsing this particular node, I get errors and cannot read any further.
jvtech
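
For the situation in the last comment, where the element is one node inside a larger document, another option is to stay on the XmlReader and not build a DOM at all. The sketch below is one possible way to do that, not a tested drop-in replacement: it collects text and CDATA nodes as raw strings and child elements as markup, then leaves the reader just past the closing tag. StreamingDataReader and ReadData are hypothetical names, and the method assumes the reader is positioned on the start tag, as it is when ReadXml is called.

```csharp
using System.Text;
using System.Xml;

public static class StreamingDataReader
{
    // Hypothetical helper: reads the content of the current element,
    // whether it is plain text, a CDATA section, or real child elements.
    public static string ReadData(XmlReader reader)
    {
        if (reader.IsEmptyElement)
        {
            reader.Read();              // consume <Data/>
            return string.Empty;
        }

        reader.ReadStartElement();      // move to the first child node
        var sb = new StringBuilder();

        while (!reader.EOF && reader.NodeType != XmlNodeType.EndElement)
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Text:
                case XmlNodeType.CDATA:
                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                    sb.Append(reader.Value);            // raw, unescaped content
                    reader.Read();
                    break;
                case XmlNodeType.Element:
                    sb.Append(reader.ReadOuterXml());   // markup; also advances past the element
                    break;
                default:
                    reader.Skip();                      // comments, processing instructions, etc.
                    break;
            }
        }

        reader.ReadEndElement();        // consume </Data>
        return sb.ToString();
    }
}
```

Note that ReadOuterXml re-serializes the markup, so details such as spacing inside empty tags may differ from the input.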