Hi there, I am trying to read an RSS feed using C# and LINQ to XML. The feed is encoded in UTF-8 (see http://pc03224.kr.hsnr.de/infosys/feed/), and reading it generally works fine except for the description node, whose content is enclosed in a CDATA section.

For some reason I can't see the CDATA tag in the debugger after reading the content of the "description" tag, but I guess it must be there somewhere, because only in this section the German umlauts (äöü) and other special characters are not shown correctly. Instead they remain in the string as numeric character references like &#252;.

Can I somehow read them out correctly or at least decode them afterwards?

This is a sample of the RSS section giving me troubles:

<description><![CDATA[blabla bietet H&#246;rern meiner Vorlesungen &#8220;IAS&#8221;, &#8220;WEB&#8221; und &#8220;SWE&#8221; an, Lizenzen f&#252;r blabla [...]]]></description>

Here is my code which reads out and parses the RSS feed data:

RssItems = (from xElem in xml.Descendants("channel").Descendants("item")
            select new RssItem
            {
                Content = xElem.Descendants("description").FirstOrDefault().Value,
                ...
            }).ToList();

Thanks in advance!

+1  A: 

Your code is working as intended. A CDATA section means that its contents are not interpreted as markup, i.e. "&#246;" is not treated as a character reference but just as a literal sequence of characters.
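To see the difference for yourself, here is a minimal sketch that parses the same text once inside a CDATA section and once outside of it. The class and method names (`CDataDemo`, `ReadDescription`) are made up for illustration and are not part of the asker's code.

```csharp
using System;
using System.Xml.Linq;

static class CDataDemo
{
    // Hypothetical helper: parses a single element and returns its text value.
    public static string ReadDescription(string xml)
    {
        return XElement.Parse(xml).Value;
    }

    static void Main()
    {
        // Inside CDATA the reference is kept as literal characters.
        Console.WriteLine(ReadDescription("<description><![CDATA[H&#246;rern]]></description>"));

        // Outside CDATA the XML parser resolves the reference to the character it names.
        Console.WriteLine(ReadDescription("<description>H&#246;rern</description>"));
    }
}
```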

Contact the author of the RSS feed and ask them to fix it, either by removing the CDATA tags so the character references get interpreted, or by putting the intended characters directly into the feed.

Alternatively, have a look at HttpUtility.HtmlDecode to decode the CDATA contents a second time.
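A rough sketch of that second decoding step, assuming the feed has already been loaded into an `XDocument`. `DescriptionDecoder` and `ReadDescriptions` are hypothetical names, and on the .NET Framework this needs a reference to the System.Web assembly.

```csharp
using System;
using System.Linq;
using System.Web;
using System.Xml.Linq;

static class DescriptionDecoder
{
    // Hypothetical helper: returns the decoded <description> text of every item.
    public static string[] ReadDescriptions(XDocument xml)
    {
        return xml.Descendants("channel").Descendants("item")
                  // The explicit string conversion yields null when the element is missing,
                  // instead of throwing like .Value would on a null element.
                  .Select(item => (string)item.Descendants("description").FirstOrDefault())
                  // Second decoding pass: turns "&#246;" left over from the CDATA section into "ö".
                  .Select(text => text == null ? null : HttpUtility.HtmlDecode(text))
                  .ToArray();
    }
}
```

Keep in mind that this decodes every character reference in the text, including ones the feed author may really have meant literally.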

dtb
A better way to decode such CDATA would probably be to use `XmlReader` in fragment mode on the contents wrapped in a `StringReader` - this would remove the dependency on ASP.NET assemblies.
Pavel Minaev
This is not an issue for our project. Generally a good idea though.
SimonW
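For reference, the `XmlReader` approach from the comment above might look roughly like this. It is only a sketch: `FragmentDecoder` and `Decode` are invented names, and it assumes the CDATA contents are themselves well-formed as an XML fragment. A bare `&` or an HTML-only named entity such as `&nbsp;` would make the reader throw an `XmlException`, and any tags in the text are dropped because only text nodes are collected.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class FragmentDecoder
{
    // Hypothetical helper: re-parses already extracted CDATA text as an XML fragment
    // so that character references such as "&#246;" are resolved by the reader.
    public static string Decode(string cdataText)
    {
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        var result = new StringBuilder();

        using (var reader = XmlReader.Create(new StringReader(cdataText), settings))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    // Collect every kind of text node; element tags are skipped.
                    case XmlNodeType.Text:
                    case XmlNodeType.CDATA:
                    case XmlNodeType.Whitespace:
                    case XmlNodeType.SignificantWhitespace:
                        result.Append(reader.Value);
                        break;
                }
            }
        }

        return result.ToString();
    }
}
```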