I have an XML file that I'm trying to parse with LINQ to XML. One of the nodes contains a bit of HTML that I cannot retrieve.

The XML resembles:

<?xml version="1.0" encoding="ISO-8859-1"?>
<root>
<image><img src="/Images/m1cznk4a6fh7.jpg"  /></image>
<contentType>Banner</contentType>
</root>

The code is:

XDocument document = XDocument.Parse(content.XML);
XElement imageElement = document.Descendants("image").SingleOrDefault();
image = imageElement.Value; // Doesn't get the content, while if I specify .Descendants("contentType") it works

Any ideas?

A: 

If you're going to be storing HTML inside the XML elements, it should be inside a <![CDATA[]]> section so that LINQ to XML knows not to treat it as additional XML markup.

<image><![CDATA[<img src="Images/abc.jpg" />]]></image>

If memory serves, you shouldn't have to do anything special to extract the value; it should come back without the CDATA wrapper around it. You may need to call a property other than Value, though. I don't quite recall.
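
To make the CDATA suggestion concrete, here is a minimal sketch. It assumes the XML can be changed so that the HTML sits in a CDATA section; the sample string and variable names are made up for illustration. Under that assumption, Value appears to be enough.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sample: the question's XML with the HTML wrapped in a CDATA section.
string xml =
    "<root>" +
    "<image><![CDATA[<img src=\"/Images/m1cznk4a6fh7.jpg\" />]]></image>" +
    "<contentType>Banner</contentType>" +
    "</root>";

XDocument document = XDocument.Parse(xml);
var imageElement = document.Descendants("image").SingleOrDefault();

// Value returns the character data inside the CDATA section, without the wrapper.
string html = imageElement?.Value ?? string.Empty;
Console.WriteLine(html);
```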

Nathan Taylor
Tbh, it's not really correct. It is to some extent, but I'm trying to do it with the xml as is, therefore the other answers were more accurate than yours. And also before yours ;) But thank you anyway.
Dante
A: 

That is because there is no text value nested under image, only another element (img). You would need to do something like:

XElement imgElement = document.Descendants("image").SingleOrDefault().Element("img");

Then read the src attribute, e.g. imgElement.Attribute("src").Value. Otherwise, if you are looking for the img tag as plain text you would need to save it in your XML doc as a CDATA section, e.g.

<image><![CDATA[<img src="/Images/m1cznk4a6fh7.jpg" />]]></image>
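
If the XML has to stay as it is, the attribute route could look roughly like the sketch below. The sample string mirrors the question's XML; the null-conditional operators are my own addition so that a missing image or img element gives an empty string instead of a NullReferenceException.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Sample mirroring the XML from the question (declaration omitted for brevity).
string xml =
    "<root>" +
    "<image><img src=\"/Images/m1cznk4a6fh7.jpg\"  /></image>" +
    "<contentType>Banner</contentType>" +
    "</root>";

XDocument document = XDocument.Parse(xml);

// Step from <image> down to its <img> child, then read the src attribute.
var imgElement = document.Descendants("image").SingleOrDefault()?.Element("img");
string src = imgElement?.Attribute("src")?.Value ?? string.Empty;
Console.WriteLine(src);
```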
James
Thx for the answer, was also very useful. Can only approve one, but I also upvoted yours.
Dante
@Dante: no problem, however, I left a comment on the answer you accepted. You said you required the full HTML content inside the `image` tag which would require you to use a CDATA section. Therefore, I would have thought mine/@Nathan's answer would have been the correct one.
James
My bad, I didn't explain it properly. But he was first and he did answer my problem.
Dante
+1  A: 

.Value returns the concatenated text inside a tag and its child tags, but your <image> tag doesn't contain any text. When you parsed it, <img/> was treated as an XML tag, not as HTML (LINQ to XML doesn't know the difference). For example, if you had your XML written as:

<image>
    <img>/Images/m1cznk4a6fh7.jpg
    </img>
</image>

Then your code would work.

You'll have to go further down in your descendants to the <img/> tag and then get the .Value of its src attribute to retrieve the text you need.
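
A small sketch of how Value behaves in the two layouts discussed above. Both sample strings are written on one line for illustration; with the indented layout shown earlier, the text would likely carry surrounding whitespace and need a Trim().

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Original layout: <image> holds only an <img> element and no text.
var original = XDocument.Parse(
    "<root><image><img src=\"/Images/m1cznk4a6fh7.jpg\" /></image>" +
    "<contentType>Banner</contentType></root>");

string imageValue = original.Descendants("image").Single().Value;        // no text anywhere under <image>
string contentType = original.Descendants("contentType").Single().Value; // text directly inside the tag

// Rewritten layout: the path is text inside <img>, so it shows up in Value of <image>.
var rewritten = XDocument.Parse(
    "<root><image><img>/Images/m1cznk4a6fh7.jpg</img></image></root>");

string nestedText = rewritten.Descendants("image").Single().Value;

Console.WriteLine($"[{imageValue}] [{contentType}] [{nestedText}]");
```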

Otaku
The question doesn't actually say they were looking for the `src` section of the tag, they just said the HTML within. A CDATA section would be required.
James
Why is this the accepted answer?
Nathan Taylor
Wow, lots of anger here. James and Nathan, you guys gonna be alright? He didn't say that he had the ability to change the format of his XML, only that he wanted to read the content (`.Value`). In order to use Linq to XML like he is using it, he needs to get the `src` attribute's value.
Otaku
@Otaku..no anger, more confusion :)
James
@James: The very last line of his code is commented "// Doesn't get the content, while if I specify .Descendants("contentType") it works" which indicates that he is looking for the value of the `img` tag. In this case, there is only one value, that of the attribute `src`.
Otaku
@Otaku, he has an `image` tag and an `img` tag. I thought by his question he was expecting the full `<img src../>` data to be the `.value` of `image`. He didn't explicitly state he was looking for the `src`.
James
@James: looks like we interpreted this differently. I've dealt with HTML nested inside of XML before and never was I looking to get the HTML source - only the values of the HTML source. That's how I read his need too, especially given the example of `contentType` and the words value and content he used. But even in the case of looking for the HTML source, I would have recommended he just query `.SingleOrDefault` of `<image/>` and turn that into a `.ToString`.
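
For the HTML-source case, a rough sketch of the ToString idea, with the XML left unchanged. The sample string mirrors the question. Concatenating the child nodes is my own variation: it leaves out the outer image tag, which ToString() on the element itself would include.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Sample mirroring the XML from the question.
string xml =
    "<root>" +
    "<image><img src=\"/Images/m1cznk4a6fh7.jpg\"  /></image>" +
    "<contentType>Banner</contentType>" +
    "</root>";

XDocument document = XDocument.Parse(xml);
var image = document.Descendants("image").SingleOrDefault();

// ToString() on the element serializes it including the <image> wrapper.
string outer = image?.ToString() ?? string.Empty;

// Concatenating the child nodes serializes only what is inside <image>.
string inner = image == null ? string.Empty : string.Concat(image.Nodes());

Console.WriteLine(outer);
Console.WriteLine(inner);
```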
Otaku
Guys, sorry for all the inconvenience! Thank you for all your comments, all of you helped but I can't approve all. I chose the one that answered my question and that came first.
Dante