I am having difficulty preserving certain nodes (in this case <b>) when parsing XML with LINQ to XML. I first grab a node with the following LINQ query...

IEnumerable<XElement> node = from el in _theData.Descendants("msDict") select el;

This returns the following XML (as the first XElement)...

<msDict lexid="m_en_us0000002.001" type="core">
  <df>(preceding a numeral) <b>pound</b> or <b>pounds</b> (of money)
  <genpunc tag="df">.</genpunc></df>
</msDict>

I then collect the content with the following code...

StringBuilder output = new StringBuilder();
foreach (XElement elem in node)
{
    output.Append(elem.Value);
}

Here's the breaking point: all of the XML tags are stripped, but I want to preserve every instance of <b>. I expect to get the following output...

(preceding a numeral) <b>pound</b> or <b>pounds</b> (of money).
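For reference, the stripping comes from XElement.Value itself, which is defined as the concatenated text of all descendant text nodes. The sketch below reproduces it on a single-line copy of the sample (the class name ValueDemo and the constant Sample are only for illustration; newlines are left out so no line breaks end up in the text nodes):

```csharp
using System;
using System.Xml.Linq;

public static class ValueDemo
{
    // Single-line copy of the question's <msDict> sample (illustrative only).
    public const string Sample =
        "<msDict lexid=\"m_en_us0000002.001\" type=\"core\">" +
        "<df>(preceding a numeral) <b>pound</b> or <b>pounds</b> (of money)" +
        "<genpunc tag=\"df\">.</genpunc></df></msDict>";

    public static void Main()
    {
        XElement msDict = XElement.Parse(Sample);

        // Value concatenates the text of every descendant text node,
        // so the <df>, <b> and <genpunc> tags all disappear.
        Console.WriteLine(msDict.Value);
        // prints: (preceding a numeral) pound or pounds (of money).
    }
}
```

So any solution has to work with the child nodes themselves rather than with Value.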

Note: I know that this is a simple operation in XSLT, but I would like to know if there is an easy way to do this using LINQ to XML.

A:

In the category of "it works but it's messy and I can't believe I have to resort to this":

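StringBuilder output = new StringBuilder();
foreach (XElement elem in node)
{
    // Serialize each child node: text comes back as text, child elements keep their tags.
    // ToArray() is needed before .NET 4, where string.Join only accepts a string[].
    output.Append(string.Join("", elem.Nodes().Select(n => n.ToString()).ToArray()));
}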

Personally, I think this cries out for an extension method on XElement...
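A minimal sketch of such an extension method, assuming you just want the serialized children of an element without its own start and end tags. The name InnerXml is a hypothetical choice of ours; LINQ to XML has no built-in method by that name.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class XElementExtensions
{
    // Hypothetical extension method ("InnerXml" is not a LINQ to XML API).
    // Returns the serialized child nodes of an element, without the element's own tags.
    public static string InnerXml(this XElement element)
    {
        // Text nodes serialize as (XML-escaped) text; child elements keep their tags.
        // ToArray() keeps the string.Join call working before .NET 4.
        return string.Join(string.Empty, element.Nodes().Select(n => n.ToString()).ToArray());
    }
}
```

Inside the loop this becomes `output.Append(elem.InnerXml());`. Note that it keeps every child element's tags, so a <genpunc> child survives just like <b> does.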

UPDATE: If you want to exclude all element tags except <b>, you'll need a recursive method to return node values.

Here's your main method body:

StringBuilder output = new StringBuilder();
foreach (XElement elem in node)
{
    output.Append(stripTags(elem));
}

And here's stripTags:

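private static string stripTags(XNode node)
{
    // Requires: using System; using System.Linq; using System.Xml.Linq;
    XElement element = node as XElement;
    if (element != null && !element.Name.LocalName.Equals("b", StringComparison.OrdinalIgnoreCase))
    {
        // Any element other than <b>: drop its own tags but keep whatever its
        // children produce (recursively). ToArray() keeps this working before .NET 4.
        return string.Join(string.Empty, element.Nodes().Select(n => stripTags(n)).ToArray());
    }

    // <b> elements (tags included) and non-element nodes such as text pass through unchanged.
    return node.ToString();
}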
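If more than one tag ever needs to survive, the same recursion generalizes to a set of names to keep. This is a hypothetical sketch (TagFilter, KeepOnly and the keep parameter are our own names, not part of the original answer), and it assumes .NET 4 or later for ISet<T> and string.Concat over an IEnumerable<string>:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TagFilter
{
    // Hypothetical generalization of stripTags: elements whose local name is in
    // `keep` are serialized with their tags; any other element is replaced by the
    // recursively processed content of its children. Text nodes pass through.
    public static string KeepOnly(XNode node, ISet<string> keep)
    {
        XElement element = node as XElement;
        if (element != null && !keep.Contains(element.Name.LocalName))
        {
            return string.Concat(element.Nodes().Select(n => KeepOnly(n, keep)));
        }
        return node.ToString();
    }

    public static void Main()
    {
        // Single-line copy of the question's sample, so no newlines end up in text nodes.
        XElement msDict = XElement.Parse(
            "<msDict lexid=\"m_en_us0000002.001\" type=\"core\">" +
            "<df>(preceding a numeral) <b>pound</b> or <b>pounds</b> (of money)" +
            "<genpunc tag=\"df\">.</genpunc></df></msDict>");

        // Case-insensitive, to mirror the comparison used in stripTags.
        var keep = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "b" };
        Console.WriteLine(KeepOnly(msDict, keep));
        // prints: (preceding a numeral) <b>pound</b> or <b>pounds</b> (of money).
    }
}
```

Because text nodes go through ToString(), characters such as & come back escaped (&amp;), which keeps the result valid markup if you later parse it again.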

So the real answer is that no, there isn't an easy way to do this using LINQ to XML, but there's a way...

Jacob Proffitt
I get a compile error: Argument type 'System.Collections.Generic.IEnumerable<string>' is not assignable to parameter type 'string[]'
Ryan Berger
Ah. If you're not using .NET 4, you'll need to add .ToArray(), because the string.Join overload that accepts an IEnumerable<string> only arrived in .NET 4. I'll update to make this more universal.
Jacob Proffitt
Using your method I get "(preceding a numeral) <b>pound</b> or <b>pounds</b> (of money)<genpunc tag="df">.</genpunc>". I want to preserve <b> but not <genpunc>. Is there any way to achieve this behavior using your method?
Ryan Berger
Ah. That *is* more complex. You'd need a recursive method that checks whether the node is an XElement and, if so, whether it's a <b>. Tougher, but not impossible.
Jacob Proffitt
Works like a charm for my situation. Thank you very much!
Ryan Berger