I'm stuck with using a web service I have no control over and am trying to parse the XML returned by that service into a standard object.

A portion of the XML structure looks like this:

<NO>
   <L>Some text here </L>
   <L>Some additional text here </L>
   <L>Still more text here </L>
</NO>

In the end, I want a single string property that looks like "Some text here Some additional text here Still more text here "

What I have for an initial pass is what follows. I think I'm on the right track, but not quite there yet:

XElement source = /* output from the web service */;
List<IndexEntry> result;

result = (from indexentry in source.Elements(entryLevel)
    select new IndexEntry()
    {
        EtiologyCode = indexentry.Element("IE") == null ? null : indexentry.Element("IE").Value,
        //some code to set other properties in this object
        Note = (from l in indexentry.Elements("NO").Descendants()
                select l.Value)  //This is where I stop
                               // and don't know where to go
    }

I know that I could add a ToList() operator at the end of that query to return the collection. Is there an operator or technique that would allow me to inline the concatenation of that collection into a single string?

Feel free to ask for more info if this isn't clear.

Thanks.

A: 

I don't have experience with it myself, but it strikes me that LINQ to XML could vastly simplify your code. Select the L elements from the XML document, then loop through them and use a StringBuilder to append each element's text to a single string.

JoshJordan
+6  A: 

LINQ to XML is indeed the way here:

var textArray = topElement.Elements("L")
                          .Select(x => x.Value)
                          .ToArray();

var text = string.Join(" ", textArray);
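As a self-contained illustration of the snippet above, here is a minimal sketch. The class and method names (NoteJoinSketch, JoinLines) are made up for the example, and the Trim() call is an assumption on my part: each L value in the sample XML already ends with a space, so joining the raw values with " " would leave double spaces between the pieces.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class NoteJoinSketch
{
    // Hypothetical helper: joins the text of every <L> child of the given
    // element with a single space between the pieces.
    public static string JoinLines(XElement no)
    {
        string[] parts = no.Elements("L")
                           .Select(x => x.Value.Trim()) // assumption: drop the trailing space in each <L>
                           .ToArray();
        return string.Join(" ", parts);
    }
}
```

If the trailing space from the question's expected output matters, string.Concat over the untrimmed values would preserve it instead.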

EDIT: Based on the comment, it looks like you just need a single-expression way of representing this. That's easy, if somewhat ugly:

result = (from indexentry in source.Elements(entryLevel)
    select new IndexEntry
    {
        EtiologyCode = indexentry.Element("IE") == null 
                           ? null 
                           : indexentry.Element("IE").Value,
        //some code to set other properties in this object
        Note = string.Join(" ", indexentry.Elements("NO")
                                          .Descendants()
                                          .Select(x => x.Value)
                                          .ToArray())
    }).ToList();

Another alternative is to extract it into a separate extension method (it has to be in a top-level static class):

public static string ConcatenateTextNodes(
    this IEnumerable<XElement> elements)
{
    string[] values = elements.Select(x => x.Value).ToArray();
    // You could parameterise the delimiter here if you wanted
    return string.Join(" ", values);
}

then change your code to:

result = (from indexentry in source.Elements(entryLevel)
    select new IndexEntry
    {
        EtiologyCode = indexentry.Element("IE") == null 
                           ? null 
                           : indexentry.Element("IE").Value,
        //some code to set other properties in this object
        Note = indexentry.Elements("NO")
                         .Descendants()
                         .ConcatenateTextNodes()
    }).ToList();

EDIT: A note about efficiency

Other answers have suggested using StringBuilder in the name of efficiency. I would check for evidence of this being the right way to go before using it. If you think about it, StringBuilder and ToArray do similar things - they create a buffer bigger than they need to, add data to it, resize it when necessary, and come out with a result at the end. The hope is that you won't need to resize too often.

The difference between StringBuilder and ToArray here is what's being buffered - in StringBuilder it's the entire contents of the string you've built up so far. With ToArray it's just references. In other words, resizing the internal buffer used for ToArray is likely to be cheaper than resizing the one for StringBuilder, particularly if the individual strings are long.

After doing the buffering in ToArray, string.Join is hugely efficient: it can look through all the strings to start with, work out exactly how much space to allocate, and then concatenate them, copying the character data only once.

This is in sharp contrast to a previous answer I've given - but unfortunately I don't think I ever wrote up the benchmark.

I certainly wouldn't expect ToArray to be significantly slower, and I think it makes the code simpler here: no need for side effects, aggregation, etc.

Jon Skeet
Jon, maybe my example isn't clear. I'm populating a more complex object with a LINQ to XML query. The example only represents a portion of the XML involved. So in my bottom example, Note represents a string property in my object, and I would like, if possible, to populate it...
Steve Brouillard
...inline with the rest of the LINQ query I have in place. Am I being clear? I see what you're advocating and understand it, I'm just not sure how to apply it in my situation. I could add more code for context if that would help.
Steve Brouillard
Editing answer...
Jon Skeet
Jon. Thanks for the expanded answer. Really glad I went to lunch before I started banging on this.
Steve Brouillard
Jon. I don't have enough rep to edit your answer. For future clarity, please fix a small typo. In the extension method definition, "string values = " should be "string[] values = ". Thanks.
Steve Brouillard
Fixed, thanks Steve.
Jon Skeet
Just glad to do my part in advancing the cause that is Jon Skeet :)
Steve Brouillard
+1  A: 

The other option is to use Aggregate()

var q = topelement.Elements("L")
                  .Select(x => x.Value)
                  .Aggregate(new StringBuilder(), 
                             (sb, x) => sb.Append(x).Append(" "),
                             sb => sb.ToString().Trim());

edit: The first lambda in Aggregate is the accumulator. This is taking all of your values and creating one value from them. In this case, it is creating a StringBuilder with your desired text. The second lambda is the result selector. This allows you to translate your accumulated value into the result you want. In this case, changing the StringBuilder to a String.
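To make the three arguments concrete, here is a minimal runnable sketch of the same Aggregate call wrapped in a method. The names AggregateSketch and Concatenate are hypothetical and exist only for this example.

```csharp
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class AggregateSketch
{
    // Hypothetical helper: concatenates the text of every <L> child.
    public static string Concatenate(XElement no)
    {
        return no.Elements("L")
                 .Select(x => x.Value)
                 .Aggregate(
                     new StringBuilder(),                  // seed: the empty accumulator
                     (sb, x) => sb.Append(x).Append(" "),  // accumulator: append each value plus a space
                     sb => sb.ToString().Trim());          // result selector: builder to trimmed string
    }
}
```

Note that because each L value in the sample XML already ends with a space, this produces two spaces between the pieces; appending x.Trim() instead of x would be one way to avoid that.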

dustyburwell
This may be my answer. I will try it, thank you.
Steve Brouillard
Oh, yeah. You'd have to replace 'var q =' with 'Note = ' in your code, btw.
dustyburwell
See my expanded answer - I don't think using a StringBuilder is likely to improve the performance over using string.Join and ToArray together. It may even make it worse. It certainly makes it more complicated IMO.
Jon Skeet
A: 

I like LINQ as much as the next guy, but you're reinventing the wheel here. The XmlElement.InnerText property does exactly what's being asked for.

Try this:

using System;
using System.Xml;

class Program
{
    static void Main(string[] args)
    {
        XmlDocument d = new XmlDocument();
        string xml =
            @"<NO>
  <L>Some text here </L>
  <L>Some additional text here </L>
  <L>Still more text here </L>
</NO>";
        d.LoadXml(xml);
        Console.WriteLine(d.DocumentElement.InnerText);
        Console.ReadLine();
    }
}
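For the LINQ to XML types used in the question, XElement.Value behaves much like InnerText: it returns the concatenated text of the element and its descendants. A minimal sketch, assuming each entry has at most one NO child; the names ValueSketch and NoteText are hypothetical.

```csharp
using System.Xml.Linq;

public static class ValueSketch
{
    // Hypothetical helper: returns the concatenated text of the <NO> child,
    // or null when the entry has no <NO> element.
    public static string NoteText(XElement indexEntry)
    {
        XElement no = indexEntry.Element("NO");
        return no == null ? null : no.Value;
    }
}
```

The same expression could be used inline in an object initializer, in the style of the EtiologyCode null check in the question.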
Robert Rossney
That would be all well and good, but I'm parsing XML with LINQ to build a business object and in a number of cases I need to build collections inside those objects from similar constructs in the XML, so rather than extra processing later I want to keep this inside the existing LINQ query.
Steve Brouillard