So say I have this XML file:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Root>
  <Category Name="Tasties">
    <Category Name="Pasta">
      <Category Name="Chicken">
        <Recipe Name="Chicken and Shrimp Scampi" />
        <Recipe Name="Chicken Fettucini Alfredo" />
      </Category>
      <Category Name="Beef">
        <Recipe Name="Spaghetti and Meatballs" />
        <Recipe Name="Lasagna" />
      </Category>
      <Category Name="Pork">
        <Recipe Name="Lasagna" />
      </Category>
      <Category Name="Seafood">
        <Recipe Name="Chicken and Shrimp Scampi" />
      </Category>
    </Category>
  </Category>
</Root>

And I want to return the names of all the recipes in Tasties\Pasta\Chicken. How would I do this?

What I have currently is:

var q = from chk in
            (from c in doc.Descendants("Category")
             where c.Attribute("Name").Value == "Chicken"
             select c)
        select from r in chk.Descendants("Recipe")
               select r.Attribute("Name").Value;

foreach (var recipes in q)
{
    foreach (var recipe in recipes)
    {
        Console.WriteLine("Recipe name = {0}", recipe);
    }
}

Which kinda works, although it doesn't check the path: it matches any category named Chicken, wherever it sits in the tree. I could dig through each element in the path recursively, but it seems like there is probably a better solution I'm missing. Also, my current query returns IEnumerable<IEnumerable<String>> when all I want is an IEnumerable<String>.

Basically I can make it work but it looks messy and I'd like to see any LINQ suggestions or techniques to do better querying.

+2  A: 

Personally, I'd use XmlDocument and the familiar SelectNodes:

foreach(XmlElement el in doc.DocumentElement.SelectNodes(
   "Category[@Name='Tasties']/Category[@Name='Pasta']/Category[@Name='Chicken']/Recipe")) {
    Console.WriteLine(el.GetAttribute("Name"));
}

For LINQ-to-XML, I'd guess (untested) something like:

var q = from c1 in doc.Root.Elements("Category")
        where c1.Attribute("Name").Value == "Tasties"
        from c2 in c1.Elements("Category")
        where c2.Attribute("Name").Value == "Pasta"
        from c3 in c2.Elements("Category")
        where c3.Attribute("Name").Value == "Chicken"
        from recipe in c3.Elements("Recipe")
        select recipe.Attribute("Name").Value;
foreach (string name in q) {
    Console.WriteLine(name);
}


Edit: if you want the category selection to be more flexible:

    string[] categories = { "Tasties", "Pasta", "Chicken" };
    XDocument doc = XDocument.Parse(xml);
    IEnumerable<XElement> query = doc.Elements();
    foreach (string category in categories) {
        string tmp = category;
        query = query.Elements("Category")
            .Where(c => c.Attribute("Name").Value == tmp);
    }
    foreach (string name in query.Descendants("Recipe")
        .Select(r => r.Attribute("Name").Value)) {
        Console.WriteLine(name);
    }

This should now work for any number of levels, selecting all recipes at the chosen level or below.


Edit for discussion (comments) on why Where has a local tmp variable:

This might get a bit complex, but I'm trying to do the question justice ;-p

Basically, the foreach (with the iterator lvalue "captured") looks like:

class SomeWrapper {
    public string category;
    public bool AnonMethod(XElement c) {
        return c.Attribute("Name").Value == category;
    }
}
...
SomeWrapper wrapper = new SomeWrapper(); // note only 1 of these
using(var iter = categories.GetEnumerator()) {
    while(iter.MoveNext()) {
        wrapper.category = iter.Current;
        query = query.Elements("Category")
             .Where(wrapper.AnonMethod);
    }
}

It might not be obvious, but since Where isn't evaluated immediately, the value of category (via the predicate AnonMethod) isn't checked until much later. This is an unfortunate consequence of the precise details of the C# spec. Introducing tmp (scoped inside the foreach) means that the capture happens per iteration:

class SecondWrapper {
    public string tmp;
    public bool AnonMethod(XElement c) {
        return c.Attribute("Name").Value == tmp;
    }
}
...
string category;
using(var iter = categories.GetEnumerator()) {
    while(iter.MoveNext()) {
        category = iter.Current;
        SecondWrapper wrapper = new SecondWrapper(); // note 1 per iteration
        wrapper.tmp = category;
        query = query.Elements("Category")
             .Where(wrapper.AnonMethod);
    }
}

And hence it doesn't matter whether we evaluate now or later. Complex and messy. You can see why I favor a change to the specification!!!
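To see the difference without reading generated classes, here is a minimal sketch. Note that from C# 5 onwards the foreach variable is scoped per iteration, so the problem no longer reproduces with a plain foreach on current compilers. To mimic the old behaviour, this sketch declares the variable outside a for loop. The names CaptureDemo, SharedCapture and PerIterationCapture are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CaptureDemo
{
    // One variable shared by every lambda, like the single SomeWrapper above.
    public static List<bool> SharedCapture(string[] categories)
    {
        var predicates = new List<Func<string, bool>>();
        string category = null; // declared once, outside the loop
        for (int i = 0; i < categories.Length; i++)
        {
            category = categories[i];
            // Captures the variable itself, not its current value.
            predicates.Add(name => name == category);
        }
        // Evaluated only now, the way a deferred Where would be.
        return categories.Select((c, index) => predicates[index](c)).ToList();
    }

    // A fresh variable per iteration, like SecondWrapper above.
    public static List<bool> PerIterationCapture(string[] categories)
    {
        var predicates = new List<Func<string, bool>>();
        for (int i = 0; i < categories.Length; i++)
        {
            string tmp = categories[i]; // new variable each time round
            predicates.Add(name => name == tmp);
        }
        return categories.Select((c, index) => predicates[index](c)).ToList();
    }

    public static void Main()
    {
        string[] categories = { "Tasties", "Pasta", "Chicken" };
        // prints "False, False, True": every predicate sees "Chicken"
        Console.WriteLine(string.Join(", ", SharedCapture(categories)));
        // prints "True, True, True"
        Console.WriteLine(string.Join(", ", PerIterationCapture(categories)));
    }
}
```

With the shared variable, only the last level matches its own name, which is why the chained Where calls in the category loop return nothing without tmp.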

Marc Gravell
Cool, didn't know about DocumentElement.SelectNodes() (I'm fairly new to working with XML programmatically)
Davy8
(note I fixed the LINQ-to-XML)
Marc Gravell
Oh, I see, in your case doc is an XmlDocument not XDocument. Hmm...
Davy8
@Davy8 - the first example is XmlDocument, the second example shows XDocument
Marc Gravell
Right, the 2nd answer is longer, but it'll be easier to refactor into something reusable in a more strongly typed manner.
Davy8
Good stuff, that looks cleaner than whatever I would've come up with.
Davy8
Is there a reason why you assign category to tmp? It looks unnecessary to me.
Davy8
Yes; the LINQ "Where" method uses deferred execution. The "foreach" construct captures the lvalue; this means that without the "tmp" scoped **inside** the "foreach", you capture the same *variable* for every level - meaning typically that the last value ("Chicken") is used for each level. Try removing it; you'll see you get no matches. The whole "foreach" / "lvalue" / captured-variable issue is a known gotcha. I've spoken to Eric+Mads in the past to see if it could be changed (it requires a spec change) - only time will tell.
Marc Gravell
Jon's answer here gives a list of SO questions on this very subject - it is a common one ;-p http://stackoverflow.com/questions/295593/linq-query-built-in-foreach-loop-always-takes-parameter-value-from-last-iteration
Marc Gravell
Yeah, I read about it a while ago, but forgot about it since it hasn't applied to me personally. So when you pass an argument to a lambda expression (what about anonymous methods?) does it actually pass it by ref then? (Here I'm using the real definition of pass by ref and not just referring to reference types)
Davy8
I'll add an example to show what it does... the issue comes about because of a reference-type wrapper class passed by value... So no: it isn't "pass by ref".
Marc Gravell
+1  A: 

Here's code that is similar to Marc's 2nd example, but it's tested and verified.

var q = from t in doc.Root.Elements("Category")
        where t.Attribute("Name").Value == "Tasties"
        from p in t.Elements("Category")
        where p.Attribute("Name").Value == "Pasta"
        from c in p.Elements("Category")
        where c.Attribute("Name").Value == "Chicken"
        from r in c.Elements("Recipe")
        select r.Attribute("Name").Value;

foreach (string recipe in q)
{
    Console.WriteLine("Recipe name = {0}", recipe);
}

In general, I'd say you only want a single select statement in your LINQ queries. You were getting the IEnumerable<IEnumerable<String>> because of your nested select statements.

Dennis Palmer
Descendants is a bit risky - you could be getting different Category to what you expect...
Marc Gravell
Yeah, I realized it was because of the nested selects, but couldn't figure out how to nest join statements properly instead. This is very modular, which is nice.
Davy8
Good point Marc, I've been debating whether to force unique names for categories because of this, or just ensure unique path, but it looks like I can stick with the current plan of unique paths since I can easily either iteratively or recursively drill down into a path.
Davy8
+1  A: 

If you add a using directive for System.Xml.XPath, that will add an XPathSelectElements() extension method to your XDocument. That will let you select nodes with an XPath expression if you're more comfortable with that.

Otherwise, you can flatten out your IEnumerable<IEnumerable<String>> into just an IEnumerable<string> with SelectMany:

IEnumerable<IEnumerable<String>> foo = myLinqResults;
IEnumerable<string> bar = foo.SelectMany(x => x);
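
For the XPath route, a rough sketch might look like the following. It assumes the XML from the question (trimmed here) is loaded into an XDocument. The class and method names are just for illustration, and the cast to string is used instead of .Value so a missing Name attribute gives null rather than an exception.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath; // brings XPathSelectElements into scope

public static class XPathRecipeDemo
{
    // Trimmed-down copy of the XML from the question.
    public const string Xml = @"<Root>
  <Category Name=""Tasties"">
    <Category Name=""Pasta"">
      <Category Name=""Chicken"">
        <Recipe Name=""Chicken and Shrimp Scampi"" />
        <Recipe Name=""Chicken Fettucini Alfredo"" />
      </Category>
      <Category Name=""Seafood"">
        <Recipe Name=""Chicken and Shrimp Scampi"" />
      </Category>
    </Category>
  </Category>
</Root>";

    // Returns the recipe names found under Tasties/Pasta/Chicken only.
    public static List<string> ChickenRecipes(XDocument doc)
    {
        const string path =
            "/Root/Category[@Name='Tasties']/Category[@Name='Pasta']" +
            "/Category[@Name='Chicken']/Recipe";
        return doc.XPathSelectElements(path)
                  .Select(r => (string)r.Attribute("Name"))
                  .ToList();
    }

    public static void Main()
    {
        foreach (string name in ChickenRecipes(XDocument.Parse(Xml)))
        {
            Console.WriteLine(name);
        }
    }
}
```

Because the path is spelled out level by level, the Seafood copy of "Chicken and Shrimp Scampi" is not picked up, and the result is already a flat sequence of strings.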
Joel Mueller
+1  A: 

A little bit late, but extension methods can really help to clean up messy looking LINQ to XML queries. For your scenario you could work with code like this:

var query = xml.Root
               .Category("Tasties")
               .Category("Pasta")
               .Category("Chicken")
               .Recipes();

... using some techniques I show in From LINQ To XPath And Back Again
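
The extension methods themselves aren't shown above, so the following is only a guess at what they could look like, not the code from the linked article. The Category and Recipes signatures here are assumptions, including the choice to have Recipes return the names rather than the elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helpers; names and return types are assumptions.
public static class RecipeXmlExtensions
{
    // Direct <Category> children of one element whose Name attribute matches.
    public static IEnumerable<XElement> Category(this XElement parent, string name)
    {
        return parent.Elements("Category")
                     .Where(c => (string)c.Attribute("Name") == name);
    }

    // Same step applied to a whole sequence, so the calls can be chained.
    public static IEnumerable<XElement> Category(this IEnumerable<XElement> parents, string name)
    {
        return parents.Elements("Category")
                      .Where(c => (string)c.Attribute("Name") == name);
    }

    // Names of the <Recipe> elements directly under the given categories.
    public static IEnumerable<string> Recipes(this IEnumerable<XElement> categories)
    {
        return categories.Elements("Recipe")
                         .Select(r => (string)r.Attribute("Name"));
    }
}

public static class Program
{
    public static void Main()
    {
        XDocument xml = XDocument.Parse(@"<Root>
  <Category Name=""Tasties"">
    <Category Name=""Pasta"">
      <Category Name=""Chicken"">
        <Recipe Name=""Chicken Fettucini Alfredo"" />
      </Category>
      <Category Name=""Pork"">
        <Recipe Name=""Lasagna"" />
      </Category>
    </Category>
  </Category>
</Root>");

        var query = xml.Root
                       .Category("Tasties")
                       .Category("Pasta")
                       .Category("Chicken")
                       .Recipes();

        foreach (string name in query)
        {
            Console.WriteLine(name); // prints "Chicken Fettucini Alfredo"
        }
    }
}
```

Each step uses Elements rather than Descendants, so only the exact path is matched, which keeps the unique-path approach discussed in the comments above.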

OdeToCode