views:

4242

answers:

10

I am writing a LINQ provider to a hierarchal data source. I find it easiest to design my API by writing examples showing how I want to use it, and then coding to support those use cases.

One thing I am having trouble with is an easy/reusable/elegant way to express "deep query" or recursion in a LINQ statement. In other words, what is the best way to distinguish between:

from item in immediate-descendants-of-current-node where ... select item

versus:

from item in all-descendants-of-current-node where ... select item

(Edit: please note neither of those examples above necessarily reflect the structure of the query I want. I am interested in any good way to express recursion/depth)

Please note I am not asking how to implement such a provider, or how to write my IQueryable or IEnumerable in such a way that allows recursion. I am asking from the standpoint of a person writing the LINQ query and utilizing my provider - what is an intuitive way for them to express whether they want to recurse or not?

The data structure resembles a typical file system: a folder can contain a collection of subfolders, and a folder can also contain a collection of items. So myFolder.Folders represents all the folders who are immediate children of myFolder, and myFolder.Items contains all the items immediately within myFolder. Here's a basic example of a site hierachy, much like a filesystem with folders and pages:

(F)Products
    (F)Light Trucks
        (F)Z150
            (I)Pictures
            (I)Specs
            (I)Reviews
        (F)Z250
            (I)Pictures
            (I)Specs
            (I)Reviews
        (F)Z350
            (I)Pictures
            (I)Specs
            (I)Reviews
        (I)Splash Page
    (F)Heavy Trucks
    (F)Consumer Vehicles
    (I)Overview

If I write:

from item in lightTrucks.Items where item.Title == "Pictures" select item

What is the most intuitive way to express an intent that the query get all items underneath Light Trucks, or only the immediate ones? The least-intrusive, lowest-friction way to distinguish between the two intents?

My #1 goal is to be able to turn this LINQ provider over to other developers who have an average understanding of LINQ and allow them to write both recursive and list queries without giving them a tutorial on writing recursive lambdas. Given a usage that looks good, I can code the provider against that.

Additional clarification: (I am really sucking at communicating this!) - This LINQ provider is to an external system, it is not simply walking an object graph, nor in this specific case does a recursive expression actually translate into any kind of true recursive activity under the hood. Just need a way to distinguish between a "deep" query and a "shallow" one.

So, what do you think is the best way to express it? Or is there a standard way of expressing it that I've missed out on?

+12  A: 

Linq-toXml handles this fine, there is an XElement.Elements()/.Nodes() operation to get immediate children, and a XElement.Descendents()/DescendentNodes() operations to get all descendents. Can you mimic that as a standard?

To summarize Linq-to-Xml's behavior... The navigation functions each correspond to an axis type in XPath (http://www.w3schools.com/xpath/xpath%5Faxes.asp). If the navigation function selects Elements, the axis name is used. If the navigation function selects Nodes, the axis name is used with Node appended.

For instance, there are functions Descendants() and DescendantsNode() correspond to XPath's descendants axis, returning either an XElement or an XNode.

The exception case is not surprisingly the most used case, the children axis. In XPath, this is the axis used if no axis is specified. For this, the linq-to-xml navigation functions are not Children() and ChildrenNodes() but rather Elements() and Nodes().

XElement is a subtype of XNode. XNode's include things like HTML tags, but also HTML comments, cdata or text. XElements are a type of XNode, but refer specifically to HTML tags. XElements therefore have a tag name, and support the navigation functions.

Now its not as easy to chain navigations in Linq-to-XML as it is XPath. The problem is that navigation functions return collection objects, while the navigation functions are applied to non-collections. Consider the XPath expression which selects a table tag as an immediate child then any descendant table data tag. I think this would look like "./children::table/descendants::td" or "./table/descendants::td"

Using IEnumerable<>::SelectMany() allows one to call the navigation functions on a collection. The equivalent to the above looks something like .Elements("table").SelectMany(T => T.Descendants("td"))

Frank Schwieterman
+3  A: 

I would just implement two functions to clearly differentiate between the two options (Children vs. FullDecendants), or an overload GetChildren(bool returnDecendants). Each can implement IEnumerable, so it would just be a matter of which function they pass into their LINQ statement.

Reed Copsey
+5  A: 

I'd go with implementing it in such a way as to have control over how deeply I want to query as well.

Something like Descendants() would retrieve Descendants through all levels while Descendants(0) would retrieve immediate children, Descendants(1) would get children and grandchildren and so on...

SirDemon
+2  A: 

I'd have to agree with Frank. have a look at how LINQ-to-XML handles these scenarios.

In fact, I'd emulate the LINQ-to-XML implementation entirely, but change it for any Data type. Why reinvent the wheel right?

andy
that's how i did it. I made a flattenchildren and a flattentree which correspond to the descendents and descendentsandself
Steve
+8  A: 

Well, the first thing to note is that actually, lambda expressions can be recursive. No, honestly! It isn't easy to do, and certainly isn't easy to read - heck, most LINQ providers (except LINQ-to-Objects, which is much simpler) will have a coughing fit just looking at it... but it is possible. See here for the full, gory details (warning - brain-ache is likely).

However!! That probably won't help much... for a practical approach, I'd look at the way XElement etc does it... note you can remove some of the recursion using a Queue<T> or Stack<T>:

using System;
using System.Collections.Generic;

static class Program {
    static void Main() {
        Node a = new Node("a"), b = new Node("b") { Children = {a}},
            c = new Node("c") { Children = {b}};
        foreach (Node node in c.Descendents()) {
            Console.WriteLine(node.Name);
        }
    }
}

class Node { // very simplified; no sanity checking etc
    public string Name { get; private set; }
    public List<Node> Children { get; private set; }
    public Node(string name) {
        Name = name;
        Children = new List<Node>();
    }
}
static class NodeExtensions {
    public static IEnumerable<Node> Descendents(this Node node) {
        if (node == null) throw new ArgumentNullException("node");
        if(node.Children.Count > 0) {
            foreach (Node child in node.Children) {
                yield return child;
                foreach (Node desc in Descendents(child)) {
                    yield return desc;
                }
            }
        }
    }
}


An alternative would be to write something like SelectDeep (to mimic SelectMany for single levels):

public static class EnumerableExtensions
{
    public static IEnumerable<T> SelectDeep<T>(
        this IEnumerable<T> source, Func<T, IEnumerable<T>> selector)
    {
        foreach (T item in source)
        {
            yield return item;
            foreach (T subItem in SelectDeep(selector(item),selector))
            {
                yield return subItem;
            }
        }
    }
}
public static class NodeExtensions
{
    public static IEnumerable<Node> Descendents(this Node node)
    {
        if (node == null) throw new ArgumentNullException("node");
        return node.Children.SelectDeep(n => n.Children);
    }
}

Again, I haven't optimised this to avoid recursion, but it could be done easily enough.

Marc Gravell
"lambda expressions can be recursive...It isn't easy to do, and certainly isn't easy to read" indeed, which is the purpose of this question! See my second-to-last paragraph. I know how to do recursion but I'm a fairly senior guy and, as you said, it's not pleasant even for us. So I have no problem writing the provider side of things, just looking for a mid-level-friendly way to express it. I'll code the provider against that.
Rex M
Wow? Really? You could write a recursive (yet pure) lambda expression? It would take me a good deal of time and concentration... Mads just happens to be frighteningly smart...
Marc Gravell
I think we are saying the same thing. Recursive lambdas are indeed possible; ways to do them is well-documented and probably easy for someone. They are *doable but not easy* for me and I consider myself a fairly senior guy, so I'm pretty sure they're not easy for mid-level people. So again, as stated in my question, I am looking for an easy way to *express* it. I am clear on how to execute it, don't need help there (and if I do, I'll open another question for that).
Rex M
OK - we're getting side-tracked. That remark was intended as an (hopefully interesting) aside. My main point was, as others suggest, to look at the approaches used in LINQ-to-XML, with an example.
Marc Gravell
I'll probably go the XML-style route if no better suggestions come along, but I'm not totally happy with that which is why I didn't accept it and opened a bounty. I'd like to get some other ideas. I've found a few via the Google, would like to see some more.
Rex M
+3  A: 

The best example I've seen was this:

Recursive LINQ with Y-Combinator.

JasonTrue
+2  A: 

You might want to implement a (extension) Method like FlattenRecusively for your type.

from item in list.FlattenRecusively() where ... select item
Thomas Danecker
I agree :)
leppie
+1  A: 

Are you okay with doing the heavy lifting in your object? (it's not even that heavy)

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

namespace LinqRecursion
{
    class Program
    {
        static void Main(string[] args)
        {
            Person mom = new Person() { Name = "Karen" };
            Person me = new Person(mom) { Name = "Matt" };
            Person youngerBrother = new Person(mom) { Name = "Robbie" };
            Person olderBrother = new Person(mom) { Name = "Kevin" };
            Person nephew1 = new Person(olderBrother) { Name = "Seth" };
            Person nephew2 = new Person(olderBrother) { Name = "Bradon" };
            Person olderSister = new Person(mom) { Name = "Michelle" };

            Console.WriteLine("\tAll");
            //        All
            //Karen 0
            //Matt 1
            //Robbie 2
            //Kevin 3
            //Seth 4
            //Bradon 5
            //Michelle 6
            foreach (var item in mom)
                Console.WriteLine(item);

            Console.WriteLine("\r\n\tOdds");
            //        Odds
            //Matt 1
            //Kevin 3
            //Bradon 5
            var odds = mom.Where(p => p.ID % 2 == 1);
            foreach (var item in odds)
                Console.WriteLine(item);

            Console.WriteLine("\r\n\tEvens");
            //        Evens
            //Karen 0
            //Robbie 2
            //Seth 4
            //Michelle 6
            var evens = mom.Where(p => p.ID % 2 == 0);
            foreach (var item in evens)
                Console.WriteLine(item);

            Console.ReadLine();

        }
    }

    public class Person : IEnumerable<Person>
    {
        private static int _idRoot;

        public Person() {
            _id = _idRoot++;
        }

        public Person(Person parent) : this()
        {
            Parent = parent;
            parent.Children.Add(this);
        }

        private int _id;
        public int ID { get { return _id; } }
        public string Name { get; set; }

        public Person Parent { get; private set; }

        private List<Person> _children;
        public List<Person> Children
        {
            get
            {
                if (_children == null)
                    _children = new List<Person>();
                return _children;
            }
        }

        public override string ToString()
        {
            return Name + " " + _id.ToString();
        }

        #region IEnumerable<Person> Members

        public IEnumerator<Person> GetEnumerator()
        {
            yield return this;
            foreach (var child in this.Children)
                foreach (var item in child)
                    yield return item;
        }

        #endregion

        #region IEnumerable Members

        IEnumerator IEnumerable.GetEnumerator()
        {
            return this.GetEnumerator();
        }

        #endregion
    }
}
Matthew Whited
I don't understand how this addresses my question. I am not asking how to implement a recursive lambda - in fact, I already have established most of the inner workings on how the recursion works. I'm trying to find a way to express it from the front-end, so that distinguishing between a first-level query and a recursive query is intuitive to the person writing the LINQ query.
Rex M
Could you provide a simple hierarchy of your data and which nodes you want to see selected? It might help everyone here understand your goal better.
Matthew Whited
@Matt I have added example data structure and possible query to the question.
Rex M
One class to rule them all! Seriously, this class design looks fishy to me.
leppie
+2  A: 

Rex, you've certainly opened an interesting discussion, but you seem to have eliminated all possibilities - that is, you seem to reject both (1) having the consumer write recursive logic, and (2) having your node class expose relationships of greater than one degree.

Or, perhaps you haven't entirely ruled out (2). I can think of one more approach which is nearly as expressive as the GetDescendents method (or property), but might not be quite so 'ponderous' (depending on the shape of your tree)...

from item in AllItems where item.Parent == currentNode select item

and

from item in AllItems where item.Ancestors.Contains(currentNode) select item
Abraham Pinzur
This is one possibility I am seriously considering. I like the idea of having two separate properties which represent different scopes.It is not my intention to "eliminate" any possibility, I am just trying to push people to think outside the usual methods that come to mind for this type of work.
Rex M
A: 

I simply would use an extension method to traverse the tree.

Oh wait, I am doing that already! :)

leppie
That is fine for your purposes, but my LINQ provider is to an external system. Simply writing a clever extension method which can walk an object graph is not relevant to the problem here. If it helps you may think of it as LINQ to SQL.
Rex M