views:

642

answers:

4

Not sure how to call it, but say you have a class that looks like this:

class Person
{
    public string Name;
    public IEnumerable<Person> Friends;
}

You then have a person and you want to "unroll" this structure recursively so you end up with a single list of all people without duplicates.

How would you do this? I have already made something that seems to be working, but I am curious to see how others would do it and especially if there is something built-in to Linq you can use in a clever way to solve this little problem :)


Here is my solution:

public static IEnumerable<T> SelectRecursive<T>(this IEnumerable<T> subjects, Func<T, IEnumerable<T>> selector)
{
    // Stop if subjects are null or empty
    if(subjects == null)
        yield break;

    // For each subject
    foreach(var subject in subjects)
    {
        // Yield it
        yield return subject;

        // Then yield all its decendants
        foreach (var decendant in SelectRecursive(selector(subject), selector))
            yield return decendant;
    }
}

Would be used something like this:

var people = somePerson.SelectRecursive(x => x.Friends);
+9  A: 

I don't believe there's anything built into LINQ to do this.

There's a problem with doing it recursively like this - you end up creating a large number of iterators. This can be quite inefficient if the tree is deep. Wes Dyer and Eric Lippert have both blogged about this.

You can remove this inefficiency by removing the direct recursion. For example:

public static IEnumerable<T> SelectRecursive<T>(this IEnumerable<T> subjects,
    Func<T, IEnumerable<T>> selector)
{
    if (subjects == null)
    {
        yield break;
    }

    Queue<T> stillToProcess = new Queue<T>(subjects);

    while (stillToProcess.Count > 0)
    {
        T item = stillToProcess.Dequeue();
        yield return item;
        foreach (T child in selector(item))
        {
            stillToProcess.Enqueue(child);
        }
    }
}

This will also change the iteration order - it becomes breadth-first instead of depth-first; rewriting it to still be depth-first is tricky. I've also changed it to not use Any() - this revised version won't evaluate any sequence more than once, which can be handy in some scenarios. This does have one problem, mind you - it will take more memory, due to the queuing. We could probably alleviate this by storing a queue of iterators instead of items, but I'm not sure offhand... it would certainly be more complicated.

One point to note (also noted by ChrisW while I was looking up the blog posts :) - if you have any cycles in your friends list (i.e. if A has B, and B has A) then you'll recurse forever.

Jon Skeet
The cyclicity problem can easily be fixed by associating a `visited` flag with each node, of course.
Will Vousden
@Inquisitor: Only if the type is mutable. Otherwise you could use a `HashSet<T>` to store items you'd already visited.
Jon Skeet
That's exactly the cleverness I was looking for, hehe. Will probably use this instead and maybe add the HashSet to prevent infinite cycles, just for cleaner and safer code. Thanks!
Svish
@Inquisitor: You will still have the problem of how to clear the visited flag before starting iteration. Looks like this will require one to visit each node, just to clear this visited flag. Chicken and egg situation.
Tarydon
You could just clear the visited flag before you visit each node. No wait... :p
Svish
The iteration order doesn't matter since the the goal is just to get them all.
Svish
@Jon Skeet: Would I be correct to assume that the `if(!visitedSubjects.Add(item)) continue;` (where `visitedSubjects` would be a `HashSet<T>`) should be just before the `yield return item`?
Svish
Yes, that would do it.
Jon Skeet
Why is it hard to do it depth-first? Don't you just replace the queue with a stack?
Eric Lippert
@Eric: That's very possible... although you then get it depth first *and last first within each collection* so it still doesn't match the original order :( Again, I'm sure it could with a bit more effort - but my brain isn't up to thinking it through at the moment.
Jon Skeet
Ah, yes, I see EXACTLY what you mean. Interesting coincidence, I was just now examining the recursive algorithm we use to determine what order classes are emitted, and wondering if it could be made iterative. Making this algorithm iterative has exactly this problem; it is not exactly depth-first because then that reverses the order in which classes in a given namespace are emitted. It should be pretty easy to fix it up with judicious use of the Reverse() sequence operator.
Eric Lippert
Actually, there are simpler solutions for the "visited" flag - like using a GUID (which is generated on each unroll) or using an integer which is incremented at each unroll. Then you don't check whether the "visited" flag is set; you check if it is set to the value of this specific unroll. However, this solution still has a problem if you're using multithreading and two threads want to unroll at the same time...
Vilx-
A: 

You could use a non-recursive method like this as well:

  HashSet<Person> GatherAll (Person p) {
     Stack<Person> todo = new Stack<Person> ();
     HashSet<Person> results = new HashSet<Person> ();
     todo.Add (p); results.Add (p);
     while (todo.Count > 0) {
        Person p = todo.Pop (); 
        foreach (Person f in p.Friends) 
           if (results.Add (f)) todo.Add (f);
     }
     return results;
  }

This should handle cycles properly as well. I am starting with a single person, but you could easily expand this to start with a list of persons.

Tarydon
A: 

Recursion is always fun. Perhaps you could simplify your code to:

public static IEnumerable<T> SelectRecursive<T>(this IEnumerable<T> subjects, Func<T, IEnumerable<T>> selector) {
    // Stop if subjects are null or empty 
    if (subjects == null || !subjects.Any())
        return Enumerable.Empty<T>();

    // Gather a list of all (selected) child elements of all subjects
    var subjectChildren = subjects.SelectMany(selector);

    // Jump into the recursion for each of the child elements
    var recursiveChildren = SelectRecursive(subjectChildren, selector);

    // Combine the subjects with all of their (recursive child elements).
    // The union will remove any direct parent-child duplicates.
    // Endless loops due to circular references are however still possible.
    return subjects.Union(recursiveChildren);
}

It will result in less duplicates than your original code. However their might still be duplicates causing an endless loop, the union will only prevent direct parent(s)-child(s) duplicates.

And the order of the items will be different from yours :)

Edit: Changed the last line of code to three statements and added a bit more documentation.

Zyphrax
Interesting... a bit unreadable though, hehe. Ordering doesn't really matter btw, so don't worry about that :p
Svish
I've split the single statement into subresults, that might make it a bit easier to read/understand. Basicly I've replaced your for-loops with LINQ. Of course you could go wild, and reduce this method to a single line statement :)
Zyphrax
+1  A: 

use the Aggregate extension...

    List<Person> persons = GetPersons(); 
    List<Person> result = new List<Person>(); 
    persons.Aggregate(result,SomeFunc);

    private static List<Person> SomeFunc(List<Person> arg1,Person arg2)
    {
    arg1.Add(arg2)
    arg1.AddRange(arg2.Persons);
    return arg1;
    }
Chen Kinnrot
I was actually thinking about that a while ago. Care to make some example code?
Svish
sure... give me 15 minutes...
Chen Kinnrot
Interesting. This wouldn't handle cyclic relations though, would it?
Svish
you can add a simple if(arg1.Containes(arg2))
Chen Kinnrot