views:

249

answers:

2

I have some data that has various attributes and I want to hierarchically group that data. For example:

public class Data
{
   public string A { get; set; }
   public string B { get; set; }
   public string C { get; set; }
}

I would want this grouped as:

A1
 - B1
    - C1
    - C2
    - C3
    - ...
 - B2
    - ...
A2
 - B1
    - ...
...

Currently, I have been able to group this using LINQ such that the top group divides the data by A, then each subgroup divides by B, then each B subgroup contains subgroups by C, etc. The LINQ looks like this (assuming an IEnumerable<Data> sequence called data):

var hierarchicalGrouping =
            from x in data
            group x by x.A
                into byA
                let subgroupB = from x in byA
                                group x by x.B
                                    into byB
                                    let subgroupC = from x in byB
                                                    group x by x.C
                                    select new
                                    {
                                        B = byB.Key,
                                        SubgroupC = subgroupC
                                    }
                select new
                {
                    A = byA.Key,
                    SubgroupB = subgroupB
                };

As you can see, this gets somewhat messy the more subgrouping that's required. Is there a nicer way to perform this type of grouping? It seems like there should be and I'm just not seeing it.

Update
So far, I have found that expressing this hierarchical grouping by using the fluent LINQ APIs rather than query language arguably improves readability, but it doesn't feel very DRY.

There were two ways I did this: one using GroupBy with a result selector, the other using GroupBy followed by a Select call. Both could be formatted to be more readable than using query language but don't still don't scale well.

var withResultSelector =
    data.GroupBy(a => a.A, (aKey, aData) =>
        new
        {
            A = aKey,
            SubgroupB = aData.GroupBy(b => b.B, (bKey, bData) =>
                new
                {
                    B = bKey,
                    SubgroupC = bData.GroupBy(c => c.C, (cKey, cData) =>
                    new
                    {
                        C = cKey,
                        SubgroupD = cData.GroupBy(d => d.D)
                    })
                })
        });

var withSelectCall =
    data.GroupBy(a => a.A)
        .Select(aG =>
        new
        {
            A = aG.Key,
            SubgroupB = aG
                .GroupBy(b => b.B)
                .Select(bG =>
            new
            {
                B = bG.Key,
                SubgroupC = bG
                    .GroupBy(c => c.C)
                    .Select(cG =>
                new
                {
                    C = cG.Key,
                    SubgroupD = cG.GroupBy(d => d.D)
                })
            })
        });

What I'd like...
I can envisage a couple of ways that this could be expressed (assuming the language and framework supported it). The first would be a GroupBy extension that takes a series of function pairs for key selection and result selection, Func<TElement, TKey> and Func<TElement, TResult>. Each pair describes the next sub-group. This option falls down because each pair would potentially require TKey and TResult to be different than the others, which would mean GroupBy would need finite parameters and a complex declaration.

The second option would be a SubGroupBy extension method that could be chained to produce sub-groups. SubGroupBy would be the same as GroupBy but the result would be the previous grouping further partitioned. For example:

var groupings = data
    .GroupBy(x=>x.A)
    .SubGroupBy(y=>y.B)
    .SubGroupBy(z=>z.C)

// This version has a custom result type that would be the grouping data.
// The element data at each stage would be the custom data at this point
// as the original data would be lost when projected to the results type.
var groupingsWithCustomResultType = data
    .GroupBy(a=>a.A, x=>new { ... })
    .SubGroupBy(b=>b.B, y=>new { ... })
    .SubGroupBy(c=>c.C, c=>new { ... })

The difficulty with this is how to implement the methods efficiently as with my current understanding, each level would re-create new objects in order to extend the previous objects. The first iteration would create groupings of A, the second would then create objects that have a key of A and groupings of B, the third would redo all that and add the groupings of C. This seems terribly inefficient (though I suspect my current options actually do this anyway). It would be nice if the calls passed around a meta-description of what was required and the instances were only created on the last pass, but that sounds difficult too. Note that his is similar to what can be done with GroupBy but without the nested method calls.

Hopefully all that makes sense. I expect I am chasing rainbows here, but maybe not.

Update - another option
Another possibility that I think is more elegant than my previous suggestions relies on each parent group being just a key and a sequence of child items (as in the examples), much like IGrouping provides now. That means one option for constructing this grouping would be a series of key selectors and a single results selector.

If the keys were all limited to a set type, which is not unreasonable, then this could be generated as a sequence of key selectors and a results selector, or a results selector and a params of key selectors. Of course, if the keys had to be of different types and different levels, this becomes difficult again except for a finite depth of hierarchy due to the way generics parameterization works.

Here are some illustrative examples of what I mean:

For example:

public static /*<grouping type>*/ SubgroupBy(
    IEnumerable<Func<TElement, TKey>> keySelectors,
    this IEnumerable<TElement> sequence,
    Func<TElement, TResult> resultSelector)
{
    ...
}

var hierarchy = data.SubgroupBy(
                    new [] {
                        x => x.A,
                        y => y.B,
                        z => z.C },
                    a => new { /*custom projection here for leaf items*/ })

Or:

public static /*<grouping type>*/ SubgroupBy(
    this IEnumerable<TElement> sequence,
    Func<TElement, TResult> resultSelector,
    params Func<TElement, TKey>[] keySelectors)
{
    ...
}

var hierarchy = data.SubgroupBy(
                    a => new { /*custom projection here for leaf items*/ },
                    x => x.A,
                    y => y.B,
                    z => z.C)

This does not solve implementation inefficiencies, but it should solve the complex nesting. However, what would the return type of this grouping be? Would I need my own interface or can I use IGrouping somehow. How much do I need to define or does the variable depth of the hierarchy still make this impossible?

My guess is that this should be the same as the return type from any IGrouping call but how does the type system infer that type if it isn't involved in any of the parameters that are passed?

This problem is stretching my understanding, which is great, but my brain hurts.

+1  A: 

You need a recursive function. The recursive function calls itself for each node in the tree.

To do this in Linq, you can use a Y-combinator.

Robert Harvey
How would that work when the property I am grouping by changes at each level?
Jeff Yates
It doesn't. You're better off setting up a self-referential association by adding a ParentID to each node (so that you're always referring to ParentID at each level), unless of course the number of tree levels (nested depth) is limited by your application's design.
Robert Harvey
+1  A: 

Here is a description how you can implement an hierarchical grouping mechanism.

From this description:

Result class:

public class GroupResult
{
    public object Key { get; set; }
    public int Count { get; set; }
    public IEnumerable Items { get; set; }
    public IEnumerable<GroupResult> SubGroups { get; set; }
    public override string ToString() 
    { return string.Format("{0} ({1})", Key, Count); }
}

Extension method:

public static class MyEnumerableExtensions
{
    public static IEnumerable<GroupResult> GroupByMany<TElement>(
        this IEnumerable<TElement> elements,
        params Func<TElement, object>[] groupSelectors)
    {
        if (groupSelectors.Length > 0)
        {
            var selector = groupSelectors.First();

            //reduce the list recursively until zero
            var nextSelectors = groupSelectors.Skip(1).ToArray();
            return
                elements.GroupBy(selector).Select(
                    g => new GroupResult
                    {
                        Key = g.Key,
                        Count = g.Count(),
                        Items = g,
                        SubGroups = g.GroupByMany(nextSelectors)
                    });
        }
        else
            return null;
    }
}

Usage:

var result = customers.GroupByMany(c => c.Country, c => c.City);
Obalix