Say I have a collection of object arrays of equal dimension, like this:

var rows = new List<object[]>
{
    new object[] {1, "test1", "foo", 1},
    new object[] {1, "test1", "foo", 2},
    new object[] {2, "test1", "foo", 3},
    new object[] {2, "test2", "foo", 4},
};

I want to group by one or more of the "columns", with the choice of columns determined dynamically at runtime. For instance, grouping by columns 1, 2 and 3 would result in three groups:

  • group 1: [1, "test1", "foo"] (includes rows 1 and 2)
  • group 2: [2, "test1", "foo"] (includes row 3)
  • group 3: [2, "test2", "foo"] (includes row 4)

Certainly I can achieve this with some kind of custom group class and by sorting and iterating. However, it seems like I should be able to do it much more cleanly with LINQ grouping, but my LINQ-fu is failing me. Any ideas?

+1  A: 

If your collection contains items with an indexer (such as your object[]), you could do it like this:

var byColumn = 3;

var rows = new List<object[]> 
{ 
    new object[] {1, "test1", "foo", 1}, 
    new object[] {1, "test1", "foo", 2}, 
    new object[] {2, "test1", "foo", 3}, 
    new object[] {2, "test2", "foo", 4}, 
};

var grouped = rows.GroupBy(k => k[byColumn]);
var otherGrouped = rows.GroupBy(k => new { k1 = k[1], k2 = k[2] });

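If you don't like the fixed column sets above, you could also do something a little more interesting directly in LINQ. This groups on a combined hash code alone, so it assumes that equal hash codes imply equal values: rows whose selected columns differ but produce the same combined hash would land in the same group. Note that you may want to just write an IEqualityComparer<T> instead.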

var cols = new[] { 1, 2};

var grouped = rows.GroupBy(
    row => cols.Select(col => row[col])
               .Aggregate(
                    97654321, 
                    (a, v) => (v.GetHashCode() * 12356789) ^ a));

foreach (var keyed in grouped)
{
    Console.WriteLine(keyed.Key);
    foreach (var value in keyed)
        Console.WriteLine("{0}|{1}|{2}|{3}", value);
}
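
The caveat about hash codes standing in for equality can be made concrete with a small sketch. It reuses the same seed and multiplier as the snippet above; the class and method names (HashKeySketch, HashKey, CountGroups) are hypothetical and exist only for illustration. Because XOR is commutative, swapping the values of two grouped columns yields the same key, so two different rows end up in one group:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashKeySketch
{
    // Same combining step as the hash-based GroupBy key above.
    public static int HashKey(object[] row, int[] cols)
    {
        return cols.Select(col => row[col])
                   .Aggregate(97654321, (a, v) => (v.GetHashCode() * 12356789) ^ a);
    }

    // Number of groups produced when the combined hash is the only key.
    public static int CountGroups(IEnumerable<object[]> rows, int[] cols)
    {
        return rows.GroupBy(row => HashKey(row, cols)).Count();
    }

    public static void Main()
    {
        var rows = new List<object[]>
        {
            new object[] { 1, 2 },
            new object[] { 2, 1 }, // different row, same values in swapped columns
        };

        // XOR ignores order, so both rows produce the same key.
        Console.WriteLine(CountGroups(rows, new[] { 0, 1 })); // prints 1
    }
}
```

If merged groups like this are unacceptable, a key that compares the actual column values (for example through an IEqualityComparer<object[]>) avoids the problem.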
Matthew Whited
A: 

Shortest solution:

    int[] columns = { 0, 1 };

    var seed = new[] { rows.AsEnumerable() }.AsEnumerable();    // IEnumerable<object[]> = group, IEnumerable<group> = result

    var result = columns.Aggregate(seed, 
        (groups, nCol) => groups.SelectMany(g => g.GroupBy(row => row[nCol])));
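
As a rough illustration of what the aggregate produces, the sketch below wraps the same two statements in a hypothetical helper (NestedGroupingSketch.GroupSizes, not part of the original answer) and runs it on the question's sample rows. It assumes C# 4 / .NET 4 or later, since the lambda relies on generic covariance to treat each IGrouping as an IEnumerable<object[]>:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NestedGroupingSketch
{
    // Returns the size of each final group, in order of first appearance.
    public static List<int> GroupSizes(List<object[]> rows, int[] columns)
    {
        // Start with a single group that holds every row.
        var seed = new[] { rows.AsEnumerable() }.AsEnumerable();

        // For each column, split every existing group by that column's value.
        var result = columns.Aggregate(seed,
            (groups, nCol) => groups.SelectMany(g => g.GroupBy(row => row[nCol])));

        return result.Select(g => g.Count()).ToList();
    }

    public static void Main()
    {
        var rows = new List<object[]>
        {
            new object[] {1, "test1", "foo", 1},
            new object[] {1, "test1", "foo", 2},
            new object[] {2, "test1", "foo", 3},
            new object[] {2, "test2", "foo", 4},
        };

        var sizes = GroupSizes(rows, new[] { 0, 1 });
        Console.WriteLine(string.Join(",", sizes)); // prints 2,1,1
    }
}
```

Note that the final result is a flat sequence of row groups; the combined key values are not carried along, so they have to be read back from a row in each group if needed.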
Grozz
+2  A: 

@Matthew Whited's solution is nice if you know the grouping columns up front. However, it sounds like you need to determine them at runtime. In that case, you can create an equality comparer which defines row equality for GroupBy using a configurable column set:

rows.GroupBy(row => row, new ColumnComparer(0, 1, 2))

The comparer checks the equality of the value of each specified column. It also combines the hash codes of each value:

public class ColumnComparer : IEqualityComparer<object[]>
{
    private readonly IList<int> _comparedIndexes;

    public ColumnComparer(params int[] comparedIndexes)
    {
        _comparedIndexes = comparedIndexes.ToList();
    }

    #region IEqualityComparer

    public bool Equals(object[] x, object[] y)
    {
        return ReferenceEquals(x, y) || (x != null && y != null && ColumnsEqual(x, y));
    }

    public int GetHashCode(object[] obj)
    {
        return obj == null ? 0 : CombineColumnHashCodes(obj);
    }

    #endregion

    private bool ColumnsEqual(object[] x, object[] y)
    {
        return _comparedIndexes.All(index => ColumnEqual(x, y, index));
    }

    private bool ColumnEqual(object[] x, object[] y, int index)
    {
        // Static object.Equals compares the two values and tolerates nulls on either side.
        return object.Equals(x[index], y[index]);
    }

    private int CombineColumnHashCodes(object[] row)
    {
        return _comparedIndexes
            .Select(index => row[index])
            .Aggregate(0, (hashCode, value) => hashCode ^ (value == null ? 0 : value.GetHashCode()));
    }
}

If this is something you will do often, you can put it behind an extension method:

// Extension methods must be declared in a static class.
public static IEnumerable<IGrouping<object[], object[]>> GroupByIndexes(
    this IEnumerable<object[]> source,
    params int[] indexes)
{
    return source.GroupBy(row => row, new ColumnComparer(indexes));
}

// Usage

rows.GroupByIndexes(0, 1, 2)

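Extending IEnumerable<object[]> works directly for List<object[]> and any other sequence of object[], including on .NET 3.5. Generic variance (.NET 4) only matters when the rows have a more specific array type, such as IEnumerable<string[]>; on .NET 3.5 you would need to convert such rows to object[] first.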

Bryan Watts
You won't want to just `xor` the hashcodes. If you do, you will increase the chance of collisions.
Matthew Whited
Of course! Nice elegant solution. There were a few little errors in ColumnComparer. I edited your post with the corrections.
Tim Scott
@Matthew Whited: You are correct, that is a less-than-optimal implementation of `GetHashCode`. I wanted to avoid getting into that messy discussion, though, so went with the low-friction approach.
Bryan Watts
@Tim Scott: Thanks for fixing the errors I had - it was late :-) I noticed that you removed the null check in `GetHashCode`. I included that because `ColumnComparer` is a public type. If you make it `private`, where you can absolutely guarantee no nulls, then it is safe to remove it. In the future, though, please refrain from making stylistic edits such as adding a local variable within `CombineColumnHashCodes`. To me, that is superfluous and I don't want it to be mistaken for code I wrote. Thanks.
Bryan Watts
@Bryan: Yeah, the null check should be there. ReSharper told me it would always be false. I have never seen ReSharper be wrong about something like that before. @Matthew Whited: Can you suggest a more robust way to implement GetHashCode?
Tim Scott
@Tim Scott: My current approach to combining hash codes comes from here: http://blog.roblevine.co.uk/?cat=10
Bryan Watts
@Bryan: Made a couple more changes, both to guard against null ref exception. 1) Use static Equals in ColumnEqual; 2) Check for null value in CombineColumnHashCodes.
Tim Scott
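
The concern raised in the comments about combining hash codes with XOR alone can be illustrated with a small sketch. XorCombine mirrors the answer's CombineColumnHashCodes; MultiplyAddCombine is one common order-sensitive alternative, and its constants (17 and 31) as well as all names here are assumptions chosen for illustration, not taken from the linked blog post:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashCombineSketch
{
    // XOR-only combination, as in the answer's CombineColumnHashCodes.
    public static int XorCombine(object[] row, IList<int> indexes)
    {
        return indexes
            .Select(index => row[index])
            .Aggregate(0, (hash, value) => hash ^ (value == null ? 0 : value.GetHashCode()));
    }

    // Order-sensitive alternative: scale the running hash before adding each value's hash.
    public static int MultiplyAddCombine(object[] row, IList<int> indexes)
    {
        unchecked // let integer overflow wrap around instead of throwing
        {
            int hash = 17;
            foreach (var index in indexes)
            {
                var value = row[index];
                hash = hash * 31 + (value == null ? 0 : value.GetHashCode());
            }
            return hash;
        }
    }

    public static void Main()
    {
        var indexes = new[] { 0, 1 };
        var a = new object[] { 1, 2 };
        var b = new object[] { 2, 1 };

        Console.WriteLine(XorCombine(a, indexes) == XorCombine(b, indexes));                 // prints True
        Console.WriteLine(MultiplyAddCombine(a, indexes) == MultiplyAddCombine(b, indexes)); // prints False
    }
}
```

A collision in GetHashCode does not make the comparer incorrect, because Equals still decides group membership; it only makes the hash lookup less efficient when many distinct keys share a bucket.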