views:

122

answers:

5

I am trying to figure out how to refactor this LINQ code nicely. This code and other similar code repeats within the same file as well as in other files. Sometime the data being manipulated is identical and sometimes the data changes and the logic remains the same.

Here is an example of duplicated logic operating on different fields of different objects.

public IEnumerable<FooDataItem> GetDataItemsByColor(IEnumerable<BarDto> dtos)
{
    double totalNumber = dtos.Where(x => x.Color != null).Sum(p => p.Number);
    return from stat in dtos
           where stat.Color != null
           group stat by stat.Color into gr
           orderby gr.Sum(p => p.Number) descending
           select new FooDataItem
           {
               Color = gr.Key,
               NumberTotal = gr.Sum(p => p.Number),
               NumberPercentage = gr.Sum(p => p.Number) / totalNumber
           };
}

public IEnumerable<FooDataItem> GetDataItemsByName(IEnumerable<BarDto> dtos)
{
    double totalData = dtos.Where(x => x.Name != null).Sum(v => v.Data);
    return from stat in dtos
           where stat.Name != null
           group stat by stat.Name into gr
           orderby gr.Sum(v => v.Data) descending
           select new FooDataItem
           {
               Name = gr.Key,
               DataTotal = gr.Sum(v => v.Data),
               DataPercentage = gr.Sum(v => v.Data) / totalData
           };
}

Anyone have a good way of refactoring this?

+1  A: 

I think that if you refactored this, it would be more difficult to read than what you have already. Anything I can think of either involves dynamic Linq, or modifying or encapsulating BarDto to have some sort of a specialized item to be used for nothing but grouping.

Dave Markle
+3  A: 

I wouldn't use the query syntax for this, use the method chain.

public IEnumerable<FooDataItem> GetDataItems(IEnumerable<BarDto> dtos, Func<BarDto, object> key, Func<BarDto, object> data)
{
  double totalData = dtos.Where(d => key(d) != null).Sum(data);
  return dtos.Where(d => key(d) != null)
             .GroupBy(key)
             .OrderBy(d => d.Sum(data))
             .Select(
               o => new FooDataItem() 
               { 
                 Key = o.Key, 
                 Total = o.Sum(data), 
                 Percentage = o.sum(data) / totalData
               });
}

(written without compiler etc).

Personally, I wouldn't refactor it though, as this would make the code less read- and understandable.

Femaref
+8  A: 

Something like this:

public IEnumerable<FooDataItem> GetDataItems<T>(IEnumerable<BarDto> dtos,
    Func<BarDto, T> groupCriteria,
    Func<BarDto, double> dataSelector,
    Func<T, double, double, FooDataItem> resultFactory)
{
    var validDtos = dtos.Where(d => groupCriteria(d) != null);
    double totalNumber = validDtos.Sum(dataSelector);

    return validDtos
        .GroupBy(groupCriteria)
        .OrderBy(g => g.Sum(dataSelector))
        .Select(gr => resultFactory(gr.Key,
                                    gr.Sum(dataSelector),
                                    gr.Sum(dataSelector) / totalNumber));
}

In your example, you might call it like this:

GetDataItems(
    x => x.Color,
    x => x.Number,
    (key, total, pct) =>
        new FooDataItem {
            Color = key, NumberTotal = total, NumberPercentage = pct });

If you changed FooDataItem to be more generic, it would be easier.

mquander
this is beautiful code.
Femaref
Nice. For readability and to avoid having to refactor all of the existing function calls, I would probably still wrap your call to `GetDataItems` in a `GetDataItemsByColor(IEnumerable<BarDto> dtos)` function.
Jelly
+2  A: 

You would need to switch from query expression and convert all of your where, group by, order by, and select clauses into lambdas. You could then create a function that accepts each of them as parameters. Here's an example:

private static IEnumerable<FooDataItem> GetData<T>(IEnumerable<Foo> foos, Func<Foo, bool> where, Func<Foo, T> groupby, Func<IGrouping<T, Foo>, T> orderby, Func<IGrouping<T, Foo>, FooDataItem> select)
{
    var query = foos.Where(where).GroupBy(groupby).OrderBy(orderby).Select(select);
    return query;
}

Based on this code

class Foo
{
    public int Id { get; set; }
    public int Bar { get; set; }
}

...

List<Foo> foos = new List<Foo>(); // populate somewhere

Func<Foo, bool> where = f => f.Id > 0;
Func<Foo, int> groupby = f => f.Id;
Func<IGrouping<int, Foo>, int> orderby = g => g.Sum(f => f.Bar);
Func<IGrouping<int, Foo>, FooDataItem> select = g => new FooDataItem { Key = g.Key, BarTotal = g.Sum(f => f.Bar) };

var query = GetData(foos, where, groupby, orderby, select);
Anthony Pegram
+1  A: 

Here is an extension method which factors out the similar portions of each query:

public static IEnumerable<TDataItem> GetDataItems<TData, TDataItem>(
    this IEnumerable<BarDto> dtos,
    Func<BarDto, TData> dataSelector,
    Func<BarDto, double> numberSelector,
    Func<TData, double, double, TDataItem> createDataItem)
    where TData : class
{
    var eligibleDtos = dtos.Where(dto => dataSelector(dto) != null);

    var totalNumber = eligibleDtos.Sum(numberSelector);

    return
        from dto in eligibleDtos
        group dto by dataSelector(dto) into dtoGroup
        let groupNumber = dtoGroup.Sum(numberSelector)
        orderby groupNumber descending
        select createDataItem(dtoGroup.Key, groupNumber, groupNumber / totalNumber);
}

You would use it like this:

var itemsByName = dtos.GetDataItems(
    dto => dto.Name,
    dto => dto.Data,
    (name, groupTotal, groupPercentage) => new FooDataItem
    {
        Name = name,
        NumberTotal = groupTotal,
        NumberPercentage = groupPercentage
    });

var itemsByColor = dtos.GetDataItems(
    dto => dto.Color,
    dto => dto.Number,
    (color, groupTotal, groupPercentage) => new FooDataItem
    {
        Color = color,
        DataTotal = groupTotal,
        DataPercentage = groupPercentage
    });
Bryan Watts