I am trying to build a dictionary from an enumerable, but I need an aggregator for all potentially duplicate keys. Calling ToDictionary() directly occasionally failed because of duplicate keys.

In this case, I have a bunch of time entries ({ DateTime Date, double Hours }), and if multiple time entries occur on the same day, I want the total time for that day. That is, I want a custom aggregator that collapses the duplicates so that each dictionary key is unique.

Is there a better way to do it than this?

(This does work.)

    private static Dictionary<DateTime, double> CreateAggregatedDictionaryByDate( IEnumerable<TimeEntry> timeEntries )
    {
        return
            timeEntries
                .GroupBy(te => new {te.Date})
                .Select(group => new {group.Key.Date, Hours = group.Select(te => te.Hours).Sum()})
                .ToDictionary(te => te.Date, te => te.Hours);
    }

I think I'm really looking for something like this:

IEnumerable<T>.ToDictionary( 
    /* key selector : T -> TKey */, 
    /* value selector : T -> TValue */, 
    /* duplicate resolver : IEnumerable<TValue> -> TValue */ );

so...

timeEntries.ToDictionary( 
    te => te.Date, 
    te => te.Hours, 
    duplicates => duplicates.Sum() );

The 'resolver' could be .First() or .Max() or whatever.

Or something similar.


I had one implementation... and another one showed up in the answers while I was working on it.

Mine:

    public static Dictionary<TKey, TValue> ToDictionary<T, TKey, TValue>(
        this IEnumerable<T> input, 
        Func<T, TKey> keySelector, 
        Func<T, TValue> valueSelector, 
        Func<IEnumerable<TValue>, TValue> duplicateResolver)
    {
        return input
            .GroupBy(keySelector)
            .Select(group => new { group.Key, Value = duplicateResolver(group.Select(valueSelector)) })
            .ToDictionary(k => k.Key, k => k.Value);
    }

I was hoping there was something like that already, but I guess not. That would be a nice addition.
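
To make the 'resolver' idea concrete, here is a minimal, self-contained sketch that repeats the extension above and calls it with three different resolvers. The TimeEntry class, the ResolverDemo class and the sample values are assumptions made up for illustration; only the extension method itself comes from the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of a time entry, based on { DateTime Date, double Hours }.
public class TimeEntry
{
    public DateTime Date { get; set; }
    public double Hours { get; set; }
}

public static class DictionaryExtensions
{
    // Same extension as above: group by key, then let the resolver
    // collapse each group's values into a single value.
    public static Dictionary<TKey, TValue> ToDictionary<T, TKey, TValue>(
        this IEnumerable<T> input,
        Func<T, TKey> keySelector,
        Func<T, TValue> valueSelector,
        Func<IEnumerable<TValue>, TValue> duplicateResolver)
    {
        return input
            .GroupBy(keySelector)
            .Select(group => new { group.Key, Value = duplicateResolver(group.Select(valueSelector)) })
            .ToDictionary(k => k.Key, k => k.Value);
    }
}

public static class ResolverDemo
{
    // Hypothetical sample data: two entries on the same day, one on the next.
    public static List<TimeEntry> Sample()
    {
        var day = new DateTime(2010, 6, 1);
        return new List<TimeEntry>
        {
            new TimeEntry { Date = day, Hours = 2.0 },
            new TimeEntry { Date = day, Hours = 3.0 },
            new TimeEntry { Date = day.AddDays(1), Hours = 4.0 },
        };
    }

    // Total hours per day.
    public static Dictionary<DateTime, double> Totals(IEnumerable<TimeEntry> entries)
    {
        return entries.ToDictionary(te => te.Date, te => te.Hours, values => values.Sum());
    }

    // Longest single entry per day.
    public static Dictionary<DateTime, double> Longest(IEnumerable<TimeEntry> entries)
    {
        return entries.ToDictionary(te => te.Date, te => te.Hours, values => values.Max());
    }

    // First entry seen per day (GroupBy keeps source order within a group).
    public static Dictionary<DateTime, double> Earliest(IEnumerable<TimeEntry> entries)
    {
        return entries.ToDictionary(te => te.Date, te => te.Hours, values => values.First());
    }
}
```

Only the last lambda changes between the three calls, which is the appeal of passing the resolver in as a parameter.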

Thanks everyone :-)

+1  A: 

If duplicate keys are an issue, perhaps you mean ToLookup? Same principle, but multiple values per key...

private static ILookup<DateTime, double> CreateAggregatedDictionaryByDate( IEnumerable<TimeEntry> timeEntries )
{
    return
        timeEntries
            .GroupBy(te => new {te.Date})
            .Select(group => new {group.Key.Date, Hours = group.Select(te => te.Hours).Sum()})
            .ToLookup(te => te.Date, te => te.Hours);
}

Then you simply do something like:

var lookup = CreateAggregatedDictionaryByDate(...);
foreach(var grp in lookup) {
    Console.WriteLine(grp.Key); // the DateTime
    foreach(var hours in grp) { // the set of doubles per Key
        Console.WriteLine(hours);
    }
}

or use SelectMany of course (from...from).
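
As a rough sketch of the SelectMany route: the lookup below is built straight from the raw entries (no GroupBy or Sum first), so duplicate dates really do carry several values, and SelectMany flattens them back into (date, hours) pairs. The TimeEntry class and the LookupDemo names are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of a time entry, based on the question.
public class TimeEntry
{
    public DateTime Date { get; set; }
    public double Hours { get; set; }
}

public static class LookupDemo
{
    // Duplicate dates are allowed: each key maps to a sequence of hours.
    public static ILookup<DateTime, double> ByDate(IEnumerable<TimeEntry> entries)
    {
        return entries.ToLookup(te => te.Date, te => te.Hours);
    }

    // Flattens the lookup into one (date, hours) pair per original entry.
    public static List<KeyValuePair<DateTime, double>> Flatten(ILookup<DateTime, double> lookup)
    {
        return lookup
            .SelectMany(grp => grp, (grp, hours) => new KeyValuePair<DateTime, double>(grp.Key, hours))
            .ToList();
    }

    // Totals per date, computed from the lookup when they are needed.
    public static Dictionary<DateTime, double> Totals(ILookup<DateTime, double> lookup)
    {
        return lookup.ToDictionary(grp => grp.Key, grp => grp.Sum());
    }
}
```

The query-syntax equivalent of Flatten would be `from grp in lookup from hours in grp select ...`.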

Marc Gravell
A: 

If you assign through a dictionary's indexer and the key isn't there yet, the entry is created for you. Reading a missing key throws instead, so each key has to be set to 0 before `+=` can be used on it. I would maybe do something like

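public void blabla(List<TimeEntry> hoho)
{
    Dictionary<DateTime, double> timeEntries = new Dictionary<DateTime, double>();

    // First pass: make sure every date has an entry, so += below never hits a missing key.
    hoho.ForEach((timeEntry) =>
        {
            timeEntries[timeEntry.Date] = 0;
        });

    // Second pass: accumulate the hours per date.
    hoho.ForEach((timeEntry) =>
        {
            timeEntries[timeEntry.Date] += timeEntry.Hours;
        });

}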

I just used List because, for unknown reasons, .ForEach() is only available on List<T> and not on IEnumerable<T>, even though I would imagine the implementation would be nearly identical. You could just use a literal foreach, which is essentially what it does under the covers anyway.

I think that, from a readability standpoint, this makes it much easier to see what is being done, unless this is not what you were trying to do.
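
For reference, a minimal sketch of the literal foreach version, which works on any IEnumerable and needs only one pass. It assumes the TimeEntry shape from the question (Date and Hours); the TimeTotals and SumByDate names are made up for the example.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of a time entry, based on the question.
public class TimeEntry
{
    public DateTime Date { get; set; }
    public double Hours { get; set; }
}

public static class TimeTotals
{
    public static Dictionary<DateTime, double> SumByDate(IEnumerable<TimeEntry> entries)
    {
        var totals = new Dictionary<DateTime, double>();
        foreach (var entry in entries)
        {
            // TryGetValue leaves 'current' at 0 when the date has not been seen yet,
            // so no separate initialization pass is needed.
            double current;
            totals.TryGetValue(entry.Date, out current);
            totals[entry.Date] = current + entry.Hours;
        }
        return totals;
    }
}
```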

Jimmy Hoffa
Generates `KeyNotFoundException: The given key was not present in the dictionary` on the `timeEntries[] +=` call. You need to initialize the dictionary value before you can use += on it.
Sam
Ah right Sam, silly me, fixed in edit now..
Jimmy Hoffa
A: 

I like your method because it's clear, but if you want to make it more efficient you can do the following, which does all the aggregation and grouping in a single Aggregate call, albeit a slightly convoluted one.

private static Dictionary<DateTime, double> CreateAggregatedDictionaryByDate(IEnumerable<TimeEntry> timeEntries)
{
    return timeEntries.Aggregate(new Dictionary<DateTime, double>(),
                                 (accumulator, entry) =>
                                    {
                                        double value;
                                        accumulator.TryGetValue(entry.Date, out value);
                                        accumulator[entry.Date] = value + entry.Hours;
                                        return accumulator;
                                    });
}
Sam
Nice. A bit convoluted... but yeah. I guess I'm not really sure what I'm looking for. Maybe an overload for ToDictionary() that provides a third parameter to resolve duplicates?
Jonathan Mitchem
A: 

Are you looking for something like this?

private static Dictionary<DateTime, double> CreateAggregatedDictionaryByDate( IEnumerable<TimeEntry> timeEntries ) 
{ 
    return 
        (from te in timeEntries
        group te by te.Date)
        .ToDictionary(grp => grp.Key, grp => (from te in grp select te.Hours).Sum());
} 
Gabe
Yeah, that's exactly what I have, just purely with the extension method syntax.
Jonathan Mitchem
Mine is different in that it puts the aggregate into the `ToDictionary` call, rather than computing it first.
Gabe
Oh, I see. Totally missed that. Nice, thanks.
Jonathan Mitchem
+1  A: 
public static Dictionary<KeyType, ValueType> ToDictionary
  <SourceType, KeyType, ValueType>
(
  this IEnumerable<SourceType> source,
  Func<SourceType, KeyType> KeySelector,
  Func<SourceType, ValueType> ValueSelector,
  Func<IGrouping<KeyType, ValueType>, ValueType> GroupHandler
)
{
  // Group the projected values by key, then let GroupHandler collapse each group.
  return source
    .GroupBy(KeySelector, ValueSelector)
    .ToDictionary(g => g.Key, GroupHandler);
}

Called by:

Dictionary<DateTime, double> result = timeEntries.ToDictionary(
  te => te.Date,
  te => te.Hours,
  g => g.Sum()
);
David B