views:

637

answers:

3

I have two lists of custom objects and want to update a field for all objects in one list if there is an object in the other list which matches on another pair of fields.

This code explains the problem better and produces the results I want. However for larger lists 20k, and a 20k list with matching objects, this takes a considerable time (31s). I can improve this with ~50% by using the generic lists Find(Predicate) method.

using System;
using System.Linq;
using System.Linq.Expressions;
using System.Collections.Generic;
namespace ExperimentFW3
{
    public class PropValue
    {
     public string Name;
     public decimal Val;
     public decimal Total;
    }
    public class Adjustment
    {
     public string PropName;
     public decimal AdjVal;
    }
    class Program
    {
     static List<PropValue> propList;
     static List<Adjustment> adjList;

     public static void Main()
     {
      propList = new List<PropValue>{
       new PropValue{Name = "Alfa", Val=2.1M},
       new PropValue{Name = "Beta", Val=1.0M},
       new PropValue{Name = "Gamma", Val=8.0M}
      };
      adjList = new List<Adjustment>{
       new Adjustment{PropName = "Alfa", AdjVal=-0.1M},
       new Adjustment{PropName = "Beta", AdjVal=3M}
      };

      foreach (var p in propList)
      {
       Adjustment a = adjList.SingleOrDefault(
        av => av.PropName.Equals(p.Name)
        );
       if (a != null)
        p.Total = p.Val + a.AdjVal;
       else
        p.Total = p.Val;
      }
     }
    }
}

The desired result is: Alfa total=2,Beta total=4,Gamma total=8

But I wonder if this is possible to do even faster. Inner joining the two lists takes very little time, even when looping over 20k items in the resultset.

var joined = from p in propList
             join a in adjList on p.Name equals a.PropName
             select new { p.Name, p.Val, p.Total, a.AdjVal };

So my question is if it's possible to do something like I would do with T-SQL? An UPDATE from a left join using ISNULL(val,0) on the adjustment value.

+6  A: 

That join should be fairly fast, as it will first loop through all of adjList to create a lookup, then for each element in propList it will just use the lookup. This is faster than your O(N * M) method in the larger code - although that could easily be fixed by calling ToLookup (or ToDictionary as you only need one value) on adjList before the loop.

EDIT: Here's the modified code using ToDictionary. Untested, mind you...

var adjDictionary = adjList.ToDictionary(av => av.PropName);
foreach (var p in propList)
{
    Adjustment a;
    if (adjDictionary.TryGetValue(p.Name, out a))
    {
        p.Total = p.Val + a.AdjVal;
    }
    else
    {
        p.Total = p.Val;
    }
}
Jon Skeet
Thanks for a quick reply! I recently began looking at linq and got caught in trying to hard to emulate t-sql in code. Your untested code measures in ms so I got the performance boost I asked for.
mnsc
Jon Skeet
There's a "community wiki" checkbox when posting. Sometimes it's enabled even though you never clicked it.
David B
Odd. I know there are other triggers for automatically making something a wiki, but that sounds like a basic bug :(
Jon Skeet
A: 

If adjList might have duplicate names, you should group the items before pushing to dictionary.

Dictionary<string, decimal> adjDictionary = adjList
  .GroupBy(a => a.PropName)
  .ToDictionary(g => g.Key, g => g.Sum(a => a.AdjVal))

propList.ForEach(p => 
  {
    decimal a;
    adjDictionary.TryGetValue(p.Name, out a);
    p.Total = p.Val + a;
  });
David B
A: 

I know I am late posting this, but I thought someone would appreciate the clearer shorter answer below that handles multiple records per lookup in adjList. Creating a LookUp will allow fast lookups on multiple items and will return an empty list if there are no records in LookUp.

var adjLookUp = adjList.ToLookUp(a => a.PropName);
foreach (var p in propList) 
    p.Total = p.Val + adjLookUp[p.Name].Sum(a => a.AdjVal);
DRBlaise