views:

89

answers:

3

I have a list of entites that contains ~137000 records that I loop through I then need to linq to a list of Tuple of additional params this contains ~ 150000

Why does it keep taking longer, the more iterations it does? Here is from the stopwatch Found: 136770 items that match the criteria.

10,000 items processed EllapsedTime: 5473That is: 0.0912166666666667 minutes.

20,000 items processed EllapsedTime: 15307That is: 0.255116666666667 minutes.

30,000 items processed EllapsedTime: 30065That is: 0.501083333333333 minutes.

50,000 items processed EllapsedTime: 74507That is: 1.24178333333333 minutes.

75,000 items processed EllapsedTime: 157836That is: 2.6306 minutes.

100,000 items processed EllapsedTime: 272495That is: 4.54158333333333 minutes.

EllapsedTime: 499663That is: 8.32771666666667 minutes.

Is there some way to optimize this?

 List<Entites> alMatched 
List<Tuple<int, double, int, int>> lsItems = new List<Tuple<int, double, int, int>>();
IEnumerable<Tuple<int, double, int, int>> enumThingy = lsItems;

 for (int z = 0; z <= alMatched.Count() - 1;z++ )
            {
               Entity a = alMatched[z];
               var newRepl = enumThingy.Where(d => d.First == a.ID).First();
               if (newRepl != null)
               {

               }

                switch (z)
                {
                    case 10000:
                        Debug.Print("10,000 items processed " + ElapsedTime(sw.ElapsedMilliseconds));
                        break;
                    case 20000:
                        Debug.Print("20,000 items processed " + ElapsedTime(sw.ElapsedMilliseconds));
                        break;
                    case 30000:
                        Debug.Print("30,000 items processed " + ElapsedTime(sw.ElapsedMilliseconds));
                        break;
                    case 50000:
                        Debug.Print("50,000 items processed " + ElapsedTime(sw.ElapsedMilliseconds));
                        break;
                    case 75000:
                        Debug.Print("75,000 items processed " + ElapsedTime(sw.ElapsedMilliseconds));
                        break;
                    case 100000:
                        Debug.Print("100,000 items processed " + ElapsedTime(sw.ElapsedMilliseconds));
                        break;
                }

            }

Regards

_Eric

+2  A: 

Look at this code:

for (int z = 0; z <= alMatched.Count() - 1;z++ )
{
    Entity a = alMatched[z];
    var newRepl = enumThingy.Where(d => d.First == a.ID).First();

In this case (and I'd suspect your "real" case), the enumThingy and the alMatched enumerables are both in the same order.

Because of this, when you are on case 1, the call to enumThingy.Where returns on the first iteration. On case 100, it takes 100 iterations to match your condition, and exit out. On case 10000, it takes 10000 iterations.

Basically, the further you go, the worse this gets. Your algorithm is O(N^2), but LINQ is short-cutting because you're using the same list, and the ordering is helping you "short cut" out of the where fast.

Reed Copsey
Yup, the question really boils down to "why is this O(n^2) algorithm behaves as if it was O(n^2)"?! ... ;)
Pavel Minaev
+1  A: 

Sure. Try a Dictionary instead of a List

    List<Tuple<int, double, int, int>> lsItems = new List<Tuple<int, double, int, int>>();

//should be 

var lsItems = new Dictionary<int, Tuple<int, double, int, int>>();

/reference the items with:

var newRepl = lsItems[a.ID];

Steve
yup! Thanks!Found: 136770 items that match the criteria.10,000 items processed EllapsedTime: 2974That is: 0.0495666666666667 minutes.20,000 items processed EllapsedTime: 5445That is: 0.09075 minutes.30,000 items processed EllapsedTime: 7911That is: 0.13185 minutes.50,000 items processed EllapsedTime: 12883That is: 0.214716666666667 minutes.75,000 items processed EllapsedTime: 19127That is: 0.318783333333333 minutes.100,000 items processed EllapsedTime: 25280That is: 0.421333333333333 minutes.EllapsedTime: 34449That is: 0.57415 minutes.
Eric
A: 

You can use different aproaches to achieve a speed gain here.

One would be to use a hashtable to store the items in enumThingy and access them by the key you are looking for.

Another option would be to sort the enumthingee and also sort the alMatched ones, and then use a "sliding aproach" to find all items you need..

Currently you work of an enum, and it has to check all items to find the one you need, so it will take more and more time the later in the chain your item is located (or missing totally)

Heiko Hatzfeld