A friend and I were a bit perplexed during a programming discussion today. As an example we made up a problem of having a List<int> of n random integers (typically 1,000,000), and wanted to write a function that returns the set of all integers that occur more than once. Pretty straightforward stuff. We wrote one LINQ statement to solve this problem, and a plain insertion-sort-based algorithm.
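To make the comparison concrete, the two versions looked roughly like this (reconstructed from memory, so the method names FindDuplicatesLinq / FindDuplicatesSorted and the exact query shape are just illustrative, not our literal code):

    using System.Collections.Generic;
    using System.Linq;

    // LINQ version: group equal values and keep the ones that appear more than once.
    static List<int> FindDuplicatesLinq(List<int> numbers)
    {
        return numbers.GroupBy(n => n)
                      .Where(g => g.Count() > 1)
                      .Select(g => g.Key)
                      .ToList();
    }

    // Hand-rolled version: plain insertion sort, then scan adjacent elements for repeats.
    static List<int> FindDuplicatesSorted(List<int> numbers)
    {
        var copy = new List<int>(numbers);

        // Insertion sort, O(n^2) in the general case.
        for (int i = 1; i < copy.Count; i++)
        {
            int key = copy[i];
            int j = i - 1;
            while (j >= 0 && copy[j] > key)
            {
                copy[j + 1] = copy[j];
                j--;
            }
            copy[j + 1] = key;
        }

        // In a sorted list, duplicates sit next to each other; collect each value once.
        var result = new List<int>();
        for (int i = 1; i < copy.Count; i++)
        {
            if (copy[i] == copy[i - 1] &&
                (result.Count == 0 || result[result.Count - 1] != copy[i]))
            {
                result.Add(copy[i]);
            }
        }
        return result;
    }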
Now, as we measured how fast the code ran (using System.Diagnostics.Stopwatch), the results were confusing. Not only did the LINQ version outperform the simple sort, it even ran faster than a single foreach/for loop over the list with a completely empty body (which, on a side track, I thought the compiler was supposed to detect and remove altogether).
If we generated a new List<int> of random numbers in the same execution of the program and ran the LINQ query again, the performance would increase by orders of magnitude (typically a thousandfold). The performance of the empty loops was of course the same.
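That second run was simply a repeat of the first measurement on a freshly generated list, continuing the Main sketched above (so rng, sw and FindDuplicatesLinq are the ones already defined there):

    // Second run in the same process, on a freshly generated list.
    // This is the measurement that came out orders of magnitude faster.
    var numbers2 = new List<int>();
    for (int i = 0; i < 1000000; i++)
        numbers2.Add(rng.Next());

    sw.Restart();
    var duplicates2 = FindDuplicatesLinq(numbers2);
    sw.Stop();
    Console.WriteLine("LINQ, 2nd run: " + sw.ElapsedMilliseconds + " ms");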
So, what is going on here? Is LINQ using parallelism to outperform normal loops? How are these results even possible? LINQ uses quicksort, which runs in O(n log n) and is by definition slower than the O(n) of a single pass over the list.
And what explains the performance leap on the second run?
We were both baffled and intrigued by these results and were hoping for some clarifying insights from the community, just to satisfy our own curiosity.