Here is an alternative method to using Linq, presented as a timed test to see which performs faster. These are the results I obtained with 1000 iterations:
Total words: 1324
Min Max Mean Method
5305 22889 5739.182 LinkMethodToArray
5053 11973 5418.355 LinkMethod
3112 6726 3252.457 HashMethod
The LinkMethod is only about 1.6 times slower in this case. Not as bad as a lot of Linq code that I have performance tested, but it was only 1324 words.
Edit #1
That was before adding the sort. With the sort, you can see that it is comparible with the Linq method. Of course, copying the hash to a list and then sorting the list isn't the most efficient way to do this. We could improve on this. There are a couple of ways that come to mind, but none of them are simple and would require writing a lot of custom code.
Since we want to use what's already available and we want the code to be clear, I have to say that Linq is in fact a very good choice. This has changed my opinion of Linq.. a little. I've seen far too many other comparisons where Linq ends up disastrously slower (on the order of 1,000s of times slower) to give a green light to using Linq anywhere and everywhere, but certainly in this one place it shines very well.
I guess the moral is, as it always has been, test, test, test.
Here are the values with the sort added to HashMethod.
Total words: 1324
Min Max Mean Method
5284 21030 5667.808 LinkMethodToArray
5081 36339 5425.626 LinkMethod
5017 27583 5288.602 HashMethod
Edit #2
A couple of simple optimizations (pre-initializing both the dictionary and the list) make HashMethod a bit noticably faster.
Total words: 1324
Min Max Mean Method
5287 16299 5686.429 LinkMethodToArray
5081 21813 5440.758 LinkMethod
4588 8420 4710.659 HashMethod
Edit #3
With a larger word set, they become much more even. In fact, the Linq method seems to edge out every time. Here is the United States Constitution (All seven articles and signatures). This may be due to the fact that the declaration contains a lot of repeat words ("He has ...").
Total words: 4545
Min Max Mean Method
13363 36133 14086.875 LinkMethodToArray
12917 26532 13668.914 LinkMethod
13601 19435 13836.955 HashMethod
Code:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Threading;
class Program
{
static void Main()
{
Thread.CurrentThread.Priority = ThreadPriority.Highest;
// Declaration.txt is a copy of the Declaration of Independence
// which can be found here: http://en.wikisource.org/wiki/United_States_Declaration_of_Independence
string declaration = File.ReadAllText("Declaration.txt");
string[] items = declaration.ToLower().Split(new char[] { ',', '.', ':', ';', '-', '\r', '\n', '\t', ' ' }, StringSplitOptions.RemoveEmptyEntries);
// pre-execute outside timing loop
LinqMethodToArray(items);
LinqMethod(items);
HashMethod(items);
int iterations = 1000;
long min1 = long.MaxValue, max1 = long.MinValue, sum1 = 0;
long min2 = long.MaxValue, max2 = long.MinValue, sum2 = 0;
long min3 = long.MaxValue, max3 = long.MinValue, sum3 = 0;
Console.WriteLine("Iterations: {0}", iterations);
Console.WriteLine("Total words: {0}", items.Length);
Stopwatch sw = new Stopwatch();
for (int n = 0; n < iterations; n++)
{
sw.Reset();
sw.Start();
LinqMethodToArray(items);
sw.Stop();
sum1 += sw.ElapsedTicks;
if (sw.ElapsedTicks < min1)
min1 = sw.ElapsedTicks;
if (sw.ElapsedTicks > max1)
max1 = sw.ElapsedTicks;
sw.Reset();
sw.Start();
LinqMethod(items);
sw.Stop();
sum2 += sw.ElapsedTicks;
if (sw.ElapsedTicks < min2)
min2 = sw.ElapsedTicks;
if (sw.ElapsedTicks > max2)
max2 = sw.ElapsedTicks;
sw.Reset();
sw.Start();
HashMethod(items);
sw.Stop();
sum3 += sw.ElapsedTicks;
if (sw.ElapsedTicks < min3)
min3 = sw.ElapsedTicks;
if (sw.ElapsedTicks > max3)
max3 = sw.ElapsedTicks;
}
Console.WriteLine("{0,-10} {1,-10} {2,-10} Method", "Min", "Max", "Mean");
Console.WriteLine("{0,-10} {1,-10} {2,-10} LinkMethodToArray", min1, max1, (double)sum1 / iterations);
Console.WriteLine("{0,-10} {1,-10} {2,-10} LinkMethod", min2, max2, (double)sum2 / iterations);
Console.WriteLine("{0,-10} {1,-10} {2,-10} HashMethod", min3, max3, (double)sum3 / iterations);
}
static void LinqMethodToArray(string[] items)
{
var ranking = (from item in items
group item by item into r
orderby r.Count() descending
select new { Name = r.Key, Rank = r.Count() }).ToArray();
for (int n = 0; n < ranking.Length; n++)
{
var item = ranking[n];
DoSomethingWithItem(item);
}
}
static void LinqMethod(string[] items)
{
var ranking = (from item in items
group item by item into r
orderby r.Count() descending
select new { Name = r.Key, Rank = r.Count() });
foreach (var item in ranking)
DoSomethingWithItem(item);
}
static void HashMethod(string[] items)
{
var ranking = new Dictionary<string, int>(items.Length / 2);
foreach (string item in items)
{
if (!ranking.ContainsKey(item))
ranking[item] = 1;
else
ranking[item]++;
}
var list = new List<KeyValuePair<string, int>>(ranking);
list.Sort((a, b) => b.Value.CompareTo(a.Value));
foreach (KeyValuePair<string, int> pair in list)
DoSomethingWithItem(pair);
}
static volatile object hold;
static void DoSomethingWithItem(object item)
{
// This method exists solely to prevent the compiler from
// optimizing use of the item away so that this program
// can be executed in Release build, outside the debugger.
hold = item;
}
}