I want to write a function that reads a file and counts the number of times each word occurs. Assuming the file-reading is handled and produces a list of strings representing each line in the file, I need a function to count the occurrence of each word. Firstly, is using a Dictionary<string,int> the best approach? The key is the word, and the value is the number of occurrences of that word.

I wrote this function which iterates through each line and each word in a line and builds up a dictionary:

static IDictionary<string, int> CountWords(IEnumerable<string> lines)
{
    var dict = new Dictionary<string, int>();
    foreach (string line in lines)
    {
        string[] words = line.Split(' ');
        foreach (string word in words)
        {
            // First occurrence adds the word; later ones bump its count.
            if (dict.ContainsKey(word))
                dict[word]++;
            else
                dict.Add(word, 1);
        }
    }
    return dict;
}
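For experimentation, here is a self-contained sketch of the same imperative loop. It swaps the `ContainsKey` check for `Dictionary.TryGetValue`, which saves one hash lookup for words that are already in the dictionary. The class name `WordCountImperative` is hypothetical, and this is a variant for comparison, not the asker's exact code.

```csharp
using System.Collections.Generic;

static class WordCountImperative
{
    // Counts space-separated words line by line, like the question's CountWords.
    // TryGetValue replaces the ContainsKey + indexer pair, saving one lookup
    // for words that have already been seen.
    public static IDictionary<string, int> CountWords(IEnumerable<string> lines)
    {
        var dict = new Dictionary<string, int>();
        foreach (string line in lines)
        {
            foreach (string word in line.Split(' '))
            {
                dict.TryGetValue(word, out int count); // count stays 0 if the word is new
                dict[word] = count + 1;
            }
        }
        return dict;
    }
}
```

For example, passing the lines `"the cat"` and `"the dog"` yields `the` → 2, `cat` → 1 and `dog` → 1.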

However, I would like to write this function... functionally, using LINQ (because LINQ is fun and I'm trying to improve my functional programming skills :D). I managed to come up with this expression, but I'm not sure whether it's the best way to do it functionally:

static IDictionary<string, int> CountWords2(IEnumerable<string> lines)
{
    return lines
        .SelectMany(line => line.Split(' '))
        .Aggregate(new Dictionary<string, int>(),
            (dict, word) =>
            {
                if (dict.ContainsKey(word))
                    dict[word]++;
                else
                    dict.Add(word, 1);
                return dict;
            });
}
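The `Aggregate` version above is a fold in shape, but its lambda still mutates the seed dictionary in place. A sketch of a fold with no mutation threads an `ImmutableDictionary` through each step instead. It assumes .NET Core or .NET 5+, where `System.Collections.Immutable` ships with the runtime (on .NET Framework it comes from a NuGet package); `WordCountFold` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;

static class WordCountFold
{
    // A fold whose accumulator is never mutated: each step returns a new
    // ImmutableDictionary that shares most of its structure with the previous one.
    public static ImmutableDictionary<string, int> CountWords(IEnumerable<string> lines)
    {
        return lines
            .SelectMany(line => line.Split(' '))
            .Aggregate(
                ImmutableDictionary<string, int>.Empty,
                // Look up the current count (0 if absent) and store count + 1.
                (acc, word) => acc.SetItem(word, (acc.TryGetValue(word, out int n) ? n : 0) + 1));
    }
}
```

Each `SetItem` costs O(log n) rather than the O(1) of a mutable `Dictionary`, which is the usual price of persistent data structures; the `GroupBy` answers below avoid an explicit accumulator altogether.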

So while I have two working solutions, I am also interested in learning what the best approach is to this problem. Anyone with insight on LINQ and FP?

+3  A: 

Take a look at GroupBy instead of Aggregate -- it will give you a set of IGrouping<string, string> objects. You'll be able to retrieve the count of each word by calling .Count() on each grouping.
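To see what the hint means before converting anything to a dictionary, a sketch can walk the groups directly: each `IGrouping<string, string>` exposes the word as `Key`, and `Count()` gives its number of occurrences. The `GroupByDemo.Describe` name is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

static class GroupByDemo
{
    // Returns one "word: count" string per distinct word.
    public static List<string> Describe(IEnumerable<string> lines)
    {
        var groups = lines
            .SelectMany(line => line.Split(' '))
            .GroupBy(word => word); // one IGrouping<string, string> per distinct word

        var result = new List<string>();
        foreach (IGrouping<string, string> g in groups)
            result.Add($"{g.Key}: {g.Count()}");
        return result;
    }
}
```

`GroupBy` yields groups in the order each key first appears, so the line `"to be or not to be"` produces `to: 2`, `be: 2`, `or: 1`, `not: 1`.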

Tim Robinson
BTW, since you said you were interested in learning, I didn't post the exact code :)
Tim Robinson
thanks, your hint helped me play with things a bit more and get to something a bit nicer :)
BleuM937
+3  A: 

As Tim Robinson wrote, you could use GroupBy with ToDictionary, like this:

public static Dictionary<string, int> CountWords3(IEnumerable<string> strings)
{
    return strings
        .SelectMany(s => s.Split(' '))
        .GroupBy(w => w)
        .ToDictionary(g => g.Key, g => g.Count());
}
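If words that differ only in case should count as the same word (an assumption; the question does not say so), the same pipeline accepts a comparer in both `GroupBy` and `ToDictionary`. A sketch using `StringComparer.OrdinalIgnoreCase`, with the hypothetical class name `CaseInsensitiveCount`:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CaseInsensitiveCount
{
    // Same GroupBy/ToDictionary pipeline, but both steps use a case-insensitive
    // comparer so "The" and "the" are grouped together and looked up together.
    public static Dictionary<string, int> CountWords(IEnumerable<string> lines)
    {
        var comparer = StringComparer.OrdinalIgnoreCase;
        return lines
            .SelectMany(line => line.Split(' '))
            .GroupBy(word => word, comparer)
            .ToDictionary(g => g.Key, g => g.Count(), comparer);
    }
}
```

Each key keeps the casing of its first occurrence, and because the dictionary was built with the same comparer, `counts["THE"]` and `counts["the"]` return the same value.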
Yury Tarabanko
Technically that's not using language-integrated query but some of the extension methods LINQ is built upon (but then the OP is asking for LINQ while using extension methods, so this might be what he's asking for anyway :) )
Rune FS
@Rune FS: It's all LINQ, and a matter of personal preference whether you use query comprehension syntax or extension method syntax. (And, in fact, there are some queries that can only be expressed using extension method syntax. Would you claim that those queries aren't LINQ?)
LukeH
@Rune FS: I think he is asking for a LINQ-ish way of doing things like that. And as LukeH pointed out, it's all LINQ.
Yury Tarabanko
I managed to work it out myself after Tim Robinson's hint to use GroupBy. This is essentially the query I finished with.
BleuM937
@Luke: I would never claim that calling a method is using language-integrated query. I would, however, call it exactly that if the queries were integrated with the language, say if a special syntax were used for invoking those methods, e.g. from x in xs select x :)
Rune FS
+1  A: 

The following should do the job.

static IDictionary<String, Int32> CountWords(IEnumerable<String> lines)
{
    return lines
        .SelectMany(line => line.Split(' '))
        .GroupBy(word => word)
        .ToDictionary(group => group.Key, group => group.Count());
}
Daniel Brückner
A: 

If you want to use LINQ query syntax (rather than calling the extension methods LINQ is built on directly), you can write:

var groups = from line in lines
             from s in line.Split(new []{"\t", " "},StringSplitOptions.RemoveEmptyEntries) 
             group s by s into g
             select g;
var dic = groups.ToDictionary(g => g.Key,g=>g.Count());

Your current implementation won't split on tabs, and it can count string.Empty as a "word" (for example, when two spaces are adjacent), so I've changed the split to match what I think your intentions are.
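The difference described here is easy to check. The sketch below (class and method names are hypothetical) puts the original `Split(' ')` pipeline next to the query-syntax version from this answer:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SplitComparison
{
    // Split(' ') keeps empty entries, so two adjacent spaces produce an "" word,
    // and tabs are not treated as separators.
    public static Dictionary<string, int> CountSplitOnSpace(IEnumerable<string> lines) =>
        lines.SelectMany(line => line.Split(' '))
             .GroupBy(word => word)
             .ToDictionary(g => g.Key, g => g.Count());

    // Query-syntax version: splits on tab and space and drops empty entries
    // before grouping.
    public static Dictionary<string, int> CountQuery(IEnumerable<string> lines)
    {
        var groups = from line in lines
                     from s in line.Split(new[] { "\t", " " }, StringSplitOptions.RemoveEmptyEntries)
                     group s by s into g
                     select g;
        return groups.ToDictionary(g => g.Key, g => g.Count());
    }
}
```

For the single line `"a  b\tb"` (two spaces, then a tab), `Split(' ')` yields `"a"`, `""` and `"b\tb"`, whereas the query version yields `a` → 1 and `b` → 2.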

Rune FS