ansaurus

Question

Answer 1

+4 A:

The standard LINQ way is this:

stringToNumOccurrences = strings.GroupBy(s => s)
                                .ToDictionary(g => g.Key, g => g.Count());

Timwi 2010-10-24 18:31:21

No, this is not homework, and I am getting a compile time error, with the above: error CS1660: Cannot convert lambda expression to type 'System.Collections.Generic.IEqualityComparer<string>' because it is not a delegate type also error CS1502: The best overloaded method match for 'System.Collections.Generic.Dictionary<string,System.Linq.IGrouping<string,string>>.this[string]' has some invalid arguments

Michael Goldshteyn 2010-10-24 18:35:17

Change it to `Count()`.

SLaks 2010-10-24 18:37:48

@Michael: Thanks, fixed my answer.

Timwi 2010-10-24 18:37:52

I don't know who's voting you down. Other than a reasonable typo, you did have the first correct solution (using LINQ).

Michael Goldshteyn 2010-10-24 19:02:39

Answer 2

+9 A:

var dico = strings.GroupBy(x => x).ToDictionary(x => x.Key, x => x.Count());

Darin Dimitrov 2010-10-24 18:34:39

Answer 3

+4 A:

Timwi/Darin's suggestion will perform this in a single pass over the original collection, but it will create multiple buffers for the groupings. LINQ isn't really very good at doing this kind of counting, and a problem like this was my original motiviation for writing Push LINQ. You might like to read my blog post on it for more details about why LINQ isn't terribly efficient here.

Push LINQ and the rather more impressive implementation of the same idea - Reactive Extensions - can handle this more efficiently.

Of course, if you don't really care too much about the extra efficiency, go with the GroupBy answer :)

EDIT: I hadn't noticed that your strings were ordered. That means you can be much more efficient, because you know that once you've seen string x and then string y, if x and y are different, you'll never see x again. There's nothing in LINQ to make this particularly easier, but you can do it yourself quite easily:

public static IDictionary<string, int> CountEntries(IEnumerable<string> strings)
{
    var dictionary = new Dictionary<string, int>();

    using (var iterator = strings.GetEnumerator())
    {
        if (!iterator.MoveNext())
        {
            // No entries
            return dictionary;
        }
        string current = iterator.Current;
        int currentCount = 1;
        while (iterator.MoveNext())
        {
            string next = iterator.Current;
            if (next == current)
            {
                currentCount++;
            }
            else
            {
                dictionary[current] = currentCount;
                current = next;
                currentCount = 1;
            }
        }
        // Write out the trailing result
        dictionary[current] = currentCount;
    }
    return dictionary;
}

This is O(n), with no dictionary lookups involved other than when writing the values. An alternative implementation would use foreach and a current value starting off at null... but that ends up being pretty icky in a couple of other ways. (I've tried it :) When I need special-case handling for the first value, I generally go with the above pattern.

Actually you could do this with LINQ using Aggregate, but it would be pretty nasty.

Jon Skeet 2010-10-24 18:35:54

Jon, thanks for looking at this. I was wondering about this myself. Optimally, I was hoping that a dictionary could be built in a single pass over col1 and the occurrence count could be incremented on the fly.

Michael Goldshteyn 2010-10-24 18:41:37

Thanks for spelling out the solution involving loops. I did realize that this was the "verbose" way to do this, but was hoping that there could be a quick and easy (and efficient) way to do this via LINQ that I was overlooking.

Michael Goldshteyn 2010-10-24 18:43:05

@Michael: LINQ doesn't generally present anything which would only work in very specific situations (such as the collection already being ordered).

Jon Skeet 2010-10-24 18:43:45

@Jon, actually I wouldn't mind the dictionary lookups so much in a LINQ solution, but other temporary collections, even if they are created only once per group, do seem far less palatable. I guess I was hoping that there was some sort of self-referential LINQ solution that could use the dictionary as it was built to increment the occurrence count.

Michael Goldshteyn 2010-10-24 18:48:00

@Michael: Yes, the LINQ GroupBy solution will create an array per group. It's not terribly nice, unfortunately. It's pretty hard to avoid though.

Jon Skeet 2010-10-24 18:53:46

@Timwi: Fixed, thanks.

Jon Skeet 2010-10-24 18:54:59

So, there is no way to access the dictionary that is being built by ToDictionary from the lambdas that are passed to the extension method, even though they are being invoked as part of the building of the dictionary?

Michael Goldshteyn 2010-10-24 18:58:15

@Jon, I'm curious ... in what ways do you think the straight-forward `foreach`version of your code becomes "pretty icky"? It seems fairly simple to me.

corvuscorax 2010-10-24 19:58:42

@corvuscorax: You have to check whether `current` is null inside the loop when you spot a change, and then again at the end just in case the list's empty. It just doesn't feel right.

Jon Skeet 2010-10-24 20:19:29

@Michael: That's right. The lambdas are solely there to provide the keys and values, and they don't get any extra information.

Jon Skeet 2010-10-24 20:19:52

@Downvoter: Care to give a reason?

Jon Skeet 2010-10-24 21:04:45

Answer 4

+1 A:

If this is actual production code, I'd go with Timwi's response.

If this is indeed homework and you're expected to write your own implementation, it shouldn't be too tough. Here are just a couple of hints to point you in the right direction:

Dictionary<TKey, TValue> has a ContainsKey method.
The IDictionary<TKey, TValue> interface's this[TKey] property is settable; i.e., you can do dictionary[key] = 1 (which means you can also do dictionary[key] += 1).

From those clues I think you should be able to figure out how to do it "by hand."

Dan Tao 2010-10-24 18:36:32

Answer 5

+1 A:

If you are looking for a particularly efficient (fast) solution, then GroupBy is probably too slow for you. You could use a loop:

var strings = new string[] { "abc", "def", "def", "ghi", "ghi", "ghi", "klm" };
var stringToNumOccurrences = new Dictionary<string, int>();
foreach (var str in strings)
{
    if (stringToNumOccurrences.ContainsKey(str))
        stringToNumOccurrences[str]++;
    else
        stringToNumOccurrences[str] = 1;
}
return stringToNumOccurrences;

Timwi 2010-10-24 18:36:59

@Timwi: That's still doing a lot of dictionary lookups. Have a look at my answer for one which doesn't :) You could make this a bit more efficient using `TryGetValue` - exactly one lookup per iteration (and one write) rather than potentially two.

Jon Skeet 2010-10-24 18:44:07

@Jon: I was actually writing an answer pretty much exactly like yours, but then I decided against it... maybe I should have posted it...

Timwi 2010-10-24 18:54:05

Answer 6

+1 A:

This is a foreach version like the one that Jon mentions that he finds "pretty icky" in his answer. I'm putting it in here, so there's something concrete to talk about.

I must admit that I find it simpler than Jon's version and can't really see what's icky about it. Jon? Anyone?

static Dictionary<string, int> CountOrderedSequence(IEnumerable<string> source)
{
    var result = new Dictionary<string, int>();
    string prev = null;
    int count = 0;
    foreach (var s in source)
    {
        if (prev != s && count > 0)
        {
            result.Add(prev, count);
            count = 0;
        }
        prev = s;
        ++count;
    }
    if (count > 0)
    { 
        result.Add(prev, count);
    }
    return result;
}

Updated to add a necessary check for empty source - I still think it's simpler than Jon's :-)

corvuscorax 2010-10-24 20:04:01

The fact that you need to basically check whether you've seen any elements *twice* feels nasty to me, still. It doesn't express what you would *think* about when doing it manually - which would be to get the first element unconditionally, and go from there looking for new values, only *then* comparing each with the previous.

Jon Skeet 2010-10-24 21:06:54

ansaurus

tags:

views:

answers:

Collection of strings to dictionary

related questions