tags:

views:

122

answers:

6

Given an ordered collection of strings:

var strings = new string[] { "abc", "def", "def", "ghi", "ghi", "ghi", "klm" };

Use LINQ to create a dictionary of string to number of occurrences of that string in the collection:

IDictionary<string,int> stringToNumOccurrences = ...;

Preferably do this in a single pass over the strings collection...

+4  A: 

The standard LINQ way is this:

stringToNumOccurrences = strings.GroupBy(s => s)
                                .ToDictionary(g => g.Key, g => g.Count());
Timwi
No, this is not homework, and I am getting a compile time error, with the above: error CS1660: Cannot convert lambda expression to type 'System.Collections.Generic.IEqualityComparer<string>' because it is not a delegate type also error CS1502: The best overloaded method match for 'System.Collections.Generic.Dictionary<string,System.Linq.IGrouping<string,string>>.this[string]' has some invalid arguments
Michael Goldshteyn
Change it to `Count()`.
SLaks
@Michael: Thanks, fixed my answer.
Timwi
I don't know who's voting you down. Other than a reasonable typo, you did have the first correct solution (using LINQ).
Michael Goldshteyn
+9  A: 
var dico = strings.GroupBy(x => x).ToDictionary(x => x.Key, x => x.Count());
Darin Dimitrov
+4  A: 

Timwi/Darin's suggestion will perform this in a single pass over the original collection, but it will create multiple buffers for the groupings. LINQ isn't really very good at doing this kind of counting, and a problem like this was my original motiviation for writing Push LINQ. You might like to read my blog post on it for more details about why LINQ isn't terribly efficient here.

Push LINQ and the rather more impressive implementation of the same idea - Reactive Extensions - can handle this more efficiently.

Of course, if you don't really care too much about the extra efficiency, go with the GroupBy answer :)

EDIT: I hadn't noticed that your strings were ordered. That means you can be much more efficient, because you know that once you've seen string x and then string y, if x and y are different, you'll never see x again. There's nothing in LINQ to make this particularly easier, but you can do it yourself quite easily:

public static IDictionary<string, int> CountEntries(IEnumerable<string> strings)
{
    var dictionary = new Dictionary<string, int>();

    using (var iterator = strings.GetEnumerator())
    {
        if (!iterator.MoveNext())
        {
            // No entries
            return dictionary;
        }
        string current = iterator.Current;
        int currentCount = 1;
        while (iterator.MoveNext())
        {
            string next = iterator.Current;
            if (next == current)
            {
                currentCount++;
            }
            else
            {
                dictionary[current] = currentCount;
                current = next;
                currentCount = 1;
            }
        }
        // Write out the trailing result
        dictionary[current] = currentCount;
    }
    return dictionary;
}

This is O(n), with no dictionary lookups involved other than when writing the values. An alternative implementation would use foreach and a current value starting off at null... but that ends up being pretty icky in a couple of other ways. (I've tried it :) When I need special-case handling for the first value, I generally go with the above pattern.

Actually you could do this with LINQ using Aggregate, but it would be pretty nasty.

Jon Skeet
Jon, thanks for looking at this. I was wondering about this myself. Optimally, I was hoping that a dictionary could be built in a single pass over col1 and the occurrence count could be incremented on the fly.
Michael Goldshteyn
Thanks for spelling out the solution involving loops. I did realize that this was the "verbose" way to do this, but was hoping that there could be a quick and easy (and efficient) way to do this via LINQ that I was overlooking.
Michael Goldshteyn
@Michael: LINQ doesn't generally present anything which would only work in very specific situations (such as the collection already being ordered).
Jon Skeet
@Jon, actually I wouldn't mind the dictionary lookups so much in a LINQ solution, but other temporary collections, even if they are created only once per group, do seem far less palatable. I guess I was hoping that there was some sort of self-referential LINQ solution that could use the dictionary as it was built to increment the occurrence count.
Michael Goldshteyn
@Michael: Yes, the LINQ GroupBy solution will create an array per group. It's not terribly nice, unfortunately. It's pretty hard to avoid though.
Jon Skeet
@Timwi: Fixed, thanks.
Jon Skeet
So, there is no way to access the dictionary that is being built by ToDictionary from the lambdas that are passed to the extension method, even though they are being invoked as part of the building of the dictionary?
Michael Goldshteyn
@Jon, I'm curious ... in what ways do you think the straight-forward `foreach`version of your code becomes "pretty icky"? It seems fairly simple to me.
corvuscorax
@corvuscorax: You have to check whether `current` is null inside the loop when you spot a change, and then again at the end just in case the list's empty. It just doesn't feel right.
Jon Skeet
@Michael: That's right. The lambdas are solely there to provide the keys and values, and they don't get any extra information.
Jon Skeet
@Downvoter: Care to give a reason?
Jon Skeet
+1  A: 

If this is actual production code, I'd go with Timwi's response.

If this is indeed homework and you're expected to write your own implementation, it shouldn't be too tough. Here are just a couple of hints to point you in the right direction:

  1. Dictionary<TKey, TValue> has a ContainsKey method.
  2. The IDictionary<TKey, TValue> interface's this[TKey] property is settable; i.e., you can do dictionary[key] = 1 (which means you can also do dictionary[key] += 1).

From those clues I think you should be able to figure out how to do it "by hand."

Dan Tao
+1  A: 

If you are looking for a particularly efficient (fast) solution, then GroupBy is probably too slow for you. You could use a loop:

var strings = new string[] { "abc", "def", "def", "ghi", "ghi", "ghi", "klm" };
var stringToNumOccurrences = new Dictionary<string, int>();
foreach (var str in strings)
{
    if (stringToNumOccurrences.ContainsKey(str))
        stringToNumOccurrences[str]++;
    else
        stringToNumOccurrences[str] = 1;
}
return stringToNumOccurrences;
Timwi
@Timwi: That's still doing a lot of dictionary lookups. Have a look at my answer for one which doesn't :) You could make this a bit more efficient using `TryGetValue` - exactly one lookup per iteration (and one write) rather than potentially two.
Jon Skeet
@Jon: I was actually writing an answer pretty much exactly like yours, but then I decided against it... maybe I should have posted it...
Timwi
+1  A: 

This is a foreach version like the one that Jon mentions that he finds "pretty icky" in his answer. I'm putting it in here, so there's something concrete to talk about.

I must admit that I find it simpler than Jon's version and can't really see what's icky about it. Jon? Anyone?

static Dictionary<string, int> CountOrderedSequence(IEnumerable<string> source)
{
    var result = new Dictionary<string, int>();
    string prev = null;
    int count = 0;
    foreach (var s in source)
    {
        if (prev != s && count > 0)
        {
            result.Add(prev, count);
            count = 0;
        }
        prev = s;
        ++count;
    }
    if (count > 0)
    { 
        result.Add(prev, count);
    }
    return result;
}

Updated to add a necessary check for empty source - I still think it's simpler than Jon's :-)

corvuscorax
The fact that you need to basically check whether you've seen any elements *twice* feels nasty to me, still. It doesn't express what you would *think* about when doing it manually - which would be to get the first element unconditionally, and go from there looking for new values, only *then* comparing each with the previous.
Jon Skeet