views:

1057

answers:

4

I am having trouble counting the unique values in an array, and I need to do so without rearranging the array elements.

How can I accomplish this?

+11  A: 

If you have .NET 3.5 you can easily achieve this with LINQ via:

int numberOfElements = myArray.Distinct().Count();

Non LINQ:

List<int> uniqueValues = new List<int>();
for(int i = 0; i < myArray.Length; ++i)
{
    if(!uniqueValues.Contains(myArray[i]))
        uniqueValues.Add(myArray[i]);
}
int numberOfElements = uniqueValues.Count;
Quintin Robinson
If this is a homework question, then the answer isn't likel to get him many points but still a good answer in terms of linq.
Andrew Robinson
@Andrew Added a non LINQ homeworkish example.
Quintin Robinson
The non-linq example is really bad, but let's let Rich B come up with a better solution if it is indeed a homework question. :)(HINT: How would you avoid having to iterate over the entire array for each item?)
Arafangion
Let Jarus come up with a better solution, rather. (Why doesn't SO allow one to edit comments!?)
Arafangion
You can delete comments and re-write. That's what I do.
Devin Jeanpierre
You can delete and then re add. Guessing that editing could obscure the flow of the conversation.
Andrew Robinson
@Arafangion Yeah, generally have to delete and re-state. I knew what you meant though and that is why I post non optimized solutions for homework related questions, so the poster can think and exercise.
Quintin Robinson
@Ara: I didn't ask the question.
Geoffrey Chetwood
You should probably use some kind of hash set instead of a list to avoid O(n^2) performance.
erikkallen
+6  A: 

This is a far more efficient non LINQ implementation.

        var array = new int[] { 1, 2, 3, 3, 3, 4 };
        // .Net 3.0 - use Dictionary<int, bool> 
        // .Net 1.1 - use Hashtable 
        var set = new HashSet<int>();
        foreach (var item in array) {
            if (!set.Contains(item)) set.Add(item);
        }
        Console.WriteLine("There are {0} distinct values. ", set.Count);
Sam Saffron
Why <int, object> instead of <int, bool>?
sharptooth
Performance wise they should both be identical, will clean it up to use HashSet so this demo code looks less ugly
Sam Saffron
The dictionary contains should be far faster on large arrays than the List contains.
Kevin Gale
Its not a List, its a HashSet ....
Sam Saffron
You don't need to test whether the HashSet contains a value. If it contains, Add() does nothing. Maybe the compiler knows to optimize but you will get a slight performance improvement otherwise.
Chetan Sastry
It turns out that calling .Add for an item that is already there is slower than calling .Contains ... so it all depends on your input array
Sam Saffron
A quick benchmark shows it's true! I'm curious to know why.
Chetan Sastry
.Add() probably updates the set regardless of whether it contains it or not, since the result will be the same whether it checks for it or just adds it.
dreamlax
.Contains takes O(n) time, so this solution will take O(n2) time
Learner
A: 

Should only the distinct values be counted or should each number in the array be counted (e.g. "number 5 is contained 3 times")?

The second requirement can be fulfilled with the starting steps of the counting sort algorithm.
It would be something like this:

  • build a set where the index/key is the element to be counted
  • a key is connected to a variable which holds the number of occurences of the key element
  • iterate the array
    • increment value of key(array[index])

Regards

+1  A: 

O(n) running time max_value memory usage

boolean[] data = new boolean[maxValue];
for (int n : list) {
   if (data[n]) counter++
   else data[n] = true;
}