I have to work on some code that's using generic lists to store a collection of custom objects.

Then it does something like the following to check if a given object's in the collection and do something if so:

List<CustomObject> customObjects;
//fill up the list
List<CustomObject> anotherListofCustomObjects;
//fill it up

//...

foreach (CustomObject myCustomObject in customObjects)
{
   if (anotherListofCustomObjects.Contains(myCustomObject))
   {
      //do stuff
   }
}

The problem is that it takes forever to process 7000 objects like that.

This is not my code; I am just trying to come up with options to improve it. It looks to me like it would be much faster to use a dictionary and get the items by key instead of looping through the whole collection as above.

Suggestions?

A: 

Why don't you use a dictionary?

Can you show how CustomObject is implemented? Does CustomObject implement IEquatable?

Y Low
it doesn't implement IEquatable
JohnIdol
+3  A: 

Well, you seem to have answered it yourself? If you need fast lookups against a set of data, then a dictionary may be better than a flat list (for largish data sizes, which yours is).

You could, for example, use the object as its own key -

Dictionary<CustomObject,CustomObject> ...

Note that the meaning of equality depends on the context. If you are passing in the original reference, then that is fine - ContainsKey would do the job. If you have a different but similar-for-the-purposes-of-equality object to compare to, then you'll need to implement your own GetHashCode(), Equals(), and ideally IEquatable<CustomObject>. Either in CustomObject itself, or in a custom IEqualityComparer<CustomObject>.
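As a rough sketch of the second case (equality defined by content rather than by reference), the following builds the dictionary once and then replaces the nested scan with key lookups. The `Id` property, `CustomObjectIdComparer`, `LookupSketch`, and `FindMatches` are all hypothetical names, since the thread does not show how CustomObject is defined. It sticks to .NET 2.0 syntax.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the real CustomObject; the Id property is an assumption.
public class CustomObject
{
    private readonly int id;
    public CustomObject(int id) { this.id = id; }
    public int Id { get { return id; } }
}

// Treats two CustomObject instances as equal when their (assumed) Id matches.
public class CustomObjectIdComparer : IEqualityComparer<CustomObject>
{
    public bool Equals(CustomObject x, CustomObject y)
    {
        if (object.ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Id == y.Id;
    }

    public int GetHashCode(CustomObject obj)
    {
        return obj == null ? 0 : obj.Id.GetHashCode();
    }
}

public static class LookupSketch
{
    // Returns the items of customObjects that also occur in anotherList.
    // Assumes neither list contains null entries (Dictionary rejects null keys).
    public static List<CustomObject> FindMatches(
        List<CustomObject> customObjects, List<CustomObject> anotherList)
    {
        // Build the lookup once instead of scanning anotherList for every item.
        Dictionary<CustomObject, CustomObject> lookup =
            new Dictionary<CustomObject, CustomObject>(new CustomObjectIdComparer());
        foreach (CustomObject item in anotherList)
        {
            lookup[item] = item; // the indexer tolerates duplicates, unlike Add
        }

        List<CustomObject> matches = new List<CustomObject>();
        foreach (CustomObject myCustomObject in customObjects)
        {
            if (lookup.ContainsKey(myCustomObject))
            {
                matches.Add(myCustomObject); // the "do stuff" part would go here
            }
        }
        return matches;
    }
}
```

If reference equality is all you need, dropping the comparer argument from the Dictionary constructor should be enough, provided CustomObject does not override Equals and GetHashCode.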

Marc Gravell
Using the object as the key to the object is nothing other than using the object itself to find itself, like the list version in the original post. The key of a dictionary should be something smaller and easier to process than the value item.
BeowulfOF
@BeowulfOF no, that's not the case. Using an object as the key is faster, because you can use the same object (from another list) to check whether it's in the dictionary.
Frans Bouma
@BeowulfOF - it is simply a reference. Sure, you *can* use a separate key, but it is not compulsory. The performance depends primarily on the Equals and GetHashCode complexity, regardless of whether it is an object key or a natural key.
Marc Gravell
+2  A: 

Indeed your code is O(n^2) currently, which will be slow. You can:

  • use dictionaries or KeyedCollections instead; both are hash-based, so this makes it roughly O(n) on average (a sorted list with binary search would give O(n log n))
  • if you can ensure that the items are in the same order, you can rewrite the last loop to use just one index, and this would be O(n)
Grzenio
+9  A: 

Another way besides dictionaries is, if you're on .NET 3.5, to use Linq to objects and Intersect:

foreach(CustomObject c in customObjects.Intersect(anotherListOfCustomObjects))
{
    // do stuff.
}

According to Reflector, it uses hash-based sets to perform the intersection of the sequences.
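One behavioural difference from the original nested loop is worth keeping in mind: Intersect yields each matching element only once, even if it appears several times in the first list. A small sketch with plain ints (the `IntersectSketch` class and `Common` method are illustrative names only) shows this:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IntersectSketch
{
    // Returns the distinct elements of first that also appear in second,
    // in the order they occur in first.
    public static List<int> Common(IEnumerable<int> first, IEnumerable<int> second)
    {
        return first.Intersect(second).ToList();
    }
}
```

Calling `IntersectSketch.Common(new[] { 1, 2, 2, 3 }, new[] { 2, 3, 4 })` gives a list containing 2 and 3; the duplicate 2 is dropped. For objects compared by content, Intersect also has an overload that takes an IEqualityComparer<T>.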

Frans Bouma
I'd imagine the performance on this wouldn't be any better.
siz
Yes it is better, because it performs set lookups in hash-based Set<T> instances (which is an internal class).
Frans Bouma
This won't improve performance; it might even be slower. Linq is mostly for making code easier to understand, not for better performance.
BeowulfOF
@BeowulfOF: you checked the code? :) See my previous comment. Of course, if 'Intersect' was implemented using an O(n*m) algo like in the question, it would be the same, but fortunately it's not.
Frans Bouma
BTW, I've timed it with simple int Lists and it really is much faster.
lacop
thanks but I am using .NET 2.0!
JohnIdol
A: 

If you must maintain two separate lists, one of the Set types might be faster (using a Join operation). Some of the available libraries are

  1. IESI Collections
  2. PowerCollections
  3. C5
Anthony Mastrean
A: 

Just a minor addition to the other comments. If you need the other list of customers to be sorted you could use a SortedList.

Cristian Libardo
+1  A: 

You might also consider System.Collections.ObjectModel.KeyedCollection<TKey, TItem>.

To supplement this, I usually create my own IKeyable interface and a specific implementation of KeyedCollection that uses IKeyable for the required override (GetKeyForItem).
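A minimal sketch of what that could look like follows. The answer does not show the actual interface, so the shape of `IKeyable<TKey>`, the `KeyableCollection` name, and the int key on CustomObject are all assumptions.

```csharp
using System;
using System.Collections.ObjectModel;

// Hypothetical interface; the answer names IKeyable but does not show it.
public interface IKeyable<TKey>
{
    TKey Key { get; }
}

// Generic KeyedCollection that pulls the key from the item itself.
public class KeyableCollection<TKey, TItem> : KeyedCollection<TKey, TItem>
    where TItem : IKeyable<TKey>
{
    // The one abstract member KeyedCollection requires.
    protected override TKey GetKeyForItem(TItem item)
    {
        return item.Key;
    }
}

// Hypothetical stand-in for the real CustomObject, keyed by an assumed int id.
public class CustomObject : IKeyable<int>
{
    private readonly int id;
    public CustomObject(int id) { this.id = id; }
    public int Key { get { return id; } }
}
```

With this in place, the check becomes `collection.Contains(myCustomObject.Key)`. Note that the key-based Contains is the one backed by the internal dictionary; passing the item itself resolves to the inherited Collection<T>.Contains, which scans linearly. Adding two items with the same key throws an ArgumentException.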

Joel Coehoorn
+1  A: 

Tests are your friend. The size of the collection determines which data structure/algorithm you should use. I suggest you do some performance benchmarks on the following options:

  1. Your current solution
  2. Use a BinarySearch algorithm in your sorted List.
  3. Use a HashSet<CustomObject>.

Given the number of elements I suspect that the HashSet<CustomObject> is the way to go.
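A quick way to compare options 1 and 3 is a Stopwatch timing along these lines. This is only a sketch: CustomObject is an empty placeholder using default reference equality, the 7000-item size comes from the question, the class and method names are made up, and HashSet<T> requires .NET 3.5.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public class CustomObject { } // placeholder; relies on default reference equality

public static class LookupBenchmark
{
    // Option 1: the current approach, a linear scan of b for every item in a.
    public static int CountWithList(List<CustomObject> a, List<CustomObject> b)
    {
        int hits = 0;
        foreach (CustomObject o in a)
        {
            if (b.Contains(o)) hits++;
        }
        return hits;
    }

    // Option 3: build a HashSet from b once, then do a hash lookup per item.
    public static int CountWithHashSet(List<CustomObject> a, List<CustomObject> b)
    {
        HashSet<CustomObject> set = new HashSet<CustomObject>(b);
        int hits = 0;
        foreach (CustomObject o in a)
        {
            if (set.Contains(o)) hits++;
        }
        return hits;
    }

    public static void Main()
    {
        List<CustomObject> a = new List<CustomObject>();
        List<CustomObject> b = new List<CustomObject>();
        for (int i = 0; i < 7000; i++)
        {
            CustomObject o = new CustomObject();
            a.Add(o);
            if (i % 2 == 0) b.Add(o); // every other object is shared
        }

        Stopwatch sw = Stopwatch.StartNew();
        int listHits = CountWithList(a, b);
        sw.Stop();
        Console.WriteLine("List.Contains: {0} hits in {1} ms", listHits, sw.ElapsedMilliseconds);

        sw = Stopwatch.StartNew();
        int setHits = CountWithHashSet(a, b);
        sw.Stop();
        Console.WriteLine("HashSet:       {0} hits in {1} ms", setHits, sw.ElapsedMilliseconds);
    }
}
```

Timings will vary by machine, and if the real CustomObject has an expensive Equals the gap is likely to be larger than with this placeholder.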

bruno conde
A: 

HashSet works great too.

new HashSet<CustomObject>().Join()
Brian Rudolph