views:

138

answers:

3

I find myself often with a situation where I need to perform an operation on a set of properties. The operation can be anything from checking if a particular property matches anything in the set to a single iteration of actions. Sometimes the set is dynamically generated when the function is called, some built with a simple LINQ statement, other times it is a hard-coded set that will always remain the same. But one constant always exists: the set only exists for one single operation and has no use before or after it.

My problem is, I have so many points in my application where this is necessary, but I appear to be very, very inconsistent in how I store these sets. Some of them are arrays, some are lists, and just now I've found a couple linked lists. Now, none of the operations I'm specifically concerned about have to care about indices, container size, order, or any other functionality that is bestowed by any of the individual container types. I picked resource efficiency because it's a better idea than flipping coins. I figured, since array size is configured and it's a very elementary container, that might be my best choice, but I figure it is a better idea to ask around. Alternatively, if there's a better choice not out of resource-efficiency but strictly as being a better choice for this kind of situation, that would be nice as well.

+5  A: 

With your acknowledgement that this is more about coding consistency rather than performance or efficiency, I think the general practice is to use a List<T>. Its actual backing store is an array, so you aren't really losing much (if anything noticable) to container overhead. Without more qualifications, I'm not sure that I can offer anything more than that.

Of course, if you truly don't care about the things that you list in your question, just type your variables as IEnumerable<T> and you're only dealing with the actual container when you're populating it; where you consume it will be entirely consistent.

Adam Robinson
I'd argue for plain arrays, but I do agree about using IEnumerable<T> for interfaces so that we can both be right.
Steven Sudit
@Steven: I chose `List<T>` simply because a) it uses arrays on the backend, so you aren't really losing anything, and b) it allows you to be consistent with sets that start out both with known and unknown sizes. A list also provides flexibility if you need to expose it later, since it's easily made read-only through the `AsReadOnly` function, whereas an array would require an entirely new type in order to be read-only.
Adam Robinson
@Adam: Good reasons. As a C++ refugee, I find List to be a suitable replacement for vector.
Steven Sudit
+2  A: 

There are two basic principles to be aware of regarding resource efficiency.

  • Runtime complexity
  • Memory overhead

You said that indices and order do not matter and that a frequent operation is matching. A Dictionary<T> (which is a hashtable) is an ideal candidate for this type of work. Lookups on the keys are very fast which would be beneficial in your matching operation. The disadvantage is that it will consume a little more memory than what would be strictly required. The usual load factor is around .8 so we are not talking about a huge increase or anything.

For your other operations you may find that an array or List<T> is a better option especially if you do not need to have the fast lookups. As long as you are not needing high performance on specialty operations (lookups, sorting, etc.) then it is hard to beat the general resource characteristics of array based containers.

Brian Gideon
Using a `HashSet` would be better, imo, since he doesn't need an associated value with the keys.
tzaman
If lookups and/or checking to see if an object exists in the collection would be much more frequent that modifying the list, then `Dictionary` and `Hashset` would be good candidates, but just for a general "bag", they introduce a small amount of complexity that's wasted.
Adam Robinson
@tzaman: Good point.
Brian Gideon
@Adam: Absolutely!
Brian Gideon
In summarizing both @Adam and Brian's answers, as well as some [extra reading](http://blogs.msdn.com/ericlippert/archive/2008/09/22/arrays-considered-somewhat-harmful.aspx), stick with List<T> for general bags. Since I'm going for consistency, I'm accepting Adam's answer, but the info about when HashSet might be a good is more than valuable. Thanks to you both!
ccomet
A: 

List is probably fine in general. It's easy to understand (in the literate programming sense) and reasonably efficient. The keyed collections (e.g. Dict, SortedList) will throw an exception if you add an entry with a duplicate key, though this may not be a problem for what you're working on now.

Only if you find that you're running into a CPU-time or memory-size problem should you look at improving the "efficiency", and then only after determining that this is the bottleneck.

No matter which approach you use, there will still be creation and deletion of the underlying objects (collection or iterator) that will eventually be garbage collected, if the application runs long enough.

Chris Pousset