I usually find myself doing something like:

string[] things = arrayReturningMethod();
int index = things.ToList<string>().FindIndex((s) => s.Equals("FOO"));
//do something with index
return things.Distinct(); //which returns an IEnumerable<string>

and I find all this mix-up of types and interfaces a bit confusing, and it tickles my potential performance problem antennae (which I ignore until proven right, of course).

Is this idiomatic and proper C# or is there a better alternative to avoid casting back and forth to access the proper methods to work with the data?

EDIT: The question is actually twofold:

  • When is it proper to use either the IEnumerable interface or an array or a list (or any other IEnumerable implementing type) directly (when accepting parameters)?

  • Should you freely move between IEnumerables (implementation unknown), lists and arrays, or is that non-idiomatic (there are better ways to do it) / non-performant (not typically relevant, but might be in some cases) / just plain ugly (unmaintainable, unreadable)?

A: 

I try to avoid rapidly jumping between data types if it can be avoided.

Each situation similar to the one you described is probably different enough to rule out a dogmatic rule about transforming your types; however, it is generally good practice to select the data structure that best provides the interface you need without having to copy elements needlessly to new data structures.

kbrimington
+7  A: 

A good rule of thumb is to always use IEnumerable (when declaring your variables/method parameters/method return types/properties/etc.) unless you have a good reason not to. It is by far the most type-compatible with other (especially extension) methods.

Kirk Woll
Any extension method that works with `IEnumerable<T>` will also always work with `List<T>` and arrays, since they implement it. (I agree to use this whenever possible, but the reasoning here is incorrect...)
Reed Copsey
Isn't FindIndex() a good reason? Maybe one could implement it on top of one of the extension methods for IEnumerable?
Vinko Vrsalovic
@Vinko: I wouldn't convert to a list for that - you can do it with Select directly, and ToList forces you to iterate through the entire collection and copy every element...
Reed Copsey
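A minimal sketch of what "do it with Select directly" could look like, assuming the indexed overload of Select. The helper name IndexOfFirst is hypothetical (it is not part of LINQ); it stops at the first match and does not copy the sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableIndexSketch
{
    // Hypothetical helper (not part of LINQ): returns the index of the first
    // element matching the predicate, or -1 if no element matches.
    public static int IndexOfFirst<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        return source
            .Select((item, index) => new { item, index }) // pair each element with its position
            .Where(pair => predicate(pair.item))
            .Select(pair => pair.index)
            .DefaultIfEmpty(-1)                           // -1 when nothing matched
            .First();                                     // stops enumerating at the first match
    }
}
```

With this in scope, the question's lookup could be written as things.IndexOfFirst(s => s.Equals("FOO")) without calling ToList first.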
IEnumerable is not a datatype, it's an interface. You cannot "use" IEnumerable, you can only use things that "implement" IEnumerable, of which there are quite a few (string arrays and Lists both being in that category).
Stargazer712
Not just compatible with others, but so powerful. You can chain it, hide it, lazy-load it, transform it - all hidden behind one of the most well-designed contracts in .NET :)
Rex M
If there could be a -1/2, I would do it, as this doesn't really address the issue. string[] and List are both IEnumerable, thus it doesn't even answer the question. If you could edit your answer, that would be the best option.
Stargazer712
@Stargazer, my understanding of the OP's question is that he wants to know what types to use/declare. If that is the question, the answer is IEnumerable<T>. However, I'm not entirely sure what the OP's question really is.
Kirk Woll
@Kirk, he's wondering what the differences are and when to use them. Saying, "always use IEnumerable" ignores the fact that you cannot declare an instance of IEnumerable. When you need to instantiate an object, string[] and List<> both implement IEnumerable, thus saying "Use IEnumerable" does nothing to distinguish between the two. Don't get me wrong--when *accepting* an object, always use IEnumerable, but this only addresses part of the question.
Stargazer712
I would amend your answer to say instead: Always use `IEnumerable<T>` for public parameters, return types, properties, etc., except when you need `.Count`, in which case use `IList<T>`. For local variables, just use whatever type you actually have.
Daniel Pryden
@Daniel Pryden - what's wrong with the `Count()` extension method? :) ...and "when you need" is often impossible to know with return values, because you cannot always predict the needs of the caller.
Peter Lillevold
@Peter Lillevold: What's wrong with the `.Count()` extension method? It has O(n) performance on generic `IEnumerable<T>`, that's what's wrong. What I was trying to say was, when you're writing a function that needs an O(1) implementation of `.Count` and/or random element access, it's better to require an `IList<T>` than some specific implementation like `List<T>` or `T[]`. For a return value, I would say only return an `IList<T>` if (a) it's cheap to do so, and (b) there's a reasonable expectation that a caller would want to perform random access and/or O(1) `.Count`.
Daniel Pryden
+3  A: 

Well, you've got two apples and an orange that you are comparing.

The two apples are the array and the List.

  • An array in C# is a C-style array that has garbage collection built in. The upside of using them is that they have very little overhead, assuming you don't need to move things around. The bad thing is that they are not as efficient when you are adding things, removing things, and otherwise changing the array around, because an array has a fixed size and has to be reallocated and copied.

  • A List is a C#-style dynamic array (similar to the vector<> class in C++). There is more overhead, but it is more efficient when you add and remove items a lot, because it keeps spare capacity in its backing array and only reallocates when that capacity runs out.

The best comparison I could give is saying that arrays are to Lists as strings are to StringBuilders.

The orange is 'IEnumerable'. This is not a datatype, but rather it is an interface. When a class implements the IEnumerable interface, it allows that object to be used in a foreach() loop.

When you return the list (as you did in your example), you were not converting the list to an IEnumerable. A list already is an IEnumerable object.

EDIT: When to convert between the two:

It depends on the application. There is very little that can be done with an array that cannot be done with a List, so I would generally recommend the List. Probably the best thing to do is to make a design decision that you are going to use one or the other, that way you don't have to switch between the two. If you rely on an external library, abstract it away to maintain consistent usage.

Hope this clears a little bit of the fog.

Stargazer712
Good point about the interface being a different beast (+1), although this really doesn't answer about when/what to cast.
Vinko Vrsalovic
+1  A: 

Looks to me like the problem is that you haven't bothered learning how to search an array. Hint: Array.IndexOf or Array.BinarySearch depending on whether the array is sorted.

You're right that converting to a list is a bad idea: it wastes space and time and makes the code less readable. Also, blindly upcasting to IEnumerable slows matters down and also completely prevents use of certain algorithms (such as binary search).
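A minimal sketch of the two calls named in the hint, assuming a string array as in the question. The wrapper class and method names are hypothetical; Array.IndexOf and Array.BinarySearch are the actual framework methods.

```csharp
using System;

public static class ArraySearchSketch
{
    // Linear search; works on unsorted arrays. Returns -1 when not found.
    public static int FindUnsorted(string[] things, string target)
    {
        return Array.IndexOf(things, target);
    }

    // Binary search; the array must already be sorted with the same comparer.
    // Returns a negative number when the value is not found.
    public static int FindSorted(string[] sortedThings, string target)
    {
        return Array.BinarySearch(sortedThings, target, StringComparer.Ordinal);
    }
}
```

Neither call allocates a list or copies the array.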

Ben Voigt
Great tip about the Array namespace, thanks
Vinko Vrsalovic
You do not switch to IEnumerable, it already *is* IEnumerable. It's called polymorphism...sigh
Stargazer712
@Vinko: `System.Array` isn't a namespace, it's a class. @Stargazer: I changed "switch" to "upcast". Does that make it clearer for you? There IS a performance and functionality cost to polymorphism.
Ben Voigt
@Ben: Great tip about the Array class, thanks.
Vinko Vrsalovic
Much better, thank you. It irks me that people treat IEnumerable as being in the same boat as List. They are apples and oranges.
Stargazer712
By "switch" I had meant changing the formal type of parameters and return values, but "upcast" is a much more accurate and informative description of what happens when that change is made, so hopefully no one will be thinking in terms of conversion.
Ben Voigt
A: 

You're right to ignore the 'performance problem' antennae until you actually have a performance problem. Most performance problems come from doing too much I/O or too much locking or doing one of them wrong, and none of these apply to this question.

My general approach is:

  1. Use T[] for 'static' or 'snapshot'-style information. Use it for things where calling .Add() wouldn't make sense anyway, and where you don't need the extra methods List<T> gives you.
  2. Accept IEnumerable<T> if you don't really care what you're given and don't need a constant-time .Length/.Count.
  3. Only return IEnumerable<T> when you're doing simple manipulations of an input IEnumerable<T> or when you specifically want to make use of the yield syntax to do your work lazily.
  4. In all other cases, use List<T>. It's just too flexible.

Corollary to #4: don't be afraid of ToList(). ToList() is your friend. It forces the IEnumerable<T> to evaluate right then (useful for when you're stacking several where clauses). Don't go nuts with it, but feel free to call it once you've built up your full where clause before you do the foreach over it (or the like).

Of course, this is just a rough guideline. Just please try to follow the same pattern in the same codebase -- code styles that jump around make it harder for maintenance coders to get into your frame of mind.
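A small sketch of why ToList() matters once a where clause has been built up, assuming a query that is enumerated twice. The class and method names are hypothetical; the call counter only exists to make the deferred evaluation visible.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionSketch
{
    // Returns how many times the filter ran when the same query is
    // enumerated twice, with or without a ToList() snapshot.
    public static int CountFilterCalls(bool materialize)
    {
        int calls = 0;
        int[] numbers = { 1, 2, 3, 4 };

        // Nothing runs yet: Where only describes the query.
        IEnumerable<int> evens = numbers.Where(n => { calls++; return n % 2 == 0; });

        if (materialize)
        {
            // ToList() runs the filter once and stores the results.
            evens = evens.ToList();
        }

        foreach (int item in evens) { }
        foreach (int item in evens) { }

        return calls;
    }
}
```

Without the snapshot the filter runs on every enumeration (8 calls for 4 elements enumerated twice); with ToList() it runs once per element (4 calls).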

Jonathan
You're talking only about web development. You'd be nuts to convert all the time in a game, or in some algorithmic work, for instance, or if you're processing large lists with elements in the millions. List is the most expensive of the three, and also unnecessarily specific. Use it only when you don't know how many elements you have when you *populate* the array. Don't accept Lists as arguments. If you need to access the argument's values arbitrarily, accept an IList instead; otherwise just accept an IEnumerable. Otherwise you're just asking for refactoring hell.
Rei Miyasaka
+6  A: 

In regards to performance...

  • Converting from List to T[] involves copying all the data from the original list to a newly allocated array.
  • Converting from T[] to List also involves copying all the data from the original array to a newly allocated List.
  • Converting from either List or T[] to IEnumerable involves an upcast, which costs at most a few CPU cycles.
  • Converting from IEnumerable to List involves downcasting, which is also a few CPU cycles.
  • Converting from IEnumerable to T[] also involves downcasting.
  • You can't cast an IEnumerable to T[] or List unless it was a T[] or List respectively to begin with. You can use the ToArray or ToList functions, but those will also result in a copy being made.
  • Accessing all the values in order from start to end in a T[] will, in a straightforward loop, be optimized to use straightforward pointer arithmetic -- which makes it the fastest of them all.
  • Accessing all the values in order from start to end in a List involves a check on each iteration to make sure that you aren't accessing a value outside the array's bounds, and then the actual accessing of the array value.
  • Accessing all the values in an IEnumerable involves creating an enumerator object, calling the MoveNext() function, which advances the enumerator, and then reading the Current property, which gives you the actual value and sticks it in the variable that you specified in your foreach statement. Generally, this isn't as bad as it sounds.
  • Accessing an arbitrary value in an IEnumerable involves starting at the beginning and calling MoveNext() as many times as you need to get to that value. Generally, this is as bad as it sounds.

In regards to idioms...

In general, IEnumerable is useful for public properties, function parameters, and often for return values, but only if you know that you're going to be using the values sequentially.

For instance, if a function PrintValues were written as PrintValues(List<T> values), it would only be able to deal with List values, so a caller holding, say, a T[] would first have to convert it. The same goes for PrintValues(T[] values). But if it were PrintValues(IEnumerable<T> values), it would be able to deal with Lists, T[]s, stacks, hashtables, dictionaries, strings, sets, etc -- any collection that implements IEnumerable, which is practically every collection.
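A minimal sketch of the IEnumerable<T> version of PrintValues. The TextWriter parameter is an assumption added for this sketch so the output can go somewhere other than the console; the answer only names the function and its values parameter.

```csharp
using System.Collections.Generic;
using System.IO;

public static class PrintValuesSketch
{
    // Accepts any sequence: List<T>, T[], Stack<T>, HashSet<T>, string, ...
    // The 'output' parameter is an addition for this sketch.
    public static void PrintValues<T>(IEnumerable<T> values, TextWriter output)
    {
        foreach (T value in values)
        {
            output.WriteLine(value);
        }
    }
}
```

The same method can then be called with new[] { 1, 2 }, a List<string>, or the string "abc" (a sequence of char), passing Console.Out as the writer, with no conversion at the call site.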

In regards to internal use...

  • Use a List only if you're not sure how many items will need to be in it.
  • Use a T[] if you know how many items will need to be in it, but need to access the values in an arbitrary order.
  • Stick with the IEnumerable if that's what you've been given and you just need to use it sequentially. Many functions will return IEnumerables. If you do need to access values from an IEnumerable in an arbitrary order, use ToArray().

Also, note that casting is different from using ToArray() or ToList() -- the latter involves copying the values, which is indeed a performance and memory hit if you have a lot of elements. The former simply is to say that "A dog is an animal, so like any animal, it can eat" (upcast) or "This animal happens to be a dog, so it can bark" (downcast). Likewise, all Lists and T[]s are IEnumerables, but only some IEnumerables are Lists or T[]s.
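A short sketch of the difference between casting and copying, assuming a List<int> and an int[] as the concrete types. The class and method names are hypothetical; each method returns true when the behaviour described in its comment holds.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CastVersusCopySketch
{
    // Treating a List as an IEnumerable keeps the same object: nothing is copied.
    public static bool InterfaceViewIsSameObject()
    {
        var list = new List<int> { 1, 2, 3 };
        IEnumerable<int> sequence = list;
        return object.ReferenceEquals(list, sequence);
    }

    // ToList() allocates a new list and copies the elements into it.
    public static bool ToListMakesCopy()
    {
        var list = new List<int> { 1, 2, 3 };
        List<int> copy = list.ToList();
        return !object.ReferenceEquals(list, copy) && copy.SequenceEqual(list);
    }

    // Casting back to a concrete type only works if the object really is that type:
    // an array seen through IEnumerable cannot be cast to List.
    public static bool CastToWrongConcreteTypeFails()
    {
        IEnumerable<int> sequence = new[] { 1, 2, 3 };
        return (sequence as List<int>) == null;
    }
}
```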

Rei Miyasaka
+1, great comparison. Also in many cases - though maybe beyond the original question - it is an interesting option to use `IList<T>` instead of `IEnumerable<T>`, especially when data should be added and removed.
0xA3
A: 

When to use what?

I would suggest returning the most specific type, and taking in the most flexible type.

Like this:

public int[] DoSomething(IEnumerable<int> inputs)
{
    //...
}

public List<int> DoSomethingElse(IList<int> inputs)
{
    //...
}

That way you can call methods on List<T> for whatever you get back from the method, in addition to treating it as an IEnumerable. On the inputs, be as flexible as possible, so you don't dictate to the users of your method what kind of collection to create.
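A hedged sketch of the same signature shape with a filled-in body. The method DoubleAll and its behaviour are hypothetical and only illustrate "flexible in, specific out"; they are not the elided bodies of the methods in the answer.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SignatureSketch
{
    // Hypothetical example: accepts any sequence of ints and returns a
    // concrete array, so callers get indexing and Length for free.
    public static int[] DoubleAll(IEnumerable<int> inputs)
    {
        return inputs.Select(n => n * 2).ToArray();
    }
}
```

Callers can pass a List<int>, an int[], or a lazy query unchanged, and still index into the result.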

Arjan Einbu