tags:

views:

1041

answers:

8

As I've been incorporating the Linq mindset, I have been more and more inclined to pass around collections via the IEnumerable<T> generic type which seems to form the basis of most Linq operations.

However I wonder, with the late evaluation of the IEnumerable<T> generic type if that is a good idea. Does it make more sense to use the T[] generic type? IList<T>? Or something else?

Edit: The comments below are quite interesting. One thing that has not gotten addressed though seems to be the issue of thread safety. If, for example, you take an IEnumerable<T> argument to a method and it gets enumerated in a different thread, then when that thread attempts to access it the results might be different than those that were meant to be passed in. Worse still, attempting to enumerate an IEnumerable<T> twice - I believe throws an exception. Shouldn't we be striving to make our methods thread safe?

+1  A: 

Output types should be as specific as possible, input types should be as loose as possible.

So return T[], but take IEnumerable<T> as input.

You should return what makes sense for the method. If it has to do extra work to convert to another type, think about just dropping that and returning whatever you have internally, as long as you don't return references to internal data structures that shouldn't be modified outside of your own code.

Lasse V. Karlsen
Well what I'm worried about is that the collection might get enumerated by a separated thread causing the method to fail. Shouldn't we strive to make all methods as thread safe as possible?
George Mauer
No, we shouldn't strive to make methods thread safe, unless we need to use them from another thread!
Orion Edwards
First of all, enumerating a single collection on two threads should not cause any problems, as enumerating a collection should be side-effect-free for the collection. Secondly, in such a case, don't return a modifiable collection, return a copy, or a thread-safe version.
Lasse V. Karlsen
+9  A: 

For the most part, you shouldn't be passing arrays (T[]) on public interfaces. See Eric Lippert's blog post "Arrays considered somewhat harmful" for details.

The exception is methods that take a params array, as it has to be an array. (Hopefully a future version will allow "params IList<s> foo" in method signatures.)

For internal members, do whatever you like; you have control over both sides of the interface.

Jay Bazuzi
Personally, I would prefer params IEnumerable<T>.
JaredPar
+1  A: 

I would choose between IList and IEnumerable and wouldn't consider array at all.

Note that IEnumerable does not have Count or Length property except as an extension method, while array and IList do. So, I would base my decision on that:

If return value has known number of elements -> IList or array (if you must) otherwise IEnumberable

bh213
I wish there was a IReadOnlyList that both Arrays and List<t> implemented.
Ian Ringrose
+3  A: 

"Shouldn't we be striving to make our methods thread safe?"

We should be deliberate about the abilities of our code.

Just because a method isn't threadsafe doesn't mean it's a failure, just that you need to know that fact before you try to use it in a multi-threaded program. Do you strive to make Main() threadsafe? Does thread safety matter in a single-threaded program?

Anyway, I don't think it really makes sense to say "Foo is threadsafe", only that "Foo has the following characteristics that are important in a multi-threaded context."

"attempting to enumerate an IEnumerable twice - I believe throws an exception."

That'd be bad. Luckily you can test it, instead of expanding FUD.

I think what you're asking is "shouldn't I return a copy of my collection, instead of a reference to an internal member collection, so that callers won't modify my data"? And the answer is an unqualified "maybe". There are a lot of ways to approach this problem.

Jay Bazuzi
+13  A: 

I went through a phase of passing around T[], and to cut a long story short, it's a pain in the backside. IEnumerable<T> is much better

However I wonder, with the late evaluation of the IEnumerable generic type if that is a good idea. Does it make more sense to use the T[] generic type? IList? Or something else

Late evaluation is precisely why IEnumerable is so good. Here's an example workflow:

IEnumerable<string> files = FindFileNames();
IEnumerable<string> matched = files.Where( f => f.EndsWith(".txt") );
IEnumerable<string> contents = matched.Select( f => File.ReadAllText(f) );
bool foundContents = contents.Any( s => s.Contains("orion") );

For the impatient, this gets a list of filenames, filters out .txt files, then sets the foundContents to true if any of the text files contain the word orion.

If you write the code using IEnumerable as above, you will only load each file one by one as you need them. Your memory usage will be quite low, and if you match on the first file, you prevent the need to look at any subsequent files. It's great.

If you wrote this exact same code using arrays, you'd end up loading all the file contents up front, and only then (if you have any RAM left) would any of them be scanned. Hopefully this gets the point across about why lazy lists are so good.

One thing that has not gotten addressed though seems to be the issue of thread safety. If, for example, you take an IEnumerable<T> argument to a method and it gets enumerated in a different thread, then when that thread attempts to access it the results might be different than those that were meant to be passed in. Worse still, attempting to enumerate an IEnumerable<T> twice - I believe throws an exception. Shouldn't we be striving to make our methods thread safe?

Thread safety is a giant red herring here.

If you used an array rather than an enumerable, it looks like it should be safer, but it's not. Most of the time when people return arrays of objects, they create a new array, and then put the old objects in it. If you return that array, then those original objects can then be modified, and you end up with precisely the kind of threading problems you're trying to avoid.

A partial solution is to not return an array of the original objects, but an array of new or cloned objects, so other threads can't access the original ones. This is useful, however there's no reason an IEnumerable solution can't also do this. One is no more threadsafe than the other.

Orion Edwards
+1 for being an useful post to me. The Where/Select/Any/Contains used in your code example is LINQ right?
TomWij
Yep. You could write it using the 'sql-like' syntax, but it does the same thing
Orion Edwards
Awesome answer, thanks for your input Orion!
George Mauer
A: 

Implementing a interpreted programming language written in C#, I also needed to choose between the return value of type object[] and List for Linq-like functions as map and filter. In the end I choose the lazy variant of the Lisp-like list. There is a list constructor that has IEnumerable argument: z = new LazyList(...). Whenever the program refers to one of the properties of LazyList (head, tail or isempty), the enumerable is evaluated for just one step (head and isempty become definite, tail becomes another lazy list). The advantage of a lazylist over an IEnumerable is that the former allows recursive algorithms and does not (have to) suffer from multiple evaluations.

+1  A: 

However I wonder, with the late evaluation of the IEnumerable generic type if that is a good idea. Does it make more sense to use the T[] generic type? IList? Or something else?

You need to be careful about the terms you are using. Be aware of the difference between the type of a reference and the type of an instance.

IEnumerable of T only says: "I may iterate over this collection by calling GetEnumerator. When I do that, each element can be refered to as a T." IEnumerable of T does not say anything about lazy evaluation.

Consider these three declarations:

//someInts will be evaluated lazily
IEnumerable<int> someInts = Enumerable.Range(0, 100);

//oddInts will be evaluated lazily
IEnumerable<int> oddInts = someInts.Where(i => i % 2 == 1);

//evenInts will be evaluated eagerly
IEnumerable<int> evenInts = someInts.Except(oddInts).ToList();

Even though the reference type of all three are IEnumerable of int, only evenInts is eagerly evaluated (because ToList enumerates its target and returns a List instance).

As for the question at hand, what I typically do is to treat my callers as expecting an eagerly evaluated Enumerable, so I do this:

public IEnumerable<int> GetInts()
{
  ...
  return someInts.ToList()
}
David B
A: 

You can always do this for thread safety ,it will probably perform better in a lot of instances as there is no locking of the collection ( its a copy) . Note this means you do the LINQ or foreach not on the collection but on a member.

  public IEnumerable<ISubscription> this[string topic] {
            get  { 
                rwlock.EnterReadLock();
                try {
                    return subscriptionsByTopic[GetTopic(topic)].ToArray<ISubscription>();              
                    //thread safe
                }
                finally {
                    rwlock.ExitReadLock();
                }
            }
        }

Also dont use IEnumerable for SOA/multi tier applications as you cant serialize interfaces ( without introducing lots of pain) .WCF works better with List.