I often run into the case where I want to evaluate a query right where I declare it. This is usually because I need to iterate over it multiple times and it is expensive to compute. For example:

string raw = "...";
var lines = (from l in raw.Split('\n')
             let ll = l.Trim()
             where !string.IsNullOrEmpty(ll)
             select ll).ToList();

This works fine. But if I am not going to modify the result, then I might as well call ToArray() instead of ToList().

I wonder however whether ToArray() is implemented by first calling ToList() and is therefore less memory efficient than just calling ToList().

Am I crazy? Should I just call ToArray() - safe and secure in the knowledge that the memory won't be allocated twice?

+25  A: 

The performance difference will be insignificant, since List<T> is implemented as a dynamically sized array. Calling either ToArray() (which uses an internal Buffer<T> class to grow the array) or ToList() (which calls the List<T>(IEnumerable<T>) constructor) will end up being a matter of putting them into an array and growing the array until it fits them all.
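As a rough illustration of the "put them into an array and grow it until it fits" idea, here is a minimal sketch. `ToArraySketch` is a hypothetical helper written for this example, not the framework's actual code, and the starting size of 4 is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GrowingBufferSketch
{
    // Hypothetical helper (not the framework's implementation): collects a
    // sequence of unknown length by doubling a buffer, then trims it.
    public static T[] ToArraySketch<T>(IEnumerable<T> source)
    {
        T[] buffer = new T[4]; // assumed starting size
        int count = 0;
        foreach (T item in source)
        {
            if (count == buffer.Length)
            {
                // Buffer is full: allocate one twice as large and copy over.
                Array.Resize(ref buffer, buffer.Length * 2);
            }
            buffer[count++] = item;
        }
        // Final trim so the result holds exactly 'count' elements.
        // A list-style collection would skip this and keep the spare capacity.
        Array.Resize(ref buffer, count);
        return buffer;
    }

    public static void Main()
    {
        // The Where clause hides the element count from the consumer.
        int[] evens = ToArraySketch(Enumerable.Range(0, 10).Where(i => i % 2 == 0));
        Console.WriteLine(evens.Length);
    }
}
```

The only structural difference from a list-style collection in this sketch is the final trim.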

If you desire concrete confirmation of this fact, check out the implementation of the methods in question in Reflector -- you'll see they boil down to almost identical code.

mquander
In Entity-Framework there is an option for eager reloading using EntityCollection.CreateSourceQuery, where sometimes I am actually discarding the results, then I think it's more proper to use ToArray.
Shimmy
An interesting fact I came across: for correlated queries, using a group defined through a group join in your projection causes Linq to SQL to add another sub-query to retrieve the count for that group. I'm assuming this means that in these cases the size of the collection is known before the items are retrieved, so an exactly sized array could be created directly, which would save processing and memory while materializing the results.
jpierson
If the Count is known in advance, the performance is identical. However, if the Count isn't known in advance, the only difference between `ToArray()` and `ToList()` is that the former has to trim the excess, which involves copying the entire array, whereas the latter doesn't trim the excess, but uses an average of 25% more memory. This will only have implications if the data type is a large `struct`. Just food for thought.
Scott Rippey
+4  A: 

I agree with mquander that the performance difference is not going to be significant. I'd go for ToList() in most cases, for extra comfort.

Not strictly part of your question, but for other readers: if you're just looping through the result once, then you don't have to do either ToList() or ToArray(). The IQueryable will calculate the result on demand, which will then be enumerated in your foreach.
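To make the on-demand behavior concrete, here is a small sketch using LINQ to Objects. `ExpensiveTrim`, `Evaluations`, and `Run` are hypothetical names invented for this example. The counter shows how often the projection runs when the query is enumerated twice, with and without materializing it first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Counts how often the "expensive" step runs (hypothetical, for illustration).
    public static int Evaluations;

    static string ExpensiveTrim(string s)
    {
        Evaluations++;
        return s.Trim();
    }

    // Iterates the query twice and returns how many times ExpensiveTrim ran.
    public static int Run(bool materialize)
    {
        Evaluations = 0;
        string raw = " a \n b \n\n c "; // splits into four raw lines

        // Deferred: nothing is computed until the query is enumerated.
        IEnumerable<string> lines = raw.Split('\n')
            .Select(l => ExpensiveTrim(l))
            .Where(l => !string.IsNullOrEmpty(l));

        if (materialize)
        {
            lines = lines.ToList(); // the query is evaluated once, here
        }

        foreach (var l in lines) { }
        foreach (var l in lines) { }
        return Evaluations;
    }

    public static void Main()
    {
        Console.WriteLine(Run(false)); // 8: four lines, evaluated on each pass
        Console.WriteLine(Run(true));  // 4: evaluated once by ToList()
    }
}
```

With a single loop the deferred version does no extra work; the cost only doubles when the query is enumerated a second time.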

Thorarin
+1 for easiest solution, and if the developer wants it in a list later just do something like List<string> mylist = new List<string>(lines);
Bob The Janitor
-1 for poor reading comprehension: The question says "This is usually because I need to iterate over it multiple times and it is expensive to compute."
mquander
It wasn't so much the comprehension as the fact that I subconsciously completely skipped the first paragraph. Probably a side effect of banner blindness :P
Thorarin
Note that arrays and other in-memory collections aren't IQueryable (unless you call AsQueryable), though IEnumerable does have the same deferred "calculate on demand" behavior.
dahlbyk
It would be good to note that if you don't call ToList or ToArray, evaluation of the query will be deferred, which could cause problems if your Entity Framework or Linq to SQL context is short-lived.
StriplingWarrior
@StriplingWarrior: That's what it already says, basically. I could have probably explained it all better... I was probably in a hurry or something. Not much point updating the answer if it's already outvoted this much :)
Thorarin
+1  A: 

If you ever want to find out what happens behind the curtains in .NET, I really recommend .NET Reflector.

David Hedlund
+5  A: 

The memory will always be allocated twice - or something close to that. As you cannot resize an array, both methods will use some sort of mechanism to gather the data in a growing collection. (Well, the List is a growing collection in itself.)

The List uses an array as internal storage, and doubles the capacity when needed. This means that on average 2/3 of the items have been reallocated at least once, half of those at least twice, half of those at least thrice, and so on. That means that each item has on average been reallocated 1.3 times, which is not very much overhead.
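The doubling arithmetic can be checked with a small simulation. This is only a sketch: `CopiesForAppending` is a hypothetical helper, and the starting capacity of 4 is an assumption made for the example. It counts how many element copies happen while appending n items one at a time.

```csharp
using System;

public static class ReallocationCount
{
    // Counts element copies caused by growing a buffer that starts at an
    // assumed capacity of 4 and doubles whenever it is full.
    public static long CopiesForAppending(int n)
    {
        int capacity = 4;
        long copies = 0;
        for (int count = 0; count < n; count++)
        {
            if (count == capacity)
            {
                copies += count; // every existing element moves to the new array
                capacity *= 2;
            }
        }
        return copies;
    }

    public static void Main()
    {
        foreach (int n in new[] { 1000, 1500, 2000 })
        {
            long copies = CopiesForAppending(n);
            Console.WriteLine("{0} items: {1} copies, {2:F2} per item",
                n, copies, (double)copies / n);
        }
    }
}
```

Depending on where n falls between two powers of two, the ratio lands between roughly one and two copies per item, which is in line with the estimate above.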

Remember also that if you are collecting strings, the collection itself only contains the references to the strings, the strings themselves aren't reallocated.

Guffa
+3  A: 

ToList() is usually preferred if you use it on IEnumerable (from ORM, for instance). If the length of the sequence is not known at the beginning, ToArray() creates a dynamic-length collection like List and then converts it to an array, which takes extra time.

Veton
I've decided that readability trumps performance in this case. I now only use ToList when I expect to continue adding elements. In all other cases (most cases), I use ToArray. But thanks for the input!
Frank Krueger
+1  A: 
I did a test and found out something surprising. An array DOES implement IList<T>! Using Reflector to analyze System.Array only reveals an inheritance chain of IList, ICollection, IEnumerable but using run-time reflection I found out that string[] has an inheritance chain of IList, ICollection, IEnumerable, IList<string>, ICollection<string>, IEnumerable<string>. Therefore, I don't have a better answer than @mquander!
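The run-time check described above can be reproduced with a few lines of reflection. This is a minimal sketch; `ArrayImplements` is a hypothetical helper name, and the exact set and order of interfaces printed may differ between runtime versions.

```csharp
using System;
using System.Collections.Generic;

public static class ArrayInterfaces
{
    // True if string[] can be used wherever the given interface is expected.
    public static bool ArrayImplements(Type iface)
    {
        return iface.IsAssignableFrom(typeof(string[]));
    }

    public static void Main()
    {
        // List the interfaces the runtime reports for string[].
        foreach (Type t in typeof(string[]).GetInterfaces())
        {
            Console.WriteLine(t);
        }

        // An array can be assigned to IList<string> without a cast...
        IList<string> asList = new[] { "a", "b" };
        Console.WriteLine(asList.Count);

        // ...but it is fixed-size, so Add is not supported.
        try
        {
            asList.Add("c");
        }
        catch (NotSupportedException)
        {
            Console.WriteLine("Add is not supported on an array");
        }
    }
}
```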
Scott Rippey