On a LINQ result you can do this:

var result = from x in Items select x;
List<T> list = result.ToList<T>();

However, ToList<T> is really slow. Does it make the list mutable, and is that why the conversion is slow?

In most cases I can get by with just an IEnumerable or a Paralell.DistinctQuery, but now I want to bind the items to a DataGridView, so I need something other than IEnumerable. Any suggestions on how to improve the performance of ToList, or on a replacement?

With 10 million records in the IEnumerable, .ToList<T> takes about 6 seconds.

+1  A: 

I think it's because of memory reallocations: ToList cannot know the size of the collection beforehand, so it cannot allocate enough storage for all items up front. It therefore has to reallocate the List<T>'s storage as it grows.

If you can estimate the size of your result set, it will be much faster to preallocate enough elements using the List<T>(int) constructor overload and then add the items to it manually.
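A minimal sketch of that preallocation idea, assuming an upper bound on the item count is known. The names `items` and `maxCount` are hypothetical stand-ins for the question's `Items` and its known maximum size.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Items sequence.
IEnumerable<int> items = Enumerable.Range(0, 1000);
int maxCount = 1000; // assumed known upper bound on the result size

var result = from x in items select x;

// Reserve the backing storage once instead of letting the list regrow as it fills.
var list = new List<int>(maxCount);
foreach (var x in result)
{
    list.Add(x);
}

Console.WriteLine(list.Count);    // number of items copied
Console.WriteLine(list.Capacity); // still maxCount, so no regrowth happened
```

This only removes the cost of growing the list; the query still has to be evaluated item by item, which the comments below identify as the larger cost.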

Anton Gogolev
That might be why Count() takes a lot of time as well. I do know the absolute maximum size it can be, though, since Items is another IEnumerable or List<T>.
Filip Ekberg
@Mark It has to copy the elements from the original array as well.
Anton Gogolev
It did not make it faster.
Filip Ekberg
-1. No, that's not the main reason. Filling a list with ten million items takes about 0.3 seconds. Setting the capacity will cut that down to 0.1 seconds, so that will bring it from 6 seconds to 5.8 seconds.
Guffa
@Anton: Oops, sorry yes never mind, you're right. But this isn't the full reason.
Mark Byers
+4  A: 

It's because LINQ likes to be lazy and do as little work as possible. This line:

var result = from x in Items select x;

despite your choice of name, isn't actually a result, it's just a query object. It doesn't fetch any data.

List<T> list = result.ToList<T>();

Now you've actually requested the result, hence it must fetch the data from the source and make a copy of it. ToList guarantees that a copy is made.

With that in mind, it's hardly surprising that the second line is much slower than the first.
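As a small illustration of this laziness, here is a sketch that counts how many items have been evaluated before and after the ToList call. The helper `Touch` and the counter `evaluated` are hypothetical and exist only to make the evaluation observable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int evaluated = 0; // counts how many source items have been pulled through the query

// Hypothetical source; the projection has a side effect so evaluation can be observed.
IEnumerable<int> items = Enumerable.Range(0, 5);
var result = from x in items select Touch(x);

int beforeToList = evaluated;     // still 0: defining the query runs nothing
List<int> list = result.ToList(); // enumerates the source and copies the items
int afterToList = evaluated;      // 5: every item was evaluated during ToList

Console.WriteLine($"{beforeToList} {afterToList}"); // prints "0 5"

int Touch(int x)
{
    evaluated++;
    return x;
}
```

Timing only the first line therefore measures the cost of building the query object, not the cost of producing the data.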

Mark Byers
Are you sure this has anything to do with database access?
Anton Gogolev
It's not linq to sql.
Filip Ekberg
So nothing in the query is executed until it's actually used?
Filip Ekberg
OK, changed 'database' to the more generic 'data source'. Same logic applies though.
Mark Byers
@Filip: yep, the query just gives you a way to fetch data. It doesn't actually fetch anything until you need the result, which you do if you convert to a list.
Mark Byers
Ah.. I see. Is there a way to speed it up then?
Filip Ekberg
@Filip: Do you really need a list containing a million elements? What are you trying to do?
Mark Byers
In this case it's just for testing purposes.
Filip Ekberg
+1  A: 

No, it's not creating the list that takes time, it's fetching the data that takes time.

Your first code line doesn't actually fetch the data, it only sets up an IEnumerable that is capable of fetching the data. It's when you call the ToList method that it will actually get all the data, and that is why all the execution time is in the second line.

You should also consider if having ten million lines in a grid is useful at all. No user is ever going to look through all the lines, so there isn't really any point in getting them all. Perhaps you should offer a way to filter the result before getting any data at all.
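One way to act on this is to filter and page the query before materializing it. The following is only a sketch under assumed names: `pageSize`, `pageIndex`, and the even-number filter are hypothetical placeholders for whatever criterion and page size the application actually needs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the ten-million-item source.
IEnumerable<int> items = Enumerable.Range(0, 10_000_000);

int pageSize = 100; // assumed number of rows the grid shows at once
int pageIndex = 2;  // assumed zero-based page the user asked for

// Where, Skip and Take are deferred as well, so enumeration stops
// as soon as the requested page has been filled.
List<int> page = items
    .Where(x => x % 2 == 0)      // example filter; replace with a real criterion
    .Skip(pageIndex * pageSize)
    .Take(pageSize)
    .ToList();

Console.WriteLine(page.Count); // 100
Console.WriteLine(page[0]);    // 400
```

A list of this size could then be bound to the grid instead of the full result, which keeps the ToList cost proportional to what the user actually sees.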

Guffa
+6  A: 

.ToList() is slow in comparison to what?

If you are comparing

var result = from x in Items select x;
List<T> list = result.ToList<T>();

to

var result = from x in Items select x;

you should note that since the query is evaluated lazily, the first line doesn't do much at all. It doesn't retrieve any records. Deferred execution makes this comparison completely unfair.

Mehrdad Afshari
So there's no way to improve the performance?
Filip Ekberg
@Filip: The amount of work to retrieve a large number of items is very high. To improve performance significantly, you'll have to change your high level approach to the problem so that you *don't need to retrieve that amount of data in the first place*.
Mehrdad Afshari
OK, thanks a lot for pointing out that it's not executed until I use the result. That helped me a lot to improve performance in other places. :)
Filip Ekberg