In fact, this is the same question as this post:

how-can-i-make-sure-my-linq-queries-execute-when-called-in-my-dal-not-in-a-delay

But since he didn't explain why he wanted it, the question seems to have been passed over a bit. Here's my similar-but-better-explained problem:

I have a handful of threads of two types (ignoring UI threads for a moment). There's a "data-gathering" thread type, and a "computation" thread type. The data-gathering threads are slow. There's quite a bit of data to be sifted through from a variety of places. The computation threads are comparatively fast. The design model up to this point is to send data-gathering threads off to find data, and when they're complete pass the data up for computation.

When I coded my data gathering in Linq I wound up hoisting some of that slowness back into my computation threads. There are now data elements that aren't getting resolved completely until they're used during computation -- and that's a problem.

I'd like to force Linq to finish its work at a given time (end of statement? end of method? "please finish up, dammit" method call) so that I know I'm not paying for it later on. Adding ".ToList()" to the end of the Linq is 1. awkward, and 2. feels like boxing something that's about to be unboxed in another thread momentarily anyway.

+3  A: 

You wouldn't be boxing anything - you'd be buffering the results.

Using ToList() is basically the way to go if you actually want the data. Unless you're ready to use the data immediately, it's got to be buffered somewhere, hasn't it? A list is just a convenient way to do that.

The alternative is to do the processing then and there as well - use the data as you produce it, eagerly. I didn't quite follow the different threads side of things, so it's not clear to me whether that would help you, but those are basically the choices available to you as far as I can see.

This is actually somewhat explicit in your description:

The design model up to this point is to send data-gathering threads off to find data, and when they're complete pass the data up for computation.

Calling ToList() basically changes what you return from "a query which can fetch the data when asked to" to "the data itself, buffered in a list".
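
To make that distinction concrete, here is a minimal sketch. SlowFetch and fetchCount are hypothetical stand-ins for the slow data gathering described in the question, not anything from the original code; the counter only exists to show when the work actually happens.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class DeferredVersusBuffered
{
    // Counts how many times the (hypothetical) slow source is hit.
    static int fetchCount = 0;

    // Hypothetical stand-in for a slow data-gathering call.
    static int SlowFetch(int id)
    {
        fetchCount++;
        return id * 10;
    }

    static void Main()
    {
        // Building the query does no work: it is only a description of the work.
        IEnumerable<int> query = Enumerable.Range(1, 3).Select(id => SlowFetch(id));
        Console.WriteLine(fetchCount); // 0

        // ToList() runs the query now and buffers the results.
        List<int> data = query.ToList();
        Console.WriteLine(fetchCount); // 3

        // Using the buffered list costs no further fetches.
        int sum = data.Sum();
        Console.WriteLine(fetchCount); // 3

        // Using the original query again repeats all of the slow work.
        int sumAgain = query.Sum();
        Console.WriteLine(fetchCount); // 6
    }
}
```

If the data-gathering thread hands over `data`, the computation thread pays nothing extra; if it hands over `query`, the fetches happen on whichever thread enumerates it.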

Jon Skeet
Or calling any method on the result, such as Count(), should do the same thing...
GalacticCowboy
That will force it to be evaluated, but the data will be lost again otherwise - if the original query is then passed back, it will all be evaluated again, presumably slowly.
Jon Skeet
@GalacticCowboy: Calling a method like Count will buffer the results, but they'll then be discarded and you'll just be left with the count and no data :(
LukeH
@Luke: There's no reason why Count() should buffer results. It will *compute* the results, but it can throw them away as quickly as they're produced. For instance, if you had a data source which returned every log line in a giant file, you could call Count() without running out of memory, but ToList() would require all the lines to be in memory at the same time.
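
A rough sketch of that difference, assuming a hypothetical LogLines iterator in place of a real log file reader:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class CountVersusToList
{
    // Hypothetical stand-in for "every log line in a giant file":
    // lines are produced one at a time, on demand.
    static IEnumerable<string> LogLines(int howMany)
    {
        for (int i = 0; i < howMany; i++)
        {
            yield return "log line " + i;
        }
    }

    static void Main()
    {
        // Count() walks the sequence and keeps only a running total;
        // each line can be collected as soon as it has been counted.
        int count = LogLines(1000000).Count();
        Console.WriteLine(count); // 1000000

        // ToList() keeps every line alive in memory at the same time.
        List<string> all = LogLines(1000).ToList();
        Console.WriteLine(all.Count); // 1000
    }
}
```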
Jon Skeet
@Jon: Good point. I was trying to emphasise that using Count would discard the data, but I should've said "might buffer..." rather than "will buffer...".
LukeH
The search terms Eager and Lazy seem to give me additional avenues to hunt down. Still looking.
clintp
@clintp: I strongly suspect you're still looking for something that can't be done, from what you've described here. You want the information to be available immediately when you use it from the computation thread, but you don't want to store it between the time that you fetch it in the data reading thread and the time that you process it. Without storing it, how do you expect to be able to get it again quickly?
Jon Skeet
Okay, I mixed apples and oranges. Eric's comment above clarified what my problem was: one of expectations. If ToList() is the easiest way to package up those results, then that's what I'll do.
clintp
+2  A: 

Can you explain more why .ToList is not acceptable? You mentioned boxing and unboxing but those are completely unrelated topics.

Part of forcing a LINQ query to complete on demand necessitates storing the results. Otherwise, in order to see the results again, you'd have to reprocess the query. .ToList efficiently achieves this by storing the elements in a List<T>.

It's possible to store the elements in virtually any other collection-style data structure, with various trade-offs that may suit your needs better.
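
As an illustration only (the sample strings and variable names are made up), a few of the standard alternatives to ToList look like this; each one forces the query to run and buffers the results in a different shape:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class OtherBuffers
{
    static void Main()
    {
        // A deferred query over some made-up sample data.
        IEnumerable<string> query =
            new[] { "apple", "bob", "cherry", "bob" }.Where(s => s.Length > 0);

        // Fixed-size array: compact, indexable.
        string[] asArray = query.ToArray();

        // Set: drops duplicates, fast membership tests.
        HashSet<string> asSet = new HashSet<string>(query);

        // Dictionary: fast lookup by key (keys must be unique, hence Distinct).
        Dictionary<string, int> lengths =
            query.Distinct().ToDictionary(s => s, s => s.Length);

        // Lookup: groups elements under a key, duplicates allowed.
        ILookup<int, string> byLength = query.ToLookup(s => s.Length);

        Console.WriteLine(asArray.Length);      // 4
        Console.WriteLine(asSet.Count);         // 3
        Console.WriteLine(lengths["cherry"]);   // 6
        Console.WriteLine(byLength[3].Count()); // 2
    }
}
```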

JaredPar
Read carefully. It *feels* like boxing/unboxing. Call it "encapsulation", or "gift wrapping", or whatever.
clintp
@clintp, sorry, I feel like those are bad comparisons.
JaredPar
A: 

There is a LoadOptions property in the DataContext class that could help you fetch the data more eagerly.

Otherwise, you could use a few cleverly placed ToList() calls.

leppie