I am building a background processing engine that supports discarding both to-be-processed and currently-being-processed items. It's for use in a WinForms application that will require heavy processing of some input elements, so I'm building a queue engine where I can enqueue workload items and get notified with the results when they're processed.

The thing is, this queue will almost always start out containing a lot of items, and I thought that instead of just dumping everything into the ThreadPool, I'd place only the first N items there and keep backfilling as they are processed. The reason I want to do this is that once I dump them into the ThreadPool, they will be processed; even if they're tagged as discarded, they will still take up queue time.

With the backfill implementation I've made, I can remove items from my own queue if they become discarded, and only hand them to the ThreadPool when it's their turn, so to speak.

So the question is: how would I go about calculating N, the number of items to place into, and keep in, the ThreadPool queue?

Issues I've considered:

  • I might want to enqueue 2 * the number of processors, which seems to be a typical choice, to ensure all processors are kept working
  • However, if the actual processing of some items is super-fast (which can happen), the ThreadPool queue may be exhausted before my own class can backfill it with more work, so perhaps I'd want a bigger number to avoid underutilizing the processors
  • Should I create some auto-adjusting routine that calculates the optimal number based on how long each item currently takes, so that if they're all super-fast the number is much higher, and if processing takes a while it stays low?
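To make the backfill idea concrete, here is a minimal sketch of the scheme described above: keep at most N items in the ThreadPool at once and refill from a local queue as items complete. The class and member names (`WorkItem`, `BackfillQueue`, `IsDiscarded`) are illustrative assumptions, not the actual implementation from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class WorkItem
{
    public string Key;
    public volatile bool IsDiscarded;   // set when a newer item supersedes this one
    public Action<WorkItem> Process;
}

public class BackfillQueue
{
    private readonly object _sync = new object();
    private readonly Queue<WorkItem> _pending = new Queue<WorkItem>();
    private readonly int _maxInFlight;  // the "N" being discussed
    private int _inFlight;

    public BackfillQueue(int maxInFlight) { _maxInFlight = maxInFlight; }

    public void Enqueue(WorkItem item)
    {
        lock (_sync)
        {
            _pending.Enqueue(item);
            TryDispatchLocked();
        }
    }

    // Must be called while holding _sync.
    private void TryDispatchLocked()
    {
        while (_inFlight < _maxInFlight && _pending.Count > 0)
        {
            WorkItem item = _pending.Dequeue();
            if (item.IsDiscarded)
                continue;               // dropped before it costs any pool time
            _inFlight++;
            ThreadPool.QueueUserWorkItem(_ =>
            {
                try
                {
                    if (!item.IsDiscarded)
                        item.Process(item);
                }
                finally
                {
                    // Backfill: as each item finishes, pull in the next one.
                    lock (_sync) { _inFlight--; TryDispatchLocked(); }
                }
            });
        }
    }
}
```

A starting point for N along the lines of the first bullet would then be `new BackfillQueue(2 * Environment.ProcessorCount)`.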

What do you think?

New: Ok, prompted by one of the answers, I'll explain a bit more. Every item put into the queue is keyed by something unique. If I dump another item into the queue with the same key as an existing item, the old item is considered discarded and should be removed. If the old item is already being processed, an "IsDiscarded" property on the workload item is set to true; the processing method is responsible for checking it, and if it detects a discarded item it should quit early, returning no results.
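The keyed-discard rule described above can be sketched roughly like this, assuming a dictionary that tracks the latest item per key (all names here are hypothetical, for illustration only):

```csharp
using System;
using System.Collections.Generic;

public class KeyedWork
{
    public string Key;
    public volatile bool IsDiscarded;
}

public class KeyedQueue
{
    // Latest live item per key; superseded items get flagged as discarded.
    private readonly Dictionary<string, KeyedWork> _latest =
        new Dictionary<string, KeyedWork>();

    public KeyedWork Enqueue(string key)
    {
        lock (_latest)
        {
            KeyedWork old;
            if (_latest.TryGetValue(key, out old))
                old.IsDiscarded = true;   // old item should now quit early
            var item = new KeyedWork { Key = key };
            _latest[key] = item;
            return item;
        }
    }
}
```

A long-running processing method would then poll the flag periodically, e.g. `if (item.IsDiscarded) return; // no results`.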

Perhaps I should experiment a bit more, and try to just dump everything into the threadpool.

New question: Is there a limit to the number of items I can queue up? If not, then this would easily simplify my class a lot.

Note: When I say "lengthy processing", I mean on the order of 1-10 seconds. Is the ThreadPool even the best choice for this? I see notes all over the web saying "the processing should be quick", but what "quick" means is never stated. Is quick on the order of milliseconds here?

+1  A: 

Is it possible you could simplify the approach by modifying your items to first check whether they are still required before doing any work? That would sidestep the problem of limiting the number in the pool: you can simply add them all, and each item will exit when it gets processed if it's no longer needed.

The number of operations that can be queued to the thread pool is limited only by available memory; however, the thread pool limits the number of threads that can be active in the process simultaneously. By default, the limit is 250 worker threads per CPU and 1,000 I/O completion threads.

You can control the maximum number of threads by using the GetMaxThreads and SetMaxThreads methods.

http://msdn.microsoft.com/en-us/library/0ka9477y.aspx
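Reading and adjusting those limits looks like this with the `ThreadPool.GetMaxThreads`/`SetMaxThreads` methods mentioned above (the actual values printed depend on the machine and runtime version):

```csharp
using System;
using System.Threading;

class PoolLimits
{
    static void Main()
    {
        int workers, io;
        ThreadPool.GetMaxThreads(out workers, out io);
        Console.WriteLine("max worker threads={0}, max I/O threads={1}", workers, io);

        // SetMaxThreads returns false if the request is rejected
        // (e.g. a value below the number of processors).
        bool ok = ThreadPool.SetMaxThreads(workers, io);
        Console.WriteLine("SetMaxThreads succeeded: {0}", ok);
    }
}
```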

Ed Guiness
Yeah, they do that: the items sent into the delegate carry an "IsDiscarded" property they need to check, not just before but also during processing, if the processing is lengthy, so they can abort early. It might be that I've overcomplicated the matter and should just keep track of existing items (they're keyed, so I can find and discard them if replaced) and dump them all into the ThreadPool. Perhaps I should experiment with that solution.
Lasse V. Karlsen
Sounds like it is certainly worth a try, especially if it means avoiding the extra complexity of sizing and managing the pool by hand.
Ed Guiness
Do you know if there is a limit to the number of work items I can put into the thread pool?
Lasse V. Karlsen
This looks like an excellent time to do a branch for some experimentation :)
Lasse V. Karlsen
Ah, looks like I need to look at the Smart Thread Pool that Rubens mentioned after all; I need prioritization support. For instance, when the form in question is opened, all the data is queued for processing, but if the user changes some specific columns in a grid, the results for those columns are re-queued with a higher priority, so that eventually all data is processed, but whatever the user fiddles with is processed first.
Lasse V. Karlsen
+1  A: 

Do you know Ami Bar's Smart Thread Pool?

It seems its implementation allows you to cancel an unprocessed item and dynamically grows the number of threads as required, up to a hard limit; I personally use 100 * Environment.ProcessorCount.
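Usage would look roughly like the sketch below. This is from memory of Ami Bar's SmartThreadPool (the `Amib.Threading` namespace and the `STPStartInfo`, `QueueWorkItem`, and `Cancel` members), so verify the exact API against the version you download:

```csharp
using System;
using Amib.Threading;   // Ami Bar's SmartThreadPool, a third-party library

class Sketch
{
    static object DoWork(object state)
    {
        // ... heavy processing of 'state' here ...
        return null;
    }

    static void Main()
    {
        var stp = new SmartThreadPool(new STPStartInfo
        {
            MaxWorkerThreads = 100 * Environment.ProcessorCount
        });

        IWorkItemResult result = stp.QueueWorkItem(new WorkItemCallback(DoWork), "some item");

        // Unlike the built-in ThreadPool, a not-yet-started item can be cancelled:
        result.Cancel();

        stp.Shutdown();
    }
}
```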

Rubens Farias
No, I didn't know about that one, I'll look into it, it looks promising. The best code is the code I don't have to write, but just works out of the box :)
Lasse V. Karlsen
@Lasse, 'nuff said
Rubens Farias