I've read in many places that the .NET ThreadPool is meant for short tasks (maybe no more than 3 seconds). In none of these mentions have I found a concrete reason why it should not be used for longer ones.

Some people even say that using it for long-running tasks leads to nasty results, including deadlocks.

Can somebody explain in plain English, with the technical reasons, why we should not use the thread pool for long-running tasks?

To be specific, here is a scenario. I would like to know why the ThreadPool should not be used for it, with the reasons behind that advice.

Scenario: I need to process data for some thousands of users. Each user's data is retrieved from a local database. Using that information, I need to connect to an API hosted at another location, process the API's response, and store the result in the local database.

Can someone explain the pitfalls in this scenario if I use the ThreadPool with a thread limit of 20? Processing time for each user may range from 3 seconds to 1 minute (or more).

+9  A: 

The point of the threadpool is to avoid the situation where the time spent creating the thread is longer than the time spent using it. By reusing existing threads, we get to avoid that overhead.

The downside is that the threadpool is a shared resource: if you're using a thread, something else can't. So if you have lots of long-running tasks, you could end up with thread-pool starvation, possibly even leading to deadlock.

Don't forget that your application's code may not be the only code using the thread pool... the system code uses it a lot too.

It sounds like you might want to have your own producer/consumer queue, with a small number of threads processing it. Alternatively, if you could talk to your other service using an asynchronous API, you may find that each bit of processing on your computer would be short-lived.
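To make the producer/consumer suggestion concrete, here is a minimal sketch, not code from the answer itself. It assumes .NET 4 or later (for `BlockingCollection<T>`); the names `UserWorkQueue`, `ProcessAll` and `processUser` are hypothetical, and `processUser` stands in for "call the remote API and store the result". Error handling is left out: an exception thrown inside a worker would take the process down.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper: a small, fixed set of dedicated threads draining a shared queue.
public static class UserWorkQueue
{
    public static void ProcessAll(int[] userIds, int workerCount, Action<int> processUser)
    {
        // Bounded queue: the producer blocks once 100 items are waiting.
        using (var queue = new BlockingCollection<int>(100))
        {
            var workers = new Thread[workerCount];
            for (int i = 0; i < workerCount; i++)
            {
                // Dedicated threads, so long or blocking work never occupies a pool thread.
                workers[i] = new Thread(() =>
                {
                    // Blocks while the queue is empty; ends after CompleteAdding
                    // has been called and the queue has been drained.
                    foreach (int userId in queue.GetConsumingEnumerable())
                    {
                        processUser(userId);
                    }
                });
                workers[i].IsBackground = true;
                workers[i].Start();
            }

            // Producer side: hand every user id to the workers.
            foreach (int id in userIds)
            {
                queue.Add(id);
            }
            queue.CompleteAdding();

            // Wait until every worker has finished its last item.
            foreach (Thread worker in workers)
            {
                worker.Join();
            }
        }
    }
}
```

A call such as `UserWorkQueue.ProcessAll(ids, 4, ProcessOneUser)` would then keep four threads busy until all ids are handled, where `ProcessOneUser` is whatever method does the per-user work.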

Jon Skeet
@Jon Skeet: I'm a bit confused by your statement 'system code uses it a lot too'. My understanding so far is that each process has its own ThreadPool, i.e. each application gets a separate pool for its process. Am I wrong about this?
JPReddy
This may help http://stackoverflow.com/questions/2675477/thread-vs-threadpool-net-2-0/2675508#2675508
Xander
+1  A: 

It is related to the way the threadpool scheduler works. It tries hard to ensure that it won't release more waiting threads than you have CPU cores. That is a good idea: running more threads than cores is wasteful, because Windows spends time switching context between threads, which makes the overall time needed to complete the jobs longer.

As soon as a TP thread completes, another one is allowed to run. Twice per second, the TP scheduler steps in if the running threads have not completed. It cannot tell why these threads are taking so long to get their job done. Half a second is a lot of CPU cycles, a cool billion or so. It therefore assumes that the threads are blocking, waiting for some kind of I/O to complete: a database query, a disk read, a socket connection attempt, stuff like that.

And it allows another thread to run. You've now got more threads than you have cores. That isn't really a problem if those original threads are indeed blocking, since they're not consuming any CPU cycles.
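The limits described above can be inspected at run time. The sketch below only reads the pool's settings through `ThreadPool.GetMinThreads`, `GetMaxThreads` and `GetAvailableThreads`; the `PoolInfo` class and `Describe` method are hypothetical names. It assumes, as the documentation describes, that the minimum worker count is the number of threads the pool creates on demand before it starts throttling, and that it usually defaults to the processor count.

```csharp
using System;
using System.Threading;

// Hypothetical helper that reports the current thread pool limits.
public static class PoolInfo
{
    public static string Describe()
    {
        int minWorkers, minIo, maxWorkers, maxIo, freeWorkers, freeIo;
        ThreadPool.GetMinThreads(out minWorkers, out minIo);
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
        ThreadPool.GetAvailableThreads(out freeWorkers, out freeIo);

        // Busy workers = maximum minus the number still available.
        return string.Format(
            "cores={0} minWorkers={1} maxWorkers={2} busyWorkers={3}",
            Environment.ProcessorCount, minWorkers, maxWorkers, maxWorkers - freeWorkers);
    }
}
```

Printing `PoolInfo.Describe()` while long jobs are queued shows how many worker threads are occupied at that moment; the exact numbers depend on the machine and the runtime version.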

You can see where this leads: if your thread runs for 3 seconds then it's creating a bit of a logjam. It delays, but won't block, other TP threads that are waiting to run. If your thread needs that much time because it is constantly blocking, then you are better off creating a regular Thread. And if you really care that the thread does not get delayed by the TP scheduler, you should use a Thread as well.
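As an illustration of "creating a regular Thread", here is a minimal sketch. The `DedicatedWorker` class, its `Start` method and the thread name are hypothetical; only the `System.Threading.Thread` members are real API.

```csharp
using System;
using System.Threading;

// Hypothetical helper: runs a long, mostly blocking job outside the thread pool.
public static class DedicatedWorker
{
    public static Thread Start(Action longRunningJob)
    {
        // A thread created this way is not a pool thread, so the pool's
        // scheduler neither delays it nor counts it against its limits.
        var thread = new Thread(() => longRunningJob());
        thread.IsBackground = true;        // do not keep the process alive for this job
        thread.Name = "long-running-job";  // makes the thread easy to spot in a debugger
        thread.Start();
        return thread;
    }
}
```

The caller keeps the returned `Thread` and can call `Join()` on it when it needs to wait for the job to finish.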

The TP scheduler was tinkered with in .NET 4.0, by the way; what I wrote is really only true for earlier releases. The basics are still there, it just uses a smarter scheduling algorithm: it is feedback-based, scheduling dynamically by measuring throughput. This really only matters if you have a lot of TP threads going.

Hans Passant