The scenario: there is, let's say, 1 TB of objects in a database, each about 10 MB in size.

I have a function named MATCH() that uses a query object, returns a double, and performs mathematical calculations. I check whether the result is between 0 and 1, and if so I have:

double[] Result = new double[1000]; // size is an example, e.g. 1000

  • Performance: how can I do this when the system has only 2 GB of RAM?
  • Thread safety: which section should I lock? Should I use a mutex or a thread pool?
  • How many threads can I run simultaneously, and how does that compare with using a BackgroundWorker?

Please give me an architecture for the program. (ED: I reckon just ignore this line.)

A: 

If this is an application that does not have a UI, use the ThreadPool. You can cap the number of threads it uses (ThreadPool.SetMaxThreads), and since this seems like a specialized application, tune that value until you have it just right.

ThreadPool examples here (MSDN).
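As a rough illustration of "a thread pool with a capped number of threads", here is a Java sketch (an analogue of the idea, not the .NET ThreadPool API). It assumes a hypothetical match(long id) standing in for the asker's MATCH() and keeps only results between 0 and 1. It also touches on the locking question: each match() call works on its own data, so only the shared result list needs synchronization.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class BoundedPoolSketch {

    // Hypothetical stand-in for the asker's MATCH(): a pure calculation on one object.
    static double match(long id) {
        return (id % 7) / 4.0; // produces 0.0, 0.25, ..., 1.5
    }

    // Runs match() for every id on a pool of at most maxThreads threads
    // and keeps only the results that fall in [0, 1].
    static List<Double> run(long count, int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        // The shared list is the only state several threads touch,
        // so it is the only part that needs a lock.
        List<Double> results = Collections.synchronizedList(new ArrayList<>());
        for (long id = 0; id < count; id++) {
            final long current = id;
            pool.execute(() -> {
                double score = match(current); // independent work, no lock needed
                if (score >= 0.0 && score <= 1.0) {
                    results.add(score); // synchronized add
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println("kept " + run(1000, 4).size() + " results");
    }
}
```

The pool size (4 here) is the knob to tinker with; the rest of the program does not change when you adjust it.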

Kyle Rozendo
"If this is an application that does not have a UI, use the ThreadPool." Why must it not have a UI in order to use the ThreadPool?
claws
It's as opposed to using the BackgroundWorker; I should have made that clearer. In a non-UI environment the BackgroundWorker makes no sense compared with the ThreadPool, in terms of both performance and control.
Kyle Rozendo
A: 

Here are some things about threads that could help you.

In principle you never need more than one thread per CPU; more threads just add scheduling overhead. However, threads often block, as they do when you query a database, so keeping only one thread per CPU is not feasible: you will probably need more threads to get CPU usage to 100%.
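To make "more threads when they block" concrete, a widely used rule of thumb sizes a pool as cores × (1 + wait time / compute time). The Java sketch below applies that heuristic; suggestedThreads and the timing figures are hypothetical, and in practice you would measure the wait/compute ratio of your own queries.

```java
class PoolSizing {

    // Rule of thumb: threads = cores * (1 + waitTime / computeTime).
    // waitMillis: time a task spends blocked (e.g. waiting on the database).
    // computeMillis: time a task spends actually using the CPU.
    static int suggestedThreads(int cores, double waitMillis, double computeMillis) {
        if (cores < 1 || waitMillis < 0 || computeMillis <= 0) {
            throw new IllegalArgumentException("need cores >= 1, waitMillis >= 0, computeMillis > 0");
        }
        return (int) Math.ceil(cores * (1.0 + waitMillis / computeMillis));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Pure CPU work: no waiting, so one thread per core.
        System.out.println("CPU-bound: " + suggestedThreads(cores, 0, 10));
        // Tasks that wait 30 ms on I/O for every 10 ms of computation.
        System.out.println("I/O-heavy: " + suggestedThreads(cores, 30, 10));
    }
}
```

With no blocking the formula reduces to one thread per core, which matches the first sentence of the answer.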

That said, in your scenario, having more than one or two threads querying the same database won't help much, because the database is the bottleneck. I would create only one or two threads that query the database, or better, use the asynchronous pattern with the Command.BeginExecute...() methods and allow only a few queries to run in parallel. When a query completes, queue the processing of its data, either on the .NET ThreadPool or, if processing the data takes longer than querying it, on a custom thread pool with one thread per CPU.
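The query-then-process pipeline described above can be sketched in Java (an analogue of the design, not the .NET BeginExecute API): a small pool runs the queries, a pool with one thread per CPU runs the calculations, and a Semaphore caps how many loaded objects exist at once, which matters with 10 MB objects and 2 GB of RAM. fetch() and match() are hypothetical placeholders for the database read and the MATCH() calculation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class QueryThenProcessSketch {

    // Hypothetical stand-in for loading one (~10 MB) object from the database.
    static byte[] fetch(long id) {
        byte[] data = new byte[16]; // tiny payload for the sketch
        data[0] = (byte) (id % 7);
        return data;
    }

    // Hypothetical stand-in for MATCH(): CPU-bound work on the loaded object.
    static double match(byte[] data) {
        return data[0] / 4.0;
    }

    static List<Double> run(long count, int queryThreads, int maxInFlight) {
        if (maxInFlight < 1) {
            throw new IllegalArgumentException("maxInFlight must be >= 1");
        }
        ExecutorService queryPool = Executors.newFixedThreadPool(queryThreads);
        ExecutorService cpuPool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        Semaphore inFlight = new Semaphore(maxInFlight); // caps objects held in memory
        AtomicInteger failures = new AtomicInteger();
        List<Double> results = Collections.synchronizedList(new ArrayList<>());
        try {
            for (long id = 0; id < count; id++) {
                final long current = id;
                inFlight.acquireUninterruptibly(); // blocks while maxInFlight objects are loaded
                CompletableFuture
                        .supplyAsync(() -> fetch(current), queryPool)            // few parallel queries
                        .thenApplyAsync(QueryThenProcessSketch::match, cpuPool) // processing on CPU pool
                        .thenAccept(score -> {
                            if (score >= 0.0 && score <= 1.0) {
                                results.add(score);
                            }
                        })
                        .whenComplete((ignored, error) -> {
                            if (error != null) {
                                failures.incrementAndGet();
                            }
                            inFlight.release();
                        });
            }
            // Taking every permit back means all submitted work has finished.
            inFlight.acquireUninterruptibly(maxInFlight);
        } finally {
            queryPool.shutdown();
            cpuPool.shutdown();
        }
        if (failures.get() > 0) {
            throw new IllegalStateException(failures.get() + " items failed");
        }
        return results;
    }
}
```

Keeping the query pool small reflects the point that the database, not the CPU, is the bottleneck; maxInFlight is what keeps memory bounded regardless of how large the database is.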

SelflessCoder