I have an application that needs to do some validation on the database end. Each line item goes through database validation (a search on a table with around 6 million records), then through a Lucene index searcher (the index is built from this same 6-million-record table). Since these steps are independent for each line item passed in, I am thinking of using multicore threading. (Each line takes around half a second on a single thread.)

What are my options for multicore processing in C#? Are there good resources, third-party libraries (I looked a bit at PowerThreading by Jeffrey Richter), or tutorials?

I assume I need to use some kind of thread pool on an N-core machine.

Currently, it takes around 40 seconds to process 100 lines; I'm looking to get this down to around 10 seconds.

Thanks...

A: 

Have you looked into F#?

It is well suited to parallelizing tasks, largely because its data is immutable by default.

Chris Ballance
I don't have enough bandwidth to learn F# right now, though I will look into it. Is there a C# solution you can suggest?
bkhanal
Keep your threads as free of interdependencies as possible.
Chris Ballance
A: 

If you want a fourfold speedup and have four cores, all you need to do is avoid dependencies among tasks, which sounds feasible here. You'll probably find that you want to run more threads than cores, because at any given time some threads will be blocked waiting for I/O. So whichever method you use, make sure it's easy to benchmark with different thread counts.

Steven Sudit
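A minimal sketch of the "benchmark with different thread counts" advice above. It swaps in `Parallel.For` and `ParallelOptions.MaxDegreeOfParallelism` from the Task Parallel Library, which arrived in .NET Framework 4.0 and may postdate this thread; `ThreadCountBenchmark` and `ValidateLine` are hypothetical names, and `ValidateLine` only simulates the real database + Lucene work with `Thread.Sleep`.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadCountBenchmark
{
    // Hypothetical stand-in for the per-line database lookup + Lucene search.
    // Thread.Sleep simulates time spent blocked on I/O.
    public static bool ValidateLine(string line)
    {
        Thread.Sleep(20);
        return line.Length > 0;
    }

    // Processes every line with at most maxThreads (>= 1) running at once
    // and returns the wall-clock time taken.
    public static TimeSpan Run(IList<string> lines, int maxThreads, out bool[] results)
    {
        var local = new bool[lines.Count];
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };
        var sw = Stopwatch.StartNew();
        // Each iteration writes only its own slot, so no locking is needed.
        Parallel.For(0, lines.Count, options, i => { local[i] = ValidateLine(lines[i]); });
        sw.Stop();
        results = local;
        return sw.Elapsed;
    }
}
```

Calling `Run` with `maxThreads` set to 1, 2, 4, 8 and 16 and comparing the elapsed times shows where adding threads stops helping. At the higher counts the thread pool may add threads gradually, so `ThreadPool.SetMinThreads` may also need raising for an I/O-heavy workload.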
Can you point me to some links to start from?
bkhanal
Well, the simplest approach is to use the ThreadPool, as described in http://msdn.microsoft.com/en-us/library/ms973903.aspx.
Steven Sudit
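A minimal sketch of the ThreadPool approach suggested above, using only APIs available since .NET 2.0 (`ThreadPool.QueueUserWorkItem`, `Interlocked.Decrement`, `ManualResetEvent`). `PoolProcessor` and `ValidateLine` are hypothetical names; `ValidateLine` is a placeholder for the real per-line database and Lucene checks.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class PoolProcessor
{
    // Hypothetical placeholder for the database + Lucene validation of one line.
    // It must not throw: an unhandled exception on a pool thread terminates
    // the process in .NET 2.0 and later.
    public static bool ValidateLine(string line)
    {
        return !string.IsNullOrEmpty(line);
    }

    // Queues one work item per line and blocks until all of them have finished.
    public static bool[] ProcessAll(IList<string> lines)
    {
        var results = new bool[lines.Count];
        if (lines.Count == 0) return results;

        int pending = lines.Count;
        using (var done = new ManualResetEvent(false))
        {
            for (int i = 0; i < lines.Count; i++)
            {
                int index = i; // copy the loop variable so each closure gets its own index
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try
                    {
                        results[index] = ValidateLine(lines[index]);
                    }
                    finally
                    {
                        // The last work item to finish wakes the waiting thread.
                        if (Interlocked.Decrement(ref pending) == 0) done.Set();
                    }
                });
            }
            done.WaitOne();
        }
        return results;
    }
}
```

On .NET 4.0 and later, `CountdownEvent` can replace the counter-plus-event pair.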
A: 

Plain threading should let you use all of the cores. You will have to experiment with the size of the thread pool, though, since your tasks appear to involve a lot of I/O as well.

Kathy Van Stone
By simple threading, do you mean the thread pool? I tried the thread pool on a dual-core machine and still get around the same performance!
bkhanal
Remember what Chris said about decoupling interdependencies?
Steven Sudit
I admit I haven't checked whether multiple threads use multiple cores in .NET, but that is true for Java and specifically noted as not true for Python because of its global interpreter lock. Also, synchronization and Amdahl's law (http://en.wikipedia.org/wiki/Amdahl%27s_law) will limit the speedup in any case.
Kathy Van Stone
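For reference, Amdahl's law as stated at the link above, where p is the fraction of the work that can run in parallel and N is the number of cores. The value p = 0.95 below is purely illustrative, not a measurement from this application.

```latex
S(N) = \frac{1}{(1 - p) + p/N}

% Illustrative example: p = 0.95, N = 4
S(4) = \frac{1}{0.05 + 0.95/4} = \frac{1}{0.2875} \approx 3.48
```

Going from 40 seconds to 10 seconds on four cores means S = 4, which this formula only allows when p = 1, i.e. no serialized work at all. The law assumes CPU-bound work; time spent waiting on I/O is the case where extra threads can still help.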
Yes, multiple threads will be spread across all of your cores unless you explicitly prevent this. If the threads work independently then you should get a speed-up nearly proportional to the number of cores. If you're not getting that, something is wrong, most likely a global lock.
Steven Sudit
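A minimal sketch of how a global lock erases the speedup, assuming (hypothetically) that every work item takes one shared lock for its whole duration, for example to guard a single shared database connection. `LockContentionDemo` is a hypothetical name, `Parallel.For` is the .NET 4.0 Task Parallel Library, and `Thread.Sleep` simulates the per-line work. It reports how many items were inside the work section at the same time.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LockContentionDemo
{
    static readonly object GlobalLock = new object();

    // Runs `items` work items in parallel and returns the peak number of items
    // that were inside DoWork simultaneously.
    public static int PeakConcurrency(int items, bool holdGlobalLock)
    {
        int active = 0, peak = 0;
        Parallel.For(0, items, i =>
        {
            if (holdGlobalLock)
            {
                // Every item holds the same lock for its entire run: fully serialized.
                lock (GlobalLock) { DoWork(ref active, ref peak); }
            }
            else
            {
                DoWork(ref active, ref peak);
            }
        });
        return peak;
    }

    static void DoWork(ref int active, ref int peak)
    {
        int now = Interlocked.Increment(ref active);
        // Lock-free update of the highest concurrency observed so far.
        int seen = peak;
        while (now > seen)
        {
            int prev = Interlocked.CompareExchange(ref peak, now, seen);
            if (prev == seen) break;
            seen = prev;
        }
        Thread.Sleep(10); // simulated per-item work
        Interlocked.Decrement(ref active);
    }
}
```

With the lock held, the peak is always 1, however many cores are available. The usual fix is to give each worker its own resources (its own connection, for instance) and to lock only briefly, if at all, when combining results.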
If the work is independent and I/O-bound, you may be able to get a speedup greater than the number of cores.
Kathy Van Stone
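One common rule of thumb for sizing a pool for I/O-bound work comes from Brian Goetz's Java Concurrency in Practice, not from this thread: N_cpu is the core count, U_cpu the target CPU utilization, and W/C the ratio of time spent waiting to time spent computing per task. The W/C = 3 below is a hypothetical figure for the dual-core machine mentioned earlier.

```latex
N_{\text{threads}} = N_{\text{cpu}} \cdot U_{\text{cpu}} \cdot \left(1 + \frac{W}{C}\right)

% Hypothetical example: dual-core machine, full CPU use, W/C = 3
N_{\text{threads}} = 2 \cdot 1 \cdot (1 + 3) = 8
```

With eight threads on two cores, the CPUs stay busy while the other threads wait on the database and the index, which is how throughput can exceed the core count. In practice the database server or the disk may become the bottleneck first, so measured timings should decide the final count.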