Hi, I am new to threading. My boss gave me a scenario: we have a list of objects, about 70 GB in total, that we want to load from a database. Loading takes time, so I want to make use of the CPU (multithreading) in this scenario, i.e. partition the data, load the first part, then the second part, and while the second part is loading, process the first part. What should I do? Please guide me.

+1  A: 

Take a look at these resources:

Konamiman
A: 

The easiest way to start with multithreading in C# is to use the BackgroundWorker class. Take a look at it: LINK
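
As a rough illustration of the BackgroundWorker suggestion, here is a minimal sketch. `LoadPart` is a hypothetical stand-in for a slow database load (it just sleeps), and the blocking wait is only there so the example works in a console program; none of these names come from the question.

```csharp
using System.ComponentModel;
using System.Threading;

public static class BackgroundWorkerSketch
{
    // Hypothetical stand-in for a slow database load; returns a pretend row count.
    private static int LoadPart(int partNumber)
    {
        Thread.Sleep(100); // simulate I/O latency
        return 1000;
    }

    // Runs LoadPart on a background thread and waits for it to finish.
    public static int LoadInBackground(int partNumber)
    {
        int rows = 0;
        using (var done = new ManualResetEvent(false))
        using (var worker = new BackgroundWorker())
        {
            // DoWork runs on a thread-pool thread, off the calling thread.
            worker.DoWork += (sender, e) => e.Result = LoadPart((int)e.Argument);

            // RunWorkerCompleted fires when DoWork returns or throws.
            worker.RunWorkerCompleted += (sender, e) =>
            {
                if (e.Error == null) rows = (int)e.Result;
                done.Set();
            };

            worker.RunWorkerAsync(partNumber);
            done.WaitOne(); // console demo only; a UI thread should not block here
        }
        return rows;
    }
}
```

Calling `BackgroundWorkerSketch.LoadInBackground(1)` returns once the simulated load has finished. In a WinForms or WPF application you would normally drop the wait and update the UI from the `RunWorkerCompleted` handler instead.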

Jonathan
A: 

My experience of scenarios like this is that the data retrieval always takes longer, and multi-threading only makes a difference in very CPU-intensive tasks.

E.g. I retrieve 5000 accounts from the DB at a time with all of their related person information using FOR XML so that I get relational data I can deserialize. This takes approximately a minute, then I run complex segmentation which runs in less than a second. In this scenario multi-threading would not make a difference.

Another scenario I had was where I was making a simple DB call thousands of times, and each call was subject to network delays, whilst the SQL Server was not being stretched, as the table was small and fully indexed so searching was instant. In this case it was actually quicker to spawn a few threads and call the database multiple times at once using these threads. Spawning 8 threads gave approximately a fourfold speed-up.

Look into the System.Threading namespace.
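
To make the "many small calls, a few threads" case concrete, here is a hedged sketch using `Parallel.ForEach` from `System.Threading.Tasks` with a cap on concurrency. `LookupAccount` is a hypothetical placeholder for the real query (the sleep imitates network latency), and the cap of 8 used in the usage note simply mirrors the figure mentioned above.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelLookupSketch
{
    // Hypothetical stand-in for one small query; the sleep imitates network latency.
    private static string LookupAccount(int id)
    {
        Thread.Sleep(20);
        return "account-" + id;
    }

    // Issues the lookups with at most maxThreads calls in flight at once.
    public static IDictionary<int, string> LookupAll(IEnumerable<int> ids, int maxThreads)
    {
        // ConcurrentDictionary is safe to write to from several threads.
        var results = new ConcurrentDictionary<int, string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };
        Parallel.ForEach(ids, options, id => { results[id] = LookupAccount(id); });
        return results;
    }
}
```

For example, `ParallelLookupSketch.LookupAll(Enumerable.Range(1, 1000), 8)` would run the lookups with up to eight in flight. Whether this helps in practice depends on the calls being latency-bound rather than limited by the server or the disk, so it is worth measuring with a few different limits.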

ck
+1  A: 

Do you have control over how the data is loaded into the database, and what the hardware looks like?

To load 70GB of data, you will be I/O bound at first. If the data lives in a single volume, trying to use multiple threads will just cause the disk heads to thrash as they seek back-and-forth across the drive.

That means your first step should be to maximize the performance of your disk subsystem. You can do that by doing things like:

  1. Limiting your disk partition size to the first third of the drive
  2. Putting as many spindles as you can into a single large RAID volume, up to the speed of your disk controller
  3. Using SSDs instead of rotating magnetic drives
  4. Using a high-speed disk controller
  5. Using multiple disk controllers
  6. Spreading your drives out among multiple controllers

Once you have that part done, the next step is to partition your data among as many disks and controllers as possible, while still allowing your log file to be on a volume by itself. If you can fill two entire controllers with fast RAID volumes, then divide your data among them. In some cases, it can help to use SQL Server's table partitioning mechanisms to help with the process and to force certain parts of the table to be on certain physical volumes.

After the partitioning is done, then you can build your app to have one thread read from each physically separate partition, to avoid disk thrashing.
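
The "one thread per physically separate partition" idea could look roughly like the sketch below. `ReadPartition` and the partition names are hypothetical; in a real application each reader would run its own query against the rows stored on that volume.

```csharp
using System.Collections.Generic;
using System.Threading;

public static class PartitionReaderSketch
{
    // Hypothetical stand-in for reading every row stored on one physical volume.
    private static long ReadPartition(string partitionName)
    {
        Thread.Sleep(50); // simulate sequential disk reads
        return 1000;      // pretend row count
    }

    // Starts exactly one reader thread per partition and returns the total rows read.
    public static long ReadAll(IList<string> partitions)
    {
        var counts = new long[partitions.Count];
        var threads = new List<Thread>();
        for (int i = 0; i < partitions.Count; i++)
        {
            int index = i; // copy so each closure writes to its own slot
            var t = new Thread(() => counts[index] = ReadPartition(partitions[index]));
            threads.Add(t);
            t.Start();
        }

        long total = 0;
        for (int i = 0; i < threads.Count; i++)
        {
            threads[i].Join(); // wait for this partition's reader to finish
            total += counts[i];
        }
        return total;
    }
}
```

Because the thread count equals the number of partitions, no two readers compete for the same volume, which is the point of the advice above.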

Once you're past being I/O bound, then you can start thinking about ways to optimize the CPU side of things -- but it's pretty unusual to get to that point.

Depending on how much speed you need, this type of thing can get complex (and expensive) quickly....

In case it helps, I discuss a number of these infrastructure issues in detail in my book: Ultra-Fast ASP.NET.

RickNZ
Now the scenario is this: there is, let's say, 1 TB of objects in the database, each about 10 MB. I have a function named MATCH() which takes a query object and returns a double; inside it I do mathematical calculations. I then check whether the result is between 0 and 1, and I have an array for the results, e.g. double[] Result = new double[1000]. How can I do this, given that the system has 2 GB of RAM? Which section should I lock, or should I use a mutex or the thread pool? Please give me an architecture for the program. Also, how many threads can I run simultaneously, and how many threads can the BackgroundWorker control use?
abrar
It sounds like you have two challenges: first, to read the data from the DB quickly, and second to process it with your MATCH function. I don't have nearly enough information to answer your question in detail; that's a Big Ask for a forum post. The short answer is that you would have one or more threads reading data (depending on the physical layout), and one or more threads processing the results, with locks or semaphores to coordinate the work. Good luck!
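
One possible shape for that reader/processor split is sketched below. It swaps hand-written locks and semaphores for a bounded `BlockingCollection<T>`, which does the coordination internally. `LoadItem` and `Match` are hypothetical stand-ins for the database read and the MATCH function, and treating "between 0 and 1" as strictly between is an assumption.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PipelineSketch
{
    // Hypothetical stand-in for fetching one object from the database.
    private static byte[] LoadItem(int id)
    {
        return new byte[id % 20];
    }

    // Hypothetical stand-in for MATCH(): scores one object as a double.
    private static double Match(byte[] item)
    {
        return item.Length / 10.0;
    }

    public static List<double> Run(int itemCount, int processorCount, int maxBuffered)
    {
        var matches = new ConcurrentBag<double>();
        // The bounded capacity makes the reader block when the processors fall behind,
        // so at most maxBuffered loaded objects wait in the queue at any time.
        using (var queue = new BlockingCollection<byte[]>(maxBuffered))
        {
            var reader = Task.Run(() =>
            {
                try
                {
                    for (int id = 0; id < itemCount; id++) queue.Add(LoadItem(id));
                }
                finally
                {
                    queue.CompleteAdding(); // lets processors exit once the queue drains
                }
            });

            var processors = new Task[processorCount];
            for (int p = 0; p < processorCount; p++)
            {
                processors[p] = Task.Run(() =>
                {
                    foreach (var item in queue.GetConsumingEnumerable())
                    {
                        double score = Match(item);
                        if (score > 0 && score < 1) matches.Add(score); // assumed strict bounds
                    }
                });
            }

            reader.Wait();
            Task.WaitAll(processors);
        }
        return new List<double>(matches);
    }
}
```

With 10 MB objects and 2 GB of RAM, `maxBuffered` is the knob that keeps memory in check: a call such as `PipelineSketch.Run(itemCount, 4, 16)` would hold roughly 16 queued objects plus the ones currently being processed. A reasonable starting point for `processorCount` is `Environment.ProcessorCount`, adjusted after measuring.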
RickNZ