When performing many disk operations, does multithreading help, hinder, or make no difference?

For example, when copying many files from one folder to another.

Clarification: I understand that when other operations are performed, concurrency will obviously make a difference. If the task were to open an image file, convert it to another format, and then save it, disk operations could be performed concurrently with the image manipulation. My question is whether, when the only operations performed are disk operations, queuing and responding to them concurrently is any better.

A: 

No, it makes no sense. At some point, the operations have to be serialized (by the OS). On the other hand, since modern OSes have to cope with multiple processes anyway, I doubt that there's any added overhead.

Konrad Rudolph
A: 

I would think it depends on a number of factors, like the kind of application you are running, the number of concurrent users, etc.

I am currently working on a project that does a lot of sequential I/O (reading files from start to finish). We use a NAS for storage, and we were concerned about what would happen if we ran multiple threads. Our initial thought was that it would slow us down because it would increase head seeks. So we ran some tests and found that the ideal number of threads was the same as the number of cores in the computer.
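
A test of this kind can be sketched roughly as follows. This is only an illustration, not the code used in the project above: the names `read_file`, `time_reads`, and `sweep` are hypothetical, and it assumes the files are read in full with a fixed chunk size.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def read_file(path, chunk_size=1 << 20):
    """Read one file from start to finish; return the number of bytes read."""
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total


def time_reads(paths, workers):
    """Read every file using `workers` threads; return (seconds, total bytes)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(read_file, paths))
    return time.perf_counter() - start, total


def sweep(paths, worker_counts):
    """Time the same set of reads once for each worker count."""
    return {n: time_reads(paths, n) for n in worker_counts}
```

Calling `sweep(paths, [1, 2, 4, 8])` returns a timing per thread count. The OS file cache can make later runs look faster than the first one, so the file set should be larger than RAM, or the cache should be cleared between runs, before the numbers are trusted.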

But your mileage may vary.

Robert Harvey
A: 

It can help, simply because whenever there is more work for a thread to do (identifying the next file to copy) the OS wakes it up. Threads are therefore a simple way to hook into the OS scheduler while still writing code in a traditional sequential style, instead of having to break it up into a state machine with callbacks.

This is mainly an aid to clear programming rather than to performance.

Daniel Earwicker
+2  A: 

That depends on your definition of "I/O bound" but generally multithreading has two effects:

  • Use multiple CPUs concurrently (which won't necessarily help if the bottleneck is the disk rather than the CPU[s])

  • Use a CPU (with another thread) even while one thread is blocked (e.g. waiting for I/O completion)

I'm not sure that Konrad's answer is always right, however: as a counter-example, if "I/O bound" just means "one thread spends most of its time waiting for I/O completion instead of using the CPU", but does not mean that "we've hit the system I/O bandwidth limit", then IMO having multiple threads (or asynchronous I/O) might improve performance (by enabling more than one concurrent I/O operation).

ChrisW
+7  A: 

Most of the answers so far have had to do with the OS scheduler. However, there is a more important factor that I think would lead to your answer. Are you writing to a single physical disk, or multiple physical disks?

Even if you parallelize with multiple threads, I/O to a single physical disk is intrinsically a serialized operation. Each thread would have to block, waiting for its chance to get access to the disk. In this case, multiple threads are probably useless, and may even lead to contention problems.

However, if you are writing multiple streams to multiple physical disks, processing them concurrently should give you a boost in performance. This is particularly true with managed disks, like RAID arrays, SAN devices, etc.

I don't think the issue has as much to do with the OS scheduler as it does with the physical aspects of the disk(s) you're writing to.

jrista
A: 

I'd think it would hinder the operations... You only have one controller and one drive.

You could use a second thread to do the operation, and a main thread that shows an updated UI.
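
A minimal sketch of that arrangement, assuming progress is passed back through a queue. The function names and the `report` callback are hypothetical, and a real UI toolkit would have its own way of receiving updates on the main thread.

```python
import queue
import shutil
import threading
from pathlib import Path


def copy_worker(sources, dest_dir, progress):
    """Copy files one at a time, posting each finished file name to `progress`."""
    try:
        for src in sources:
            shutil.copy2(src, Path(dest_dir) / Path(src).name)
            progress.put(Path(src).name)
    finally:
        progress.put(None)  # sentinel: the worker is done, even after an error


def copy_with_progress(sources, dest_dir, report=print):
    """Run the copy on a background thread while this thread reports progress."""
    sources = list(sources)
    progress = queue.Queue()
    worker = threading.Thread(
        target=copy_worker, args=(sources, dest_dir, progress)
    )
    worker.start()
    done = 0
    # Block until the worker reports a finished file or posts the sentinel.
    while (name := progress.get()) is not None:
        done += 1
        report(f"{done}/{len(sources)} copied: {name}")
    worker.join()
    return done
```

Only one thread touches the disk here, so the second thread buys responsiveness rather than throughput.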

Osama ALASSIRY
A: 

I think it could worsen the performance, because the multiple threads will compete for the same resources.

You can test the impact of concurrent I/O operations on the same device by copying a set of files from one place to another and measuring the time, then splitting the set into two parts and making the copies in parallel. The second option will be noticeably slower.
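
One way to set up that comparison is sketched below. The helper names (`copy_files`, `copy_in_two_halves`, `timed`) are made up for illustration, and the sketch assumes all source files have distinct names.

```python
import shutil
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_files(sources, dest_dir):
    """Copy the files one after another; return how many were copied."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    count = 0
    for src in sources:
        shutil.copy2(src, dest / Path(src).name)
        count += 1
    return count


def copy_in_two_halves(sources, dest_dir):
    """Split the set into two parts and copy both parts in parallel."""
    sources = list(sources)
    mid = len(sources) // 2
    halves = [sources[:mid], sources[mid:]]
    with ThreadPoolExecutor(max_workers=2) as pool:
        return sum(pool.map(lambda half: copy_files(half, dest_dir), halves))


def timed(fn, *args):
    """Call fn(*args); return (elapsed seconds, result)."""
    start = time.perf_counter()
    result = fn(*args)
    return time.perf_counter() - start, result
```

Comparing `timed(copy_files, files, "out_seq")` with `timed(copy_in_two_halves, files, "out_par")` gives the two timings. Use a fresh set of source files for each run, since files read once may be served from the OS cache the second time and skew the result.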

fortran