Can you make file copying faster through multiple threading?

Edit: To clarify, suppose you were implementing CopyFile(src, tgt). It seems logical that under certain circumstances you could use multiple threads to make it go faster.

Edit: Some more thoughts:

Naturally, it depends on the HW/storage in question.

If you're copying from one disk to another, for example, it's pretty clear that you can read and write at the same time using two threads. That hides the cost of the faster of the two operations (usually reading): the total time approaches that of the slower operation alone rather than the sum of both. But you don't really need multiple threads to read and write in parallel; async I/O is enough.

But if async-IO can really speed things up (up to 2x) when reading/writing from different disks, why isn't this the default implementation of CopyFile? (or is it?)
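Both the two-thread idea and the async-I/O alternative come down to overlapping reads with writes. Below is a minimal Python sketch of the two-thread variant. It assumes a bounded queue as the hand-off between a reader thread and a writer (the calling thread). `pipelined_copy`, `CHUNK_SIZE`, and the queue depth are hypothetical names and values chosen for illustration; this is not how CopyFile works internally. CPython releases the GIL during file reads and writes, so the two threads can genuinely overlap their I/O.

```python
import queue
import threading

CHUNK_SIZE = 1 << 20  # hypothetical: 1 MiB per read, chosen for illustration


def pipelined_copy(src: str, dst: str, depth: int = 4) -> int:
    """Copy src to dst with one reader thread and one writer (the caller).

    The bounded queue lets the reader fetch chunk N+1 while the writer is
    still writing chunk N, so reads and writes can overlap.
    Returns the number of bytes written.
    """
    chunks: "queue.Queue[bytes | None]" = queue.Queue(maxsize=depth)
    errors: list = []

    def reader() -> None:
        try:
            with open(src, "rb") as f:
                while True:
                    data = f.read(CHUNK_SIZE)
                    if not data:
                        break  # EOF
                    chunks.put(data)  # blocks when the writer falls behind
        except Exception as exc:  # hand the failure over to the writer side
            errors.append(exc)
        finally:
            chunks.put(None)  # sentinel: no more data

    t = threading.Thread(target=reader, daemon=True)
    t.start()
    written = 0
    with open(dst, "wb") as out:
        while (data := chunks.get()) is not None:
            out.write(data)
            written += len(data)
    t.join()
    if errors:
        raise errors[0]
    return written
```

Whether this beats a plain read-then-write loop depends on the hardware. With two separate physical disks the reads and writes can proceed concurrently. With a single disk, the OS page cache and read-ahead may already hide most of the latency.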

+4  A: 

If you are not careful, you can make it slower. Disks are good at sequential access; if you have multiple threads, the disk heads will be all over the place. If you are dealing with a high-performance SAN, though, you may see an improvement, since the SAN will take care of optimizing the disk access.

Otávio Décio
@ocdecio: Also, don't forget about the network card, CPU, and RAM limitations. If copying multiple files at the same time on different threads, you will need to factor in those considerations as well.
JFV
@JFV - That's right. The only time I saw good use for multiple threads was to scan directories in parallel.
Otávio Décio
And don't forget that device drivers have the opportunity to reorder outstanding requests if it benefits the hardware (for example to order disk I/O request by minimizing head seek distance).
Michael Burr
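To make "multiple threads" concrete for a single CopyFile(src, tgt), here is a Python sketch that splits one file into byte ranges and copies each range on its own worker thread. It assumes a POSIX system (`os.pread`/`os.pwrite` are not available on Windows); `ranged_copy` and the 1 MiB chunk size are hypothetical. On a single spinning disk this is exactly the access pattern that sends the heads all over the place, although the I/O scheduler or driver may reorder some of the outstanding requests. On an SSD or a SAN, the extra queue depth may help instead.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def ranged_copy(src: str, dst: str, workers: int = 4) -> None:
    """Copy src to dst by splitting it into `workers` byte ranges.

    Each worker reads and writes its own range at explicit offsets
    (os.pread/os.pwrite, POSIX only), so no file position is shared.
    """
    size = os.path.getsize(src)
    span = max(1, -(-size // workers))  # ceiling division
    in_fd = os.open(src, os.O_RDONLY)
    out_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.ftruncate(out_fd, size)  # pre-size so every range has a home

        def copy_range(start: int) -> None:
            end = min(start + span, size)
            offset = start
            while offset < end:
                data = os.pread(in_fd, min(1 << 20, end - offset), offset)
                if not data:
                    break  # source shrank underneath us
                # pwrite may write fewer bytes than requested; loop until done
                view = memoryview(data)
                while view:
                    n = os.pwrite(out_fd, view, offset)
                    view = view[n:]
                    offset += n

        with ThreadPoolExecutor(max_workers=workers) as pool:
            # list() forces any worker exception to be re-raised here
            list(pool.map(copy_range, range(0, size, span)))
    finally:
        os.close(in_fd)
        os.close(out_fd)
```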
+2  A: 

I would think not. There's so little for the CPU to do.

sblundy
True, but couldn't you benefit from reading and writing at the same time, for example? Or from reading and writing to different locations on the same disk (under some storage solutions)?
Assaf Lavie
But the CPU doesn't do all that much of the work for those things. The Disk IO does.
sblundy
@sblundy - but you have the potential for different hardware components to be kept busy simultaneously.
Michael Burr
@Burr, but then you have the potential to slow down /all/ your hardware! Or at least to make the operation take longer than it needs to.
strager
@strager - I'm not sure I follow what you're saying. If the I/O device is otherwise idle, there's usually no harm in putting it to use now as opposed to later (since it's going to be used at some point for the I/O request).
Michael Burr
+1  A: 

It depends, but generally no: your bottleneck is going to be disk I/O, and you can't make disk I/O faster by using multiple threads.

Even in the extremely rare cases where this would work, the thread synchronization code would be so complicated that it wouldn't be worth it.

Nir
Even if you're reading from and writing to separate disks?
Assaf Lavie
+2  A: 

You can see a benefit particularly if the files are on different devices in which case the I/O can be very effectively overlapped.

However, there are also cases where you could easily cause thrashing of the hardware, so I don't think it's an optimization that should be taken lightly.

As far as the additional question you added:

But if async-IO can really speed things up (up to 2x) when reading/writing from different disks, why isn't this the default implementation of CopyFile? (or is it?)

I don't know the internals of CopyFile(), but I wouldn't be surprised if they do not do it for a couple reasons:

  1. if they were to implement it using an additional thread (or threads), that might be a bit more intrusive to a process than is appropriate (especially if the process is single-threaded up to this point)
  2. if they were to try to implement it using asynchronous I/O with a single thread (as ChrisW indicated is a possibility), they might be as likely to cause thrashing problems as improve performance. It might not be easy to generically determine when you'll get a benefit as opposed to a detriment.

This is not to say it couldn't or shouldn't be done (or even that it isn't done - I don't know) - these are just a couple possible reasons why it might not be done.

Michael Burr
I'm pretty sure the OS can tell when, for example, it's copying from one disk to another. I contend that in most such cases async copying will be much faster, so I'm wondering whether, and why, this isn't the default implementation.
Assaf Lavie
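As a rough illustration of "the OS can tell", a copy routine could compare the device IDs of the source file and the destination directory before choosing an overlapped strategy. The Python sketch below assumes `os.stat().st_dev` as the signal; `on_different_devices` is a hypothetical helper. It also shows why a generic heuristic is imperfect: `st_dev` identifies a filesystem or volume, not a physical drive. Two partitions on one disk report different values, while a RAID or SAN volume spanning many disks reports a single one.

```python
import os


def on_different_devices(src: str, dst: str) -> bool:
    """Rough check: do src and dst's parent directory live on different devices?

    st_dev identifies the filesystem/volume, not the physical drive, so this
    can be wrong in both directions (partitions of one disk, multi-disk volumes).
    """
    dst_dir = os.path.dirname(os.path.abspath(dst))  # dst itself may not exist yet
    return os.stat(src).st_dev != os.stat(dst_dir).st_dev
```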
+1  A: 

If you were implementing CopyFile, then instead of using multiple threads (e.g. one thread for reading and another thread for writing) you could use a single thread which initiates asynchronous I/O (so that one thread can initiate/reinitiate read and write simultaneously), using completion ports or whatever.

For improved performance, it might be implemented entirely in the kernel.

ChrisW
This is true - but I'd guess that trying to handle simultaneous overlapped read and write operations in a single thread is as complex as using multiple threads. And if you want to use completion ports to handle the I/O completion you'll need multiple threads.
Michael Burr
No, it's like this: kick off a read, kick off a write, block until either completes; when one (read or write) completes, kick off another asynchronous one (which takes negligible time to initiate), then go back to waiting for the next completion.
ChrisW
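ChrisW's completion-driven loop can be sketched in Python as follows. Python has no portable overlapped file I/O, so this sketch swaps in a two-worker `ThreadPoolExecutor` as a stand-in for the OS's asynchronous operations; what it is meant to show is the coordinating logic: issue a read and a write, wait for whichever completes first, then reissue. `event_loop_copy` and `CHUNK` are hypothetical names. Note that the stand-in itself needs worker threads, which echoes Michael Burr's point about completion handling.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait

CHUNK = 1 << 20  # hypothetical chunk size


def event_loop_copy(src: str, dst: str) -> int:
    """One coordinating thread issues reads/writes and reacts to completions.

    The two-worker pool stands in for asynchronous I/O; only this function
    decides what to issue next. Returns the number of bytes written.
    """
    written = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout, \
            ThreadPoolExecutor(max_workers=2) as pool:
        read_fut: Future | None = pool.submit(fin.read, CHUNK)  # kick off a read
        write_fut: Future | None = None
        pending: bytes | None = None  # chunk that was read but not yet written

        while read_fut or write_fut:
            # Block until either outstanding operation completes
            done, _ = wait([f for f in (read_fut, write_fut) if f],
                           return_when=FIRST_COMPLETED)
            if read_fut in done:
                pending = read_fut.result() or None  # b"" means EOF
                read_fut = None
            if write_fut in done:
                written += write_fut.result()
                write_fut = None
            # Writer idle and data ready: start the write, then the next read,
            # so writing chunk N overlaps with reading chunk N+1
            if write_fut is None and pending is not None:
                write_fut = pool.submit(fout.write, pending)
                pending = None
                read_fut = pool.submit(fin.read, CHUNK)
    return written
```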
+2  A: 

Here's a blog post about file copy performance improvements in Vista SP1:

http://blogs.technet.com/markrussinovich/archive/2008/02/04/2826167.aspx

High-performance file copying is crazy hard: you have to take into account things like cache behavior and network driver limitations.

So always use the OS file copy function (under Windows it's CopyFileEx) and don't write your own.

Nir
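The same advice carries over outside Windows. In Python, for example, the "use the OS copy path" choice is the standard library's `shutil.copyfile`. Since Python 3.8 it may use platform-specific fast-copy system calls (such as `os.sendfile` on Linux or `fcopyfile` on macOS). The wrapper name below is hypothetical; the point is simply to delegate rather than hand-roll a read/write loop.

```python
import shutil


def copy_with_os_help(src: str, dst: str) -> str:
    """Delegate to the standard library instead of a hand-written copy loop.

    shutil.copyfile picks a platform-specific fast path where one exists
    and returns the destination path.
    """
    return shutil.copyfile(src, dst)
```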