Hi,

I'm trying to create a fast and somewhat intelligent file copy algorithm (in C#, but platform independent).

My goals:

  • I don't want to use any platform-specific code (no P/Invoke or anything similar)
  • I'd like to take advantage of multiple cores, but that seems pointless since simultaneous reads and writes would presumably be slower (please correct me if I'm wrong)
  • I want to keep track of the copy progress, so File.Copy is not an option

The code that I've come up with is nothing special, and I'm looking for ways to speed it up:

public bool Copy(string sourcePath, string destinationPath, ref long copiedSize, long totalSize, int fileNum, int fileCount, CopyProgressCallback progressCallback)
{
    const int size = 1024 * 256; // 256KB
    byte[] buffer = new byte[size];

    try
    {
        // 'using' disposes both streams even if opening the destination or a read/write fails.
        using (FileStream source = File.OpenRead(sourcePath))
        using (FileStream dest = File.Open(destinationPath, FileMode.Create))
        {
            int read;
            while ((read = source.Read(buffer, 0, size)) != 0)
            {
                dest.Write(buffer, 0, read);

                copiedSize += read;
                // CopyProgressCallback is assumed to take these four arguments.
                progressCallback(copiedSize, totalSize, fileNum, fileCount);
            }
        }

        return true;
    }
    catch
    {
        // No I don't care about exception reporting.
        return false;
    }
}

Things that I've tried and didn't work out:

  • Increasing the buffer as I go along (loss of speed and caching problems with CD/DVD)
  • Tried 'CopyFileEx' - the P/Invoke calls were slowing the copy down
  • Tried many different buffer sizes, and 256KB seems to be the best choice
  • Tried to read while writing - it slowed things down
  • Changed 'progressCallback' to update the UI only once per second (using the Stopwatch class) - this has significantly improved speed
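
For illustration, the last point (reporting progress at most once per interval) could look roughly like the sketch below. The class name ThrottledCopy, the Action<long, long> callback shape, and the interval parameter are assumptions made for this example, not the code actually used in the question.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class ThrottledCopy
{
    // Hypothetical helper: copies a file and calls onProgress(copied, total)
    // at most once per 'interval', plus one final call when the copy is done.
    public static long Copy(string sourcePath, string destinationPath,
                            Action<long, long> onProgress, TimeSpan interval)
    {
        const int BufferSize = 256 * 1024; // 256KB, the size reported as best above
        byte[] buffer = new byte[BufferSize];
        long copied = 0;

        using (FileStream source = File.OpenRead(sourcePath))
        using (FileStream dest = File.Open(destinationPath, FileMode.Create))
        {
            long total = source.Length;
            Stopwatch sinceLastReport = Stopwatch.StartNew();
            int read;

            while ((read = source.Read(buffer, 0, buffer.Length)) != 0)
            {
                dest.Write(buffer, 0, read);
                copied += read;

                // Skip the (comparatively expensive) UI update unless enough time has passed.
                if (sinceLastReport.Elapsed >= interval)
                {
                    onProgress(copied, total);
                    sinceLastReport.Restart();
                }
            }

            // Final report so the UI always reaches 100%.
            onProgress(copied, total);
        }

        return copied;
    }
}
```

Calling it with TimeSpan.FromSeconds(1) would mirror the one-second update described above; the callback still has to marshal to the UI thread itself if the copy runs on a worker thread.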

ANY suggestions are welcome - I'll be updating the code as I try out new things. Suggestions don't have to be code - just ideas.

+1  A: 

Hi

Multiple cores aren't much use without multiple read/write heads, which probably means multiple disks. Since your question is platform-agnostic, I feel free to suggest using a parallel I/O system and getting all those cores to do their share of the work instead of idling.

If you limit yourself to a single disk with a single read/write arm and one head per surface, you need to minimise movements of the arm. You probably want to read from a track on one surface and write to the same track on another surface. Or you might want to read a sector from one surface and copy it to another sector on the same track of the same surface.

However, all of this involves very low-level operations (well, they look very low-level to me). The general trend in general-purpose computing seems to be to give the programmer ever easier tools, at the cost of removing easy access to low-level operations. The task you have set yourself is approximately this:

Trick C# into accessing the disk in the way I want it to, rather than in the way it wants to.

Good luck with that :-)

Mark

PS Your mention of CD/DVD suggests, though you don't state it, that you are trying to make a fast copy from disk to CD/DVD. If so, you might think about doing a disk-to-disk copy first, letting your copier get back to work, and then copying from that copy to the CD/DVD on another core.

High Performance Mark
multicores - agree
single read/write - agree
low level - not necessarily -> changing the buffer size and limiting UI updates helped A LOT
tricking C# - agree ;)
CD/DVD - the other way (CD/DVD -> disk)
argh
A: 

Your speeds can be affected by many things: the file system's cluster size, file fragmentation, the disk interface type (IDE, SATA, etc.), disk operations from other processes, and so on.

Each computer and set of operating parameters will give different results; a code change that increases speed on one machine could decrease it on another.

Maybe have a default set of settings for files smaller than, say, 100 MB; otherwise run a quick set of tests to pre-configure the settings. Run a read/write speed test with a fixed set of buffer sizes, and detect whether the source and destination paths are located on separate disks (if so, make the copy multi-threaded: one thread reads, the other writes). Only raise progress callbacks for significant changes (for example, when progress has advanced by at least 3%).
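
As a rough sketch of the "one thread reads, the other writes" idea, the code below hands chunks from a reader task to the writing thread through a bounded queue. The names PipelinedCopy and LooksLikeSeparateDrives are made up for this example, and comparing path roots is only a heuristic: different roots do not prove different physical disks, and on Unix-like systems every path shares the root "/". Whether this is actually faster than a plain loop depends on the hardware, so it should be measured.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class PipelinedCopy
{
    // Heuristic (assumption): different roots such as C:\ and D:\ hint at separate disks.
    public static bool LooksLikeSeparateDrives(string sourcePath, string destinationPath)
    {
        string a = Path.GetPathRoot(Path.GetFullPath(sourcePath));
        string b = Path.GetPathRoot(Path.GetFullPath(destinationPath));
        return !string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    public static long Copy(string sourcePath, string destinationPath, int bufferSize = 256 * 1024)
    {
        long written = 0;

        using (var cts = new CancellationTokenSource())
        // Bounded queue: the reader may run at most 4 chunks ahead of the writer.
        using (var chunks = new BlockingCollection<byte[]>(4))
        using (FileStream dest = File.Open(destinationPath, FileMode.Create))
        {
            Task reader = Task.Run(() =>
            {
                try
                {
                    using (FileStream source = File.OpenRead(sourcePath))
                    {
                        while (true)
                        {
                            // One allocation per chunk keeps the sketch simple;
                            // a buffer pool would reduce garbage.
                            byte[] buffer = new byte[bufferSize];
                            int read = source.Read(buffer, 0, buffer.Length);
                            if (read == 0) break;
                            if (read < buffer.Length) Array.Resize(ref buffer, read);
                            chunks.Add(buffer, cts.Token);
                        }
                    }
                }
                finally
                {
                    chunks.CompleteAdding(); // always let the writer loop finish
                }
            });

            try
            {
                foreach (byte[] chunk in chunks.GetConsumingEnumerable())
                {
                    dest.Write(chunk, 0, chunk.Length);
                    written += chunk.Length;
                }
            }
            catch
            {
                cts.Cancel(); // unblock the reader if it is waiting on a full queue
                throw;
            }
            finally
            {
                // Surfaces read errors; ignores the cancellation we triggered ourselves.
                try { reader.Wait(); }
                catch (AggregateException) when (cts.IsCancellationRequested) { }
            }
        }

        return written;
    }
}
```

A caller could use LooksLikeSeparateDrives to choose between this and the simple single-threaded loop, falling back to the latter when both paths share a root.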

DanStory