We have a fairly simple program that's used for creating backups. I'm attempting to parallelize it, but I'm getting an OutOfMemoryException inside an AggregateException. Some of the source folders are quite large, and the program doesn't crash until about 40 minutes after it starts. I don't know where to start looking, so the code below is a near-exact dump of all of the code, minus the directory structure and the exception-logging code. Any advice as to where to start looking?

using System;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;

namespace SelfBackup
{
class Program
{

static readonly string[] saSrc = { 
    "\\src\\dir1\\",
    //...
    "\\src\\dirN\\", //this folder is over 6 GB
};
static readonly string[] saDest = { 
    "\\dest\\dir1\\",
    //...
    "\\dest\\dirN\\",
};

static void Main(string[] args)
{
Parallel.For(0, saDest.Length, i =>
{
    string sDest = saDest[i]; //presumably defined like this in the elided code

    try
    {
        if (Directory.Exists(sDest))
        {
            //Delete directory first so old stuff gets cleaned up
            Directory.Delete(sDest, true);
        }

        //recursive function 
        clsCopyDirectory.copyDirectory(saSrc[i], sDest);
    }
    catch (Exception e)
    {
        //standard error logging
        CL.EmailError();
    }
});
}
}
}

///////////////////////////////////////
using System.IO;
using System.Threading.Tasks;

namespace SelfBackup
{
static class clsCopyDirectory
{
    static public void copyDirectory(string Src, string Dst)
    {
        Directory.CreateDirectory(Dst);

        /* Copy all the files in the folder
           If and when .NET 4.0 is installed, change 
           Directory.GetFiles to Directory.Enumerate files for 
           slightly better performance.*/
        Parallel.ForEach<string>(Directory.GetFiles(Src), file =>
        {
            /* An exception thrown here may be arbitrarily deep into
               this recursive function. There's also a good chance that
               if one copy fails here, so too will other files in the
               same directory, so we don't want to spam out hundreds of
               error e-mails, but we don't want to abort altogether.
               Instead, the best solution is probably to throw back up
               to the original caller of copyDirectory and move on to
               the next Src/Dst pair by not catching any possible
               exception here. */
            File.Copy(file, //src
                      Path.Combine(Dst, Path.GetFileName(file)), //dest
                      true);//bool overwrite
        });

        //Call this function again for every directory in the folder.
        Parallel.ForEach(Directory.GetDirectories(Src), dir =>
        {
            copyDirectory(dir, Path.Combine(Dst, Path.GetFileName(dir)));
        });
    }
}
}

The Threads debug window shows 417 Worker threads at the time of the exception.

EDIT: The copying is from one server to another. I'm now trying to run the code with the last Parallel.ForEach changed to a regular foreach, roughly as sketched below.
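For reference, a rough sketch of what I mean (the class name here is just for illustration; only the directory loop changes from the class above):

using System.IO;
using System.Threading.Tasks;

static class clsCopyDirectorySketch
{
    //Same as clsCopyDirectory.copyDirectory, except the recursion into
    //subdirectories is a plain foreach, so only the files within a single
    //directory are copied in parallel.
    static public void copyDirectory(string Src, string Dst)
    {
        Directory.CreateDirectory(Dst);

        Parallel.ForEach(Directory.GetFiles(Src), file =>
        {
            File.Copy(file, Path.Combine(Dst, Path.GetFileName(file)), true);
        });

        foreach (string dir in Directory.GetDirectories(Src))
        {
            copyDirectory(dir, Path.Combine(Dst, Path.GetFileName(dir)));
        }
    }
}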

+1  A: 

Making a few guesses here, as I haven't yet had any feedback on the comment to your question.

I am guessing that the large number of worker threads arises because actions (an action being the unit of work carried out on the parallel foreach) are taking longer than a specified amount of time, so the underlying ThreadPool is growing the number of threads. The ThreadPool follows an algorithm of growing the pool so that new tasks are not blocked by existing long-running tasks, e.g. if all of the current threads have been busy for half a second, it will start adding more threads to the pool. However, you are going to get into trouble if all of the tasks are long-running and the new tasks you add just make the existing tasks run even longer. This is why you are probably seeing a large number of worker threads - possibly because of disk thrashing or slow network IO (if networked drives are involved).
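If you want to confirm that the pool really is growing, something along these lines (the class and method names are my own invention, not from your code) will log the process thread count while the copy runs; you should see it climb steadily in the minutes before the exception:

using System;
using System.Diagnostics;
using System.Threading;

static class ThreadCountMonitor
{
    static Timer _timer;

    //Log the current process thread count at a fixed interval. Note that
    //Process.Threads.Count includes threads that are not part of the
    //ThreadPool, so treat the numbers as a rough trend, not an exact figure.
    public static void Start(TimeSpan interval)
    {
        _timer = new Timer(_ =>
        {
            int count = Process.GetCurrentProcess().Threads.Count;
            Console.WriteLine("{0:T}  threads: {1}", DateTime.Now, count);
        }, null, TimeSpan.Zero, interval);
    }
}

Calling ThreadCountMonitor.Start(TimeSpan.FromSeconds(10)); at the top of Main should be enough to see the trend.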

I am also guessing that files are being copied from one disk to another, or they are being copied from one location to another on the same disk. In this case, adding threads to the problem is not going to help out much. The source and destination disks only have one set of heads, so trying to make them do multiple things at once is likely to actually slow things down:

  • The disk heads will be lurching all over the place.
  • Your disk/OS caches may be frequently invalidated.

This may not be a problem that lends itself well to parallelization.

Update

In answer to your comment, if you are getting a speed-up using multiple threads on smaller datasets, then you could experiment with lowering the maximum number of threads used in your parallel foreach, e.g.

ParallelOptions options = new ParallelOptions { MaxDegreeOfParallelism = 2 };

Parallel.ForEach(Directory.GetFiles(Src), options, file =>
{
    //Do stuff
});
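Applied to your copyDirectory, that might look roughly like the following untested sketch. Bear in mind that the limit applies per Parallel.ForEach call, not process-wide, so the nested recursive calls can still add up to more than 2 concurrent operations overall:

using System.IO;
using System.Threading.Tasks;

static class clsCopyDirectoryLimited
{
    //One shared options instance so both loops respect the same cap.
    static readonly ParallelOptions s_options =
        new ParallelOptions { MaxDegreeOfParallelism = 2 };

    static public void copyDirectory(string Src, string Dst)
    {
        Directory.CreateDirectory(Dst);

        Parallel.ForEach(Directory.GetFiles(Src), s_options, file =>
        {
            File.Copy(file, Path.Combine(Dst, Path.GetFileName(file)), true);
        });

        Parallel.ForEach(Directory.GetDirectories(Src), s_options, dir =>
        {
            copyDirectory(dir, Path.Combine(Dst, Path.GetFileName(dir)));
        });
    }
}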

But please do bear in mind that disk thrashing may negate any benefits from parallelization in the general case. Play about with it and measure your results.
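As a rough way to measure (the helper name here is my own invention), you could time each Src/Dst pair with a Stopwatch and compare runs with different settings on the same data:

using System;
using System.Diagnostics;

static class CopyTimer
{
    //Time a single copy operation and report the elapsed time so different
    //MaxDegreeOfParallelism settings can be compared on the same folders.
    public static TimeSpan TimeCopy(string src, string dst, Action<string, string> copy)
    {
        Stopwatch sw = Stopwatch.StartNew();
        copy(src, dst);
        sw.Stop();
        Console.WriteLine("Copied {0} in {1}", src, sw.Elapsed);
        return sw.Elapsed;
    }
}

//e.g. CopyTimer.TimeCopy(saSrc[0], saDest[0], clsCopyDirectory.copyDirectory);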

chibacity
Thanks for the insight @chibacity. I ran some tests on the smaller directories (network drives) and they seem to speed things up from 30 seconds to ~5-6 seconds. I was hoping that when I set it on the huge 6 GB dir, I would see even half that speed-up. Your logic makes sense though.
Martin Neal
There might be a bit more mileage in the approach, although disk thrashing may negate any benefits of parallelism in the general case. I have added an update to my answer.
chibacity