tags:

views:

240

answers:

5

Hi, I have a table of urls which I need to loop through, download each file, update the table and return the results. I want to run up to 10 downloads at a time so am thinking about using delegates as follows:

DataTable photos;
bool scanning = false,
    complete = false;
int rowCount = 0;

public delegate int downloadFileDelegate();

public void page_load(){

    photos = Database.getData...        

    downloadFileDelegate d = downloadFile;

    d.BeginInvoke(downloadFileComplete, d);
    d.BeginInvoke(downloadFileComplete, d);
    d.BeginInvoke(downloadFileComplete, d);
    d.BeginInvoke(downloadFileComplete, d);
    d.BeginInvoke(downloadFileComplete, d);

    while(!complete){}

    //handle results...

}

int downloadFile(){

    while(scanning){} scanning = true;

    DataRow r;

    for (int ii = 0; ii < rowCount; ii++) {

        r = photos.Rows[ii];

        if ((string)r["status"] == "ready"){

            r["status"] = "running";

            scanning = false; return ii;                

        }

        if ((string)r["status"] == "running"){

            scanning = false; return -2;                

        }

    }

    scanning = false; return -1;        

}

void downloadFileComplete(IAsyncResult ar){

    if (ar == null){ return; }

    downloadFileDelegate d = (downloadFileDelegate)ar.AsyncState;

    int i = d.EndInvoke(ar);

    if (i == -1){ complete = true; return; }      


    //download file...

    //update row
    DataRow r = photos.Rows[i];

    r["status"] = "complete";

    //invoke delegate again
    d.BeginInvoke(downloadFileComplete, d);

}

However when I run this it takes the same amount of time to run 5 as it does 1. I was expecting it to take 5 times faster.

Any ideas?

A: 

It will take the same amount of time if you are limited by network bandwidth. If you are downloading all 10 files from the same site then it won't be any quicker. Multi threading is useful when you either want the User Interface to respond or you have something processor intensive and multiple cores

Matthew Steeples
In certain cases, the latency ends up limiting throughput on a single connection, and multiple downloads can end up being somewhat faster.
Yuliy
I Agree with Yuliy- I'd rephrase "won't be quicker" to "might not be quicker"
RichardOD
A: 
WaitHandle[] handles = new  WaitHandle[5];
handles[0] = d.BeginInvoke(downloadFileComplete, d).AsyncWaitHandle;
handles[1] = d.BeginInvoke(downloadFileComplete, d).AsyncWaitHandle;
handles[2] = d.BeginInvoke(downloadFileComplete, d).AsyncWaitHandle;
handles[3] = d.BeginInvoke(downloadFileComplete, d).AsyncWaitHandle;
handles[4] = d.BeginInvoke(downloadFileComplete, d).AsyncWaitHandle;
WaitHandle.WaitAll(handles);
Andrey
This is a better technique, but will do nothing to increase the performance.
RichardOD
This doesn't address the issues that are actually causing the code to perform incorrectly, not the improper concurrent accesses to shared data.
LBushkin
+4  A: 

You look like you're trying to use lockless syncronization (using while(scanning) to check a boolean that's set at the start of the function and reset at the end), but all this succeeds in doing is only running one retrieval at a time.

  1. Is there a reason that you don't want them to run concurrently? This seems like the entire point of your exercise. I can't see a reason for this, so I'd lose the scanning flag (and associated logic) entirely.
  2. If you're going to take this approach, your boolean flags need to be declared as volatile (otherwise their reads could be cached and you could wait endlessly)
  3. Your data update operations (updating the value in the DataRow) must take place on the UI thread. You'll have to wrap these operations up in a Control.Invoke or Control.BeginInvoke call, otherwise you'll be interacting with the control across thread boundaries.
  4. BeginInvoke returns an AsyncWaitHandle. Use this for your logic that will take place when the operations are all complete. Something like this

-

WaitHandle[] handles = new WaitHandle[]
{
    d.BeginInvoke(...),
    d.BeginInvoke(...),
    d.BeginInvoke(...),
    d.BeginInvoke(...),
    d.BeginInvoke(...)
}

 WaitHandle.WaitAll(handles);

This will cause the calling thread to block until all of the operations are complete.

Adam Robinson
yeah, it seems the code is trying to serialize the download operations.
Kai Wang
I'd point out you are referring to while(scanning) as opposed to while(!complete), as initially I confused about which while you were referring to.
RichardOD
In order for the code to be written to run concurrently, the access to the DataTable needs to be protected. Alternatively, the asynchronous logic needs to be factored out from the update of the shared data structure.
LBushkin
@LBushkin: The `DataTable` code needs to run on the UI thread anyway. Doing this eliminates the need for any additional synchronization.
Adam Robinson
@Richard: Thanks, I've edited for clarity.
Adam Robinson
A: 

There are numerous things in this implementation that are not safe for concurrent use.

But the one that is likely causing the effect you describe is the fact that downloadFile() has a while() loop that examines the scanning variable. This variable is shared by all instances of the running delegate. This while loop prevents the delegates from running concurrently.

This "scanning loop" is not a proper threading construct - it is possible for two threads to both read the variable and both set it at the same time. You should really use a Semaphore or lock() statement to to protected the DataTable from concurrent access. Since each thread is spending most of it's time waiting for scanning to be false, the downloadFile method cannot run concurrently.

You should reconsider how you've structured this code so that downloading the data and updating the structures in your application are not in contention.

LBushkin
It would seem that the "scanning loop" is entirely unnecessary in this case, unless something has been omitted.
Adam Robinson
@Adam Robinson: The intent of the scanning loop appears to be to protect the DataTable from concurrent access. But I could be wrong - the structure of the code makes it difficult to infer intent.
LBushkin
A: 

Something like this would work better I think.

public class PhotoDownload
{
    public ManualResetEvent Complete { get; private set; }
    public Object RequireData { get; private set; }
    public Object Result { get; private set; }
}

public void DownloadPhotos()
{
    var photos = new List<PhotoDownload>();

    // build photo download list

    foreach (var photo in photos)
    {
        ThreadPool.QueueUserWorkItem(DownloadPhoto, photo);
    }

    // wait for the downloads to complete

    foreach (var photo in photos)
    {
        photo.Complete.WaitOne();
    }

    // make sure everything happened correctly
}

public void DownloadPhoto(object state)
{
    var photo = state as PhotoDownload;
    try
    {
            // do not access fields in this class
            // everything should be inside the photo object
    }
    finally
    {
        photo.Complete.Set();
    }
}
ChaosPandion