Currently I have a multi-threaded downloader class that uses HttpWebRequest/HttpWebResponse. Everything works fine and it's super fast, BUT... the problem is that the data needs to be streamed to another app while it's downloading. That means it must be streamed in the right order: the first chunk first, then the next one in the queue. Currently my downloader class is synchronous and Download() returns byte[]. In my async multi-threaded class I create, for example, a list with 4 empty elements (slots) and pass each slot's index to each thread that calls Download(). That simulates synchronization, but it's not what I need. How should I implement the queue so that the data is streamed as soon as the first chunk starts downloading?

A: 

To create a synchronized multi-threaded downloader, you need the right data structure, and you'll need more than just a byte[] of data.

Steps:

  1. Break the download into multiple chunks, either based on the size of the content or using a fixed chunk size (for example, about 500 KB downloaded by each thread).
  2. When starting each thread, specify its chunk index: 1st part, 2nd part, etc.
  3. When a chunk's download is complete, place it in the final content according to its chunk index.
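A minimal sketch of steps 1 and 3, assuming the total size is known up front (for example from the Content-Length header). `ChunkPlan`, `Split` and `Assemble` are hypothetical names, not part of any library; each range could then be requested on its own HttpWebRequest with `AddRange(start, end)`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: plans byte ranges and reassembles chunks by index.
static class ChunkPlan
{
    // Splits totalLength bytes into inclusive [Start, End] ranges of at most
    // chunkSize bytes. The list index is the chunk index (0 = first part).
    public static List<(long Start, long End)> Split(long totalLength, long chunkSize)
    {
        var ranges = new List<(long Start, long End)>();
        for (long start = 0; start < totalLength; start += chunkSize)
        {
            long end = Math.Min(start + chunkSize, totalLength) - 1;
            ranges.Add((start, end));
        }
        return ranges;
    }

    // Writes the finished chunks in chunk-index order, regardless of
    // the order in which the threads completed them.
    public static void Assemble(byte[][] chunksByIndex, Stream output)
    {
        foreach (byte[] chunk in chunksByIndex)
            output.Write(chunk, 0, chunk.Length);
    }
}
```

With 1,200,000 bytes and 500,000-byte chunks, `Split` yields three ranges; the last one is shorter (1,000,000 to 1,199,999).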

If you're interested, have a look at the source code of prozilla (C, Linux-based, at ) or Axel.

MasterGaurav
I've already done that. I've got 5 chunks (threads), 1 MB each. But I output the data only once all 5 threads have finished, because I don't know how to enforce the strict order.
blez
No need to wait until all 5 threads have finished. You can use yet another strategy: create a 5 MB file and, as each thread's response becomes available, flush it to the file! :)
MasterGaurav
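A minimal sketch of the pre-sized file idea from the comment above, assuming the total length is known. `PreSizedFileWriter` is a hypothetical class; it relies only on standard `FileStream` calls (`SetLength`, `Seek`, `Write`). Each thread writes its chunk at the chunk's own byte offset, so completion order doesn't matter.

```csharp
using System;
using System.IO;

// Hypothetical helper: one shared, pre-sized file; each finished chunk
// is written at its own byte offset.
class PreSizedFileWriter : IDisposable
{
    private readonly FileStream _file;
    private readonly object _lock = new object();

    public PreSizedFileWriter(string path, long totalLength)
    {
        _file = new FileStream(path, FileMode.Create, FileAccess.ReadWrite, FileShare.Read);
        _file.SetLength(totalLength); // reserve the full size up front
    }

    // Called by any download thread when its chunk is complete.
    public void WriteChunk(long offset, byte[] data)
    {
        lock (_lock) // Seek + Write must not interleave between threads
        {
            _file.Seek(offset, SeekOrigin.Begin);
            _file.Write(data, 0, data.Length);
            _file.Flush();
        }
    }

    public void Dispose() => _file.Dispose();
}
```

As blez points out in the next comment, this suits saving to disk rather than streaming the data onward.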
Yes, but I need to stream the data, not write it to a file. That's why I can't use this method.
blez
Ah! How about using a memory-mapped file along with a "highest contiguous block index"? For example, after threads 1, 2 and 5 are done, set the index to 2. Once 3 is downloaded, make it 3; after 4 is downloaded, make it 5.
MasterGaurav
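A minimal sketch of the "highest contiguous block index" bookkeeping from the comment above, shown with a plain array of flags rather than a memory-mapped file. `ContiguousTracker` and `MarkDone` are hypothetical names, and indexes are 0-based here (the comment counts from 1).

```csharp
using System.Collections.Generic;

// Hypothetical helper: blocks may finish in any order, but each one is
// released for streaming only after every earlier block has finished.
class ContiguousTracker
{
    private readonly bool[] _done;
    private int _contiguous; // count of leading blocks already released
    private readonly object _lock = new object();

    public ContiguousTracker(int blockCount) => _done = new bool[blockCount];

    // Marks a 0-based block as finished and returns, in order, the indexes
    // that have just become streamable (empty if an earlier block is missing).
    public List<int> MarkDone(int index)
    {
        lock (_lock)
        {
            _done[index] = true;
            var ready = new List<int>();
            while (_contiguous < _done.Length && _done[_contiguous])
                ready.Add(_contiguous++);
            return ready;
        }
    }
}
```

Following the comment's example with 5 blocks: marking blocks 0, 1 and 4 releases 0 and 1 only; marking 2 releases 2; marking 3 then releases 3 and 4 together.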
A: 

Can you show the code where you do the downloads, and the code where you kick off the multiple async threads?

Maybe I am not understanding your scenario fully, but if I were you, I would use asynchronous I/O (BeginRead on the response stream). Then I would do the following:

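using System;
using System.IO;

// State carried from one asynchronous read to the next
class Context
{
    public byte[] Buffer;
    public Stream InputStream;
    public Stream OutputStream;
}

void StartReading(Stream responseStream)
{
    byte[] buffer = new byte[1024];
    Context ctx = new Context();
    ctx.Buffer = buffer;
    ctx.InputStream = responseStream;
    ctx.OutputStream = new MemoryStream(); // change this to be your output stream

    // Start the first asynchronous read; ReadCallback runs when data arrives
    responseStream.BeginRead(buffer, 0, buffer.Length, new AsyncCallback(ReadCallback), ctx);
}

void ReadCallback(IAsyncResult ar)
{
    Context ctx = (Context)ar.AsyncState;
    try
    {
        int read = ctx.InputStream.EndRead(ar);
        if (read > 0)
        {
            // Forward this piece immediately, in the order it was received
            ctx.OutputStream.Write(ctx.Buffer, 0, read);
            // kick off another async read
            ctx.InputStream.BeginRead(ctx.Buffer, 0, ctx.Buffer.Length, new AsyncCallback(ReadCallback), ctx);
        }
        else
        {
            // End of stream: release both streams
            ctx.InputStream.Close();
            ctx.OutputStream.Close();
        }
    }
    catch (Exception)
    {
        // Don't swallow errors silently: close the streams and report the failure here
        ctx.InputStream.Close();
        ctx.OutputStream.Close();
    }
}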
feroze
A: 

If your question is about how to determine which thread is downloading the first chunk and when that first chunk is ready for use, use an event per thread and keep track of which chunks you've assigned to which threads. Keep track of which event you pass to the first thread (that will be downloading the first chunk of data), the event you pass to the second thread (for the 2nd chunk of data) etc. Have the main thread, or another background thread (to avoid blocking the UI thread), wait on the first event. When the first thread finishes downloading its chunk, it sets/signals the first event. The thread that is waiting will then wake up and can use the first chunk of data.

The other download threads can do the same, signaling their respective events when they are done. Use a manual-reset event so that the event remains signaled even if nobody is waiting on it. When the thread that needs the data blocks in order finishes processing the first data block, it can wait on the 2nd event. If the 2nd event has already been signaled, the wait returns immediately and the thread can begin processing the 2nd data block.
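A minimal sketch of the event-per-chunk approach described above, using one `ManualResetEvent` per chunk. `PerChunkEvents.Run` is a hypothetical method, and `downloadChunk` is a hypothetical delegate standing in for the question's per-range HttpWebRequest code. `Run` blocks while it consumes, so call it from a background thread rather than the UI thread.

```csharp
using System;
using System.Threading;

static class PerChunkEvents
{
    // Workers may finish in any order; the consumer waits on the events
    // strictly in chunk order and hands each chunk to consume().
    public static void Run(int chunkCount, Func<int, byte[]> downloadChunk, Action<byte[]> consume)
    {
        var chunks = new byte[chunkCount][];
        var ready = new ManualResetEvent[chunkCount];
        for (int i = 0; i < chunkCount; i++)
            ready[i] = new ManualResetEvent(false); // stays signaled once Set

        for (int i = 0; i < chunkCount; i++)
        {
            int index = i; // capture a copy for this thread
            new Thread(() =>
            {
                chunks[index] = downloadChunk(index);
                ready[index].Set(); // "chunk index is available"
            }) { IsBackground = true }.Start();
        }

        // Consumer: WaitOne returns immediately if the event was already set.
        for (int i = 0; i < chunkCount; i++)
        {
            ready[i].WaitOne();
            consume(chunks[i]);
            ready[i].Dispose();
        }
    }
}
```

Exceptions inside `downloadChunk` are not handled in this sketch; a real downloader would need to record the failure and still signal the event so the consumer doesn't wait forever.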

For a very large download you can reuse the events and threads in a round-robin fashion. The order that they finish isn't important as long as the thread that consumes the data chunks consumes them in order and waits on the respective events in order.

If you're clever and careful, you can probably do all of this with only one event: create a global array of data chunk references, all initially null. Worker threads download chunks of data, assign each finished chunk to its slot in the global array, and then signal the shared event. The consumer thread keeps a data chunk counter so it knows which chunk it needs to handle next; it waits on the shared event and, when signaled, looks at the next slot in the global array to see whether data has appeared there. If there is still no data in the next slot, the consumer thread goes back to waiting on the event. You'll also need a way for the worker threads to know which data block to download next: a global counter protected by a mutex, or accessed using Interlocked.Add/Interlocked.Exchange, would suffice. Each worker thread increments the global counter, downloads that data chunk number, and assigns the result to the nth slot in the global array of data chunks.
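A minimal sketch of the single-event design described above: a shared slot array, a counter advanced with `Interlocked.Increment` to hand out chunk numbers, and one `AutoResetEvent`. `SharedEventDownloader.Run` and the `downloadChunk` delegate are hypothetical. The consumer checks its next slot before waiting; a `Set` issued after that check leaves the event signaled, so no wake-up is lost.

```csharp
using System;
using System.Threading;

static class SharedEventDownloader
{
    public static void Run(int chunkCount, int workerCount,
                           Func<int, byte[]> downloadChunk, Action<byte[]> consume)
    {
        var slots = new byte[chunkCount][];        // null until the chunk arrives
        var chunkArrived = new AutoResetEvent(false);
        int nextToDownload = -1;                   // shared work counter
        var workers = new Thread[workerCount];

        for (int w = 0; w < workerCount; w++)
        {
            workers[w] = new Thread(() =>
            {
                int index;
                // Each worker loops, claiming the next chunk number until none are left.
                while ((index = Interlocked.Increment(ref nextToDownload)) < chunkCount)
                {
                    Volatile.Write(ref slots[index], downloadChunk(index));
                    chunkArrived.Set();
                }
            }) { IsBackground = true };
            workers[w].Start();
        }

        // Consumer: handles chunks strictly in order, waiting while the next slot is empty.
        for (int next = 0; next < chunkCount; next++)
        {
            byte[] data;
            while ((data = Volatile.Read(ref slots[next])) == null)
                chunkArrived.WaitOne();
            consume(data);
        }

        foreach (Thread t in workers) t.Join(); // no Set may race with Dispose
        chunkArrived.Dispose();
    }
}
```

Keeping `workerCount` small while `chunkCount` grows matches the advice in the comments: a few threads, each looping over many modest-sized chunks.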

dthorpe
Well, here's my code: http://pastie.org/949929. DownloadSingleChunk(index, startPos, endPos) fetches the specified range and writes it into the chunkOutput List<byte[]>. I thought about using a counter, but that means I would have to loop on my main thread, or on another worker thread. Is that right?
blez
If your download is larger than a few megabytes, yes, you should have each worker thread looping to pick up more blocks to download. You want to keep your number of threads fairly small and keep the size of each download chunk from getting too large. Downloading very large data chunks runs a greater risk of hitting packet loss or transmission error which may force you to start the whole chunk over again. Small blocks that can transmit in a minute or two each are more resilient to network errors, IMO.
dthorpe
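A minimal sketch of restarting a single chunk after a transmission error, the cost that the comment above says small chunks keep low. `ChunkRetry.DownloadWithRetry` and its `downloadChunk` delegate are hypothetical. Treating `WebException` and `IOException` as retryable is an assumption about which failures are transient.

```csharp
using System;
using System.IO;
using System.Net;

static class ChunkRetry
{
    // Re-downloads the whole chunk on a network error, up to maxAttempts.
    // On the final attempt the filter is false, so the exception propagates.
    public static byte[] DownloadWithRetry(int index, Func<int, byte[]> downloadChunk, int maxAttempts)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return downloadChunk(index);
            }
            catch (Exception ex) when ((ex is WebException || ex is IOException) && attempt < maxAttempts)
            {
                // Transient failure: start this chunk over from its first byte.
            }
        }
    }
}
```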
And yes, you should probably have another thread that loops, waits for the next data chunk in the array to become available, and then sends it on to wherever you're sending it. That is the "consumer thread" described at the end of my post. You don't want your main UI thread to block waiting for an event, because that will freeze your UI.
dthorpe