I have a program that downloads a binary file from another PC.
I also have another standalone program that converts this binary file to a human-readable CSV.

I would like to bring the conversion tool "into" the download tool by creating a thread in the download tool that kicks off the conversion code. That way it can start converting while the download is still in progress, which should take less total time than downloading and then converting as two separate steps.

I believe I can successfully kick off another thread, but how do I synchronize the conversion thread with the main download?

That is, when the conversion catches up with the download, it needs to wait for more data to arrive, then start converting again, and so on.

Is this similar to "Synchronizing Execution of Multiple Threads"? If so, does this mean the downloaded binary needs to be a resource guarded by semaphores?

Am I on the right path, or should I be pointed in another direction before I start?

Any advice is appreciated.

Thank You.

+2  A: 

Instead of downloading to a file, you should write the downloaded data to a pipe. The convert thread can read from the pipe and write the converted output to a file. That will automatically synchronize them.

If you need the original file as well as the converted one, just have the download thread write the data to the file then write the same data to the pipe.
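
To make the pipe idea concrete, here is a minimal sketch. It assumes a POSIX system (`pipe`, `read`, `write`, `close` from `<unistd.h>`); on Windows you would need the platform's own pipe API instead. The function name `download_and_convert`, the `chunks` parameter standing in for network reads, and the "conversion" (each byte printed as a decimal CSV field) are all made up for illustration, not taken from the question's tools.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>
#include <unistd.h>  // pipe, read, write, close (POSIX only)

// Writes the whole buffer to fd, retrying on short writes.
static bool write_all(int fd, const char* buf, std::size_t len) {
    assert(fd >= 0);
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n <= 0) return false;  // error: give up
        buf += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}

// Hypothetical driver: `chunks` stands in for blocks arriving from the network.
std::string download_and_convert(const std::vector<std::string>& chunks) {
    int fds[2];
    if (pipe(fds) != 0) return {};

    std::string csv;
    // Converter thread: read() blocks while the pipe is empty, so the
    // thread simply waits whenever it catches up with the download.
    std::thread converter([&csv, rd = fds[0]] {
        char buf[4096];
        ssize_t n;
        while ((n = read(rd, buf, sizeof buf)) > 0) {
            for (ssize_t i = 0; i < n; ++i) {
                // Placeholder conversion: one decimal field per byte.
                csv += std::to_string(static_cast<unsigned char>(buf[i]));
                csv += ',';
            }
        }
        close(rd);
    });

    // "Download" side: push each chunk into the pipe as it arrives.
    for (const auto& chunk : chunks) {
        if (!write_all(fds[1], chunk.data(), chunk.size())) break;
    }
    close(fds[1]);  // closing the write end gives the reader end-of-file
    converter.join();
    return csv;
}
```

Closing the write end is what tells the converter the download is finished; without it the reader would block forever.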

Amardeep
Awesome, does the pipe have drawbacks vs. the thread-safe queue mentioned above?
Tommy
A queue would be easier to use. Suppose you want to download files in parallel in the future: that will be harder to scale with pipes.
the_void
@the_void: That's certainly true for Windows! ;-)
Amardeep
+2  A: 

Yes, you undoubtedly need semaphores (or something similar such as an event or critical section) to protect access to the data.

My immediate reaction would be to think primarily in terms of a sequence of blocks though, not an entire file. Second, I almost never use a semaphore (or anything similar) directly. Instead, I would normally use a thread-safe queue, so when the network thread has read a block, it puts a structure into the queue saying where the data is and such. The processing thread waits for an item in the queue, and when one arrives it pops and processes that block.

When it finishes processing a block, it'll typically push the result onto another queue for the next stage of processing (e.g., writing to a file), and (quite possibly) put a descriptor for the processed block onto another queue, so the memory can be re-used for reading another block of input.

At least in my experience, this type of design eliminates a large percentage of thread synchronization issues.
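
For illustration only (this is not the code from the previous answer mentioned below), a thread-safe queue along these lines could be sketched in C++17 with a mutex and a condition variable. The names `BlockQueue`, `close`, and `run_pipeline` are assumptions of this sketch, and the "processing" step just counts bytes in place of a real conversion.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Minimal unbounded thread-safe queue (hypothetical name).
template <typename T>
class BlockQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(m_);
            assert(!closed_ && "push after close");
            q_.push(std::move(item));
        }
        cv_.notify_one();  // wake a waiting consumer
    }

    // Signals that no more items will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until an item is available. Returns std::nullopt once the
    // queue has been closed and fully drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;
        T item = std::move(q_.front());
        q_.pop();
        return item;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
    bool closed_ = false;
};

// Hypothetical driver: the calling thread plays the network thread and
// the spawned thread plays the processing thread.
std::size_t run_pipeline(const std::vector<std::string>& blocks) {
    BlockQueue<std::string> queue;
    std::size_t bytes_processed = 0;

    std::thread processor([&] {
        while (auto block = queue.pop()) {
            bytes_processed += block->size();  // stand-in for conversion
        }
    });

    for (const auto& b : blocks) queue.push(b);
    queue.close();
    processor.join();
    return bytes_processed;
}
```

The processing thread never touches a semaphore directly; all the waiting happens inside `pop`. A second `BlockQueue` carrying results to a writer thread would extend this to the multi-stage layout described above.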

Edit: I'm not sure about guidelines for designing a thread-safe queue, but I've posted code for a simple one in a previous answer.

As far as design patterns go, I've seen this called at least "pipeline" and "production line" (though I'm not sure I've seen the latter in much literature).

Jerry Coffin
Very nice. What is a good resource for learning how to implement a thread-safe queue: MSDN, a web article, specific books, etc.?
Tommy
the_void
+2  A: 

This is a classic case of the producer-consumer problem with the download thread as the producer and the conversion thread as the consumer.

Google around and you'll find an implementation for your language of choice. Here are some from MSDN: How to: Implement Various Producer-Consumer Patterns.
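
As a rough idea of what such an implementation looks like, here is a sketch of the textbook bounded-buffer form of producer-consumer in standard C++. Unlike an unbounded queue, the producer also blocks when the buffer is full, so a fast download cannot run arbitrarily far ahead of a slow conversion. `BoundedBuffer` and `produce_and_consume` are names invented for this sketch, and plain `int` values stand in for downloaded blocks.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Fixed-capacity ring buffer shared by one producer and one consumer.
class BoundedBuffer {
public:
    explicit BoundedBuffer(std::size_t capacity) : slots_(capacity) {
        assert(capacity > 0);
    }

    // Producer side: waits while the buffer is full.
    void put(int value) {
        std::unique_lock<std::mutex> lock(m_);
        not_full_.wait(lock, [this] { return count_ < slots_.size(); });
        slots_[(head_ + count_) % slots_.size()] = value;
        ++count_;
        lock.unlock();
        not_empty_.notify_one();
    }

    // Consumer side: waits while the buffer is empty.
    int take() {
        std::unique_lock<std::mutex> lock(m_);
        not_empty_.wait(lock, [this] { return count_ > 0; });
        int value = slots_[head_];
        head_ = (head_ + 1) % slots_.size();
        --count_;
        lock.unlock();
        not_full_.notify_one();
        return value;
    }

private:
    std::mutex m_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::vector<int> slots_;
    std::size_t head_ = 0;   // index of the oldest item
    std::size_t count_ = 0;  // number of items currently stored
};

// Hypothetical demo: produce 1..n, consume them all, return their sum.
long produce_and_consume(int n, std::size_t capacity) {
    BoundedBuffer buffer(capacity);
    long sum = 0;
    std::thread consumer([&] {
        for (int i = 0; i < n; ++i) sum += buffer.take();
    });
    for (int i = 1; i <= n; ++i) buffer.put(i);
    consumer.join();
    return sum;
}
```

Here the consumer knows in advance how many items to expect; a real download would need some end-of-stream signal instead.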

the_void
Great, I did not know this was so well defined already. This helps when searching for implementations and examples in C++.
Tommy
`Producer-Consumer` is one of the classic synchronization problems. Check the wiki for many others. The good part is that there are solutions that can be applied to most synchronization needs.
the_void