I know that similar questions have already been answered, but I am asking because they don't quite cover what I want to know. This is about synchronization between threads. The idea of my project is to obtain data from a data acquisition card and to plot and analyze the data while acquisition is running. So far, I have a data acquisition class running on one thread and a plotting class running on another. The data acquisition class stores data in a global circular buffer, and the plotting class copies the data from the global buffer and processes it for plotting (reducing the number of data points, etc.). I believe this is called a single-producer, single-consumer problem. I have this part working using two semaphores, which keep track of how many data points have been stored by the acquisition class and how many have been used by the plotting class.

Now, I would like to introduce another class on another thread which analyzes data. Then, I would have one producer and two consumers. I would like to impose the following conditions:

  1. The two readers share the same data set. I.e., each produced item has to be used by both readers, instead of by only one of them.
  2. When the buffer gets full, the data acquisition class overwrites the global buffer. When a reader loses data because the data acquisition class has overwritten the buffer, this has to be detected and, ideally, recorded in a log (for example, which part of the data was missed by which reader).
  3. The calculation of the analysis class could be intensive. For this, I may need a bigger data buffer in the analysis class.

The way I dealt with the first part (single producer and single consumer) doesn't seem to extend in a straightforward way to the second part (single producer and two consumers). I am wondering how I should proceed. I use C++ with Qt for threading, since I use Qt for the GUI, but the solution doesn't necessarily have to use Qt. If possible, sample code or pseudocode would be greatly appreciated. I found another thread about a similar problem, where it is suggested to use boost::interprocess. However, I have never used the Boost library before and, although I have read the documentation for boost::interprocess, it looks too involved for me to figure out by myself.

Thanks a lot!

Daisuke

A: 

I think you should read the following concurrency articles by Herb Sutter to get a feel for how you should structure things and how to be scalable (if that's a goal). The link below is to the latest article, which also contains the full list of previous ones.

http://herbsutter.com/2010/09/24/effective-concurrency-know-when-to-use-an-active-object-instead-of-a-mutex/

In a nutshell, where possible you should make copies of the data for each thread to reduce contention on the actual resource, instead of wrapping everything in mutexes. This article is about that:

http://herbsutter.com/2008/05/23/effective-concurrency-maximize-locality-minimize-contention/

David
Thanks for the suggestion. Making copies of the data is a good idea. I was thinking along the lines of wrapping everything in mutexes. Thanks also for the article links. The articles look a bit dense to me, but I will spend some time reading them. Hopefully I can become more comfortable with concurrent programming.
Daisuke
+1  A: 

If both consumers need to see all data items, you are probably better off with a buffer per consumer. The producer can then post the same data into each buffer. If you are concerned about the memory requirements of doubling the data this way, and the data is not modified by the consumers, then you could use a reference counted pointer such as boost::shared_ptr, and post a pointer to the data into each buffer. That way the data item is shared, but the readers can process the data independently.
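A minimal sketch of the buffer-per-consumer idea follows. It assumes C++11, so it uses `std::shared_ptr` and `std::thread` rather than `boost::shared_ptr` and Qt threads, and every name in it (`ConsumerQueue`, `consume_all`, `run_fan_out_demo`) is hypothetical. The queues here are unbounded, so overwriting is not modelled in this sketch.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical names throughout; none of them come from the original posts.
using Chunk = std::vector<double>;
using ChunkPtr = std::shared_ptr<const Chunk>;  // consumers only read the data

// One mutex-protected queue per consumer (unbounded in this sketch).
class ConsumerQueue {
public:
    void push(ChunkPtr item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(item));
        }
        ready_.notify_one();
    }

    // Blocks until an item arrives. A null pointer marks the end of the stream.
    ChunkPtr pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        assert(!items_.empty());
        ChunkPtr item = std::move(items_.front());
        items_.pop();
        return item;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<ChunkPtr> items_;
};

// Sums every value a consumer receives, so the two consumers can be compared.
double consume_all(ConsumerQueue& queue) {
    double total = 0.0;
    while (ChunkPtr chunk = queue.pop()) {
        for (double v : *chunk) total += v;
    }
    return total;
}

// One producer, two consumers. Returns true if both consumers saw all the data.
bool run_fan_out_demo() {
    ConsumerQueue plot_queue, analysis_queue;
    double plot_total = 0.0, analysis_total = 0.0;

    std::thread plotter([&] { plot_total = consume_all(plot_queue); });
    std::thread analyzer([&] { analysis_total = consume_all(analysis_queue); });

    for (int i = 0; i < 100; ++i) {
        // The chunk is allocated once; each queue holds a pointer to the same data.
        ChunkPtr chunk = std::make_shared<Chunk>(1000, static_cast<double>(i));
        plot_queue.push(chunk);
        analysis_queue.push(chunk);
    }
    plot_queue.push(nullptr);  // end-of-stream marker for each consumer
    analysis_queue.push(nullptr);

    plotter.join();
    analyzer.join();
    return plot_total == analysis_total && plot_total == 1000.0 * 4950.0;
}
```

Each chunk is freed automatically once the slower of the two consumers has dropped its pointer, so the producer never has to track who has finished with what.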

It should be pretty trivial to have the producer log that it has overwritten some data.
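One way this could look, as a sketch only: a bounded per-consumer buffer that drops its oldest item when the producer pushes into a full buffer, and records the sequence number of whatever was dropped. The class name `OverwritingBuffer` and the sequence-number scheme are assumptions made for illustration, and C++11 is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical type: a bounded buffer for one consumer. When it is full, the
// oldest item is overwritten and its sequence number is added to a drop log.
template <typename T>
class OverwritingBuffer {
public:
    explicit OverwritingBuffer(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);  // a zero-capacity buffer is not supported
    }

    // Called by the producer. Never waits for a slow consumer.
    void push(T item) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.size() == capacity_) {
            dropped_.push_back(items_.front().first);  // log the lost item
            items_.pop_front();
        }
        items_.emplace_back(next_seq_++, std::move(item));
    }

    // Called by the consumer. Returns false if nothing is available.
    bool try_pop(std::uint64_t& seq, T& item) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        seq = items_.front().first;
        item = std::move(items_.front().second);
        items_.pop_front();
        return true;
    }

    // Snapshot of the sequence numbers that have been overwritten so far.
    std::vector<std::uint64_t> dropped() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return dropped_;
    }

private:
    const std::size_t capacity_;
    mutable std::mutex mutex_;
    std::deque<std::pair<std::uint64_t, T>> items_;
    std::vector<std::uint64_t> dropped_;
    std::uint64_t next_seq_ = 0;
};
```

With one such buffer per consumer, the drop log is also per consumer, so it records which reader missed which items. The analysis thread could simply be given a larger capacity than the plotting thread.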

Anthony Williams
Thanks for the reply. So far, memory is not a big concern, so I will start with a buffer per consumer. What would you suggest using for each consumer's buffer? One thing I can think of is an STL queue: the producer calls queue::push() to add data, and a consumer calls queue::pop() once it has used the data. Would this be a good idea? One catch is that my producer creates a chunk of data at a time (about 10,000 elements of double or integer). If I call queue::push() for each element, would it be a bit slow? Thanks!
Daisuke
If your producer creates a chunk of elements at once, push them as a chunk. Since this is a single-consumer queue, the consumer can make sure that it processes the elements in the chunk in the correct order.
Anthony Williams
Would the STL queue allow me to push elements as a chunk? Or should I inherit from the class and add a member function to do so? If the latter, how would you do this?
Daisuke
You could have a `queue<data_chunk>`, and have your `data_chunk` contain a variable number of elements.
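As a rough illustration of that suggestion, here is one possible shape for `data_chunk`. The fields and the helper functions `push_block` and `sum_front_chunk` are hypothetical, and locking is left out: a queue shared between threads would still need a mutex or similar around these calls.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical layout for data_chunk; the comment above does not define it.
struct data_chunk {
    std::size_t first_index;      // position of samples[0] in the whole acquisition
    std::vector<double> samples;  // variable number of elements
};

// Producer side: one push per acquisition block instead of one per sample.
void push_block(std::queue<data_chunk>& q, std::size_t first_index,
                std::vector<double> samples) {
    q.push(data_chunk{first_index, std::move(samples)});
}

// Consumer side: take one chunk off the queue and walk its samples in order.
double sum_front_chunk(std::queue<data_chunk>& q) {
    assert(!q.empty());
    data_chunk chunk = std::move(q.front());
    q.pop();
    double total = 0.0;
    for (double v : chunk.samples) total += v;
    return total;
}
```

No subclassing of `std::queue` is needed: a block of 10,000 samples costs one `push`, and moving the vector into the queue avoids copying the samples.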
Anthony Williams
Sounds good. Thanks a lot!
Daisuke