views:

181

answers:

3

I have a fairly specific question about concurrent programming in C. I have done a fair bit of research on this but have seen several conflicting answers, so I'm hoping for some clarification. I have a program that's something like the following (sorry for the longish code block):

typedef struct {
  pthread_mutex_t mutex;
  /* some shared data */
  int eventCounter;
} SharedData;

SharedData globalSharedData;

typedef struct {
  /* details unimportant */
} NewData;

void newData(NewData data) {
  int localCopyOfCounter;

  if (/* information contained in new data triggers an
         event */) {
    pthread_mutex_lock(&globalSharedData.mutex);
    localCopyOfCounter = ++globalSharedData.eventCounter;
    pthread_mutex_unlock(&globalSharedData.mutex);
  }
  else {
    return;
  }

  /* Perform long running computation. */

  if (localCopyOfCounter != globalSharedData.eventCounter) {
    /* A new event has happened, old information is stale and
       the current computation can be aborted. */
    return;
  }

  /* Perform another long running computation whose results
     depend on the previous one. */

  if (localCopyOfCounter != globalSharedData.eventCounter) {
    /* Another check for new event that causes information
       to be stale. */
    return;
  }

  /* Final stage of computation whose results depend on two
     previous stages. */
}

There is a pool of threads servicing the connection for incoming data, so multiple instances of newData can be running at the same time. In a multi-processor environment there are two problems I'm aware of in getting the counter handling part of this code correct: preventing the compiler from caching the shared counter copy in a register so other threads can't see it, and forcing the CPU to write the store of the counter value to memory in a timely fashion so other threads can see it. I would prefer not to use a synchronization call around the counter checks because a partial read of the counter value is acceptable (it will produce a value different than the local copy, which should be adequate to conclude that an event has occurred). Would it be sufficient to declare the eventCounter field in SharedData to be volatile, or do I need to do something else here? Also is there a better way to handle this?

+3  A: 

Unfortunately, the C standard says very little about concurrency. However, most compilers (gcc and msvc, anyway) will regard a volatile read as if having acquire semantics -- the volatile variable will be reloaded from memory on every access. That is desirable, your code as it is now may end up comparing values cached in registers. I wouldn't even be surprised if the both comparisons were optimized out.

So the answer is yes, make the eventCounter volatile. Alternatively, if you don't want to restrict your compiler too much, you can use the following function to perform reads of eventCounter.

int load_acquire(volatile int * counter) { return *counter; }

if (localCopy != load_acquire(&sharedCopy))
    // ...
avakar
A: 

preventing the compiler from caching the local counter copy in a register so other threads can't see it

Your local counter copy is "local", created on the execution stack and visible only to the running thread. Every other thread runs in a different stack and has the own local counter variable (no concurrency).

Your global counter should be declared volatile to avoid register optimization.

Andrea Balducci
You are absolutely correct about the local copy, I just fixed that error in my text.
A: 

You can also use hand coded assembly or compiler intrinsics which will garuntee atomic checks against your mutex, they can also atomically ++ and -- your counter.

volatile is useless these days, for the most part, you should look at memory barrier's which are other low level CPU facility to help with multi-core contention.

However the best advice I can give, would be for you to bone up on the various managed and native multi-core support libraries. I guess some of the older one's like OpenMP or MPI (message based), are still kicking and people will go on about how cool they are... however for most developers, something like intel's TBB or Microsoft's new API's, I also just dug up this code project article, he's apparently using cmpxchg8b which is the lowlevel hardware route which I mentioned initially...

Good luck.

RandomNickName42