Some C++ library I'm working on features a simple tracing mechanism which can be activated to generate log files showing which functions were called and what arguments were passed. It basically boils down to a TRACE macro being spilled all over the source of the library, and the macro expands to something like this:

typedef void(*TraceProc)( const char *msg );

/* Sets 'callback' to point to the trace procedure which actually prints the given
 * message to some output channel, or to a null trace procedure which is a no-op,
 * in case the given source file/line position was disabled by the client.
 *
 * This function also registers the callback pointer in an internal data structure
 * and resets it to zero in case the filtering configuration changed since the last
 * invocation of updateTraceCallback.
 */
void updateTraceCallback( TraceProc *callback, const char *file, unsigned int lineno );

#define TRACE(msg)                                                  \
{                                                                   \
    static TraceProc traceCallback = 0;                             \
    if ( !traceCallback )                                           \
        updateTraceCallback( &traceCallback, __FILE__, __LINE__ );  \
    traceCallback( msg );                                           \
}

The idea is that people can just say TRACE("foo hit") in their code and that will either call a debug printing function or be a no-op. They can use some other API (not shown here) to configure which TRACE locations (source file/line number) should actually be printed. This configuration can change at runtime.

The issue is that this idea should now be used in a multi-threaded code base. Hence, the code which TRACE expands to needs to work correctly when multiple threads of execution run it simultaneously. There are about 20,000 different trace points in the code base right now and they are hit very often, so they should be rather efficient.

What is the most efficient way to make this approach thread safe? I need a solution for Windows (XP and newer) and Linux. I'm afraid of doing excessive locking just to check whether the filter configuration changed (99% of the time a trace point is hit, the configuration didn't change). I'm open to larger changes to the macro, too. So instead of discussing mutex vs. critical section performance, it would also be acceptable if the macro just sent an event to an event loop in a different thread (assuming that accessing the event loop is thread safe) and all the processing happens in the same thread, so it's synchronized using the event loop.

UPDATE: I can probably simplify this question to:

If I have one thread reading a pointer, and another thread which might write to the variable (but 99% of the time it doesn't), how can I avoid that the reading thread needs to lock all the time?
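For illustration only, the read-mostly pattern in this simplified question can be sketched with C++11 atomics (an assumption on my part; on the compilers the question actually targets you would reach for `InterlockedExchangePointer` on Windows or GCC's `__sync` builtins instead, and all names here are made up):

```cpp
#include <atomic>

typedef void (*TraceProc)(const char *msg);

void realTrace(const char *) {}  // hypothetical stand-in for the printing procedure

// The shared pointer: one (rare) writer, many readers.
std::atomic<TraceProc> g_callback(&realTrace);

// Reader fast path: a plain atomic load, no lock taken.
TraceProc readCallback() {
    return g_callback.load(std::memory_order_acquire);
}

// Rare writer: publish the new pointer; readers see it on their next load.
void writeCallback(TraceProc p) {
    g_callback.store(p, std::memory_order_release);
}
```

The acquire/release pairing makes sure that whatever the writer set up before publishing the pointer is visible to readers that observe the new value.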

A: 
#define TRACE(msg)                                                  \
{                                                                   \
    static TraceProc traceCallback =                                \
        updateTraceCallback( &traceCallback, __FILE__, __LINE__ );  \
    traceCallback( msg );                                           \
}
Marcelo Cantos
This is unfortunately not an option since the `updateTraceCallback` function will be called only once, either initializing `traceCallback` to a no-op function or to a real logging function. However, the filtering can change at runtime, so either `updateTraceCallback` needs to be called repeatedly or there needs to be some locking because `traceCallback` can be set to point to the no-op function by another thread (because the configuration changed).
Frerich Raabe
Furthermore, scoped static initialization is apparently not thread safe according to http://blogs.msdn.com/b/oldnewthing/archive/2004/03/08/85901.aspx
Frerich Raabe
@Frerich: The semantics in your original version are, for all intents and purposes, identical to normal static initialisation, so whatever is wrong with my version is also wrong with yours. If the filtering changes at runtime, neither version will work correctly, with or without threads. On the subject of threads: every compiler I have worked with in the last 15 years emits thread-safe static initialisers. Tell your compiler to spit out some assembler and see for yourself.
Marcelo Cantos
+1  A: 

Your comment says "resets it to zero in case the filtering configuration changes at runtime" but am I correct in reading that as "resets it to zero when the filtering configuration changes"?

Without knowing exactly how updateTraceCallback implements its data structure, or what other data it's referring to in order to decide when to reset the callbacks (or indeed to set them in the first place), it's impossible to judge what would be safe. A similar problem applies to knowing what traceCallback does - if it accesses a shared output destination, for example.

Given these limitations the only safe recommendation that doesn't require reworking other code is to stick a mutex around the whole lot (or preferably a critical section on Windows).

Kylotan
I'm sorry for the imprecision of the comment. I now reworded it a bit so that it's hopefully clearer. The `updateTraceCallback` maintains a simple `std::set<TraceProc*>` which contains all callbacks which were seen so far (note that their pointer values are unique since they are all static). In case a pointer is not in the set yet, it is added. In case the filter configuration changes, all callbacks in the set are set to zero so that they are reinitialized the next time the `TRACE` macro executes.
Frerich Raabe
std::set is not threadsafe, so you simply cannot continue safely with that design without using a lock around all std::set accesses.
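A minimal sketch of what this amounts to, assuming C++11's `std::mutex` for brevity (the question's platforms would use a `CRITICAL_SECTION` or `pthread_mutex_t`; the function names are hypothetical):

```cpp
#include <mutex>
#include <set>

typedef void (*TraceProc)(const char *msg);

std::set<TraceProc *> g_callbacks;  // the shared registry of trace-point slots
std::mutex g_callbacksMutex;        // guards every access to g_callbacks

// Called from updateTraceCallback: remember this trace point's static slot.
void registerCallback(TraceProc *cb) {
    std::lock_guard<std::mutex> lock(g_callbacksMutex);
    g_callbacks.insert(cb);
}

// Called when the filter configuration changes: zero every registered slot
// so the next TRACE at that location re-resolves its callback.
void resetAllCallbacks() {
    std::lock_guard<std::mutex> lock(g_callbacksMutex);
    for (std::set<TraceProc *>::iterator it = g_callbacks.begin();
         it != g_callbacks.end(); ++it)
        **it = 0;
}
```

Note that this only protects the set itself; the reads of the static `traceCallback` slots inside the TRACE macro are still unsynchronized, which is the core of the question.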
Kylotan
+2  A: 

I still don't fully understand the question, so please correct me on anything I didn't get.

(I'm leaving out the backslashes.)

#define TRACE(msg)
{
    static TraceProc traceCallback = NULL;
    TraceProc localTraceCallback;

    localTraceCallback = traceCallback;

    if (!localTraceCallback)
    {
        updateTraceCallback(&localTraceCallback, __FILE__, __LINE__);
        // If two threads are running this at the same time 
        // one of them will update traceCallback and get it overwritten 
        // by the other. This isn't a big deal.
        traceCallback = localTraceCallback;
    }

    // Now there's no way localTraceCallback can be null.
    // An issue here is if in the middle of this executing 
    // traceCallback gets null'ed. But you haven't specified any 
    // restrictions about this either, so I'm assuming it isn't a problem.
    localTraceCallback(msg);
}
wj32
The issue you mention in the last comment doesn't arise as long as care is taken that the callback doesn't use any structures that are no longer valid after the filter resets.
JeremyP
+1  A: 

I'm afraid of doing excessive locking just to check whether the filter configuration changed (99% of the time a trace point is hit, the configuration didn't change). I'm open to larger changes to the macro, too. So instead of discussing mutex vs. critical section performance, it would also be acceptable if the macro just sent an event to an event loop in a different thread (assuming that accessing the event loop is thread safe)

How do you think thread safe messaging between threads is implemented without locks?

Anyway, here's a design that might work:

The data structure that holds the filter must be changed so that it is allocated dynamically from the heap because we are going to be creating multiple instances of filters. Also, it's going to need a reference count added to it. You need a typedef something like:

typedef struct Filter
{
    unsigned int refCount;
    // all the other filter data
} Filter;

There's a singleton 'current filter' declared somewhere.

static Filter* currentFilter;

and initialised with some default settings.

In your TRACE macro:

#define TRACE(msg)
{
    static Filter* filter = NULL;
    static TraceProc traceCallback = NULL;

    if (filterOutOfDate(filter))
    {
        getNewCallback(__FILE__, __LINE__, &traceCallback, &filter);
    }
    traceCallback(msg);
}

filterOutOfDate() merely compares the filter with currentFilter to see if it is the same. It should be enough to just compare addresses. It does no locking.

getNewCallback() applies the current filter to get the new trace function and updates the filter passed in with the address of the current filter. Its implementation must be protected with a mutex lock. Also, it decrements the refCount of the original filter and increments the refCount of the new filter. This is how we know when we can free the old filter.

void getNewCallback(const char* file, int line, TraceProc* newCallback, Filter** filter)
{
    // MUTEX lock
    *newCallback = // whatever you need to do
    currentFilter->refCount++;
    if (*filter != NULL)
    {
        (*filter)->refCount--;
        if ((*filter)->refCount == 0)
        {
            // free filter and associated resources
        }
    }
    *filter = currentFilter;
    // MUTEX unlock
}

When you want to change the filter, you do something like

void changeFilter()
{ 
    Filter* newFilter = // build a new filter
    newFilter->refCount = 0;
    // MUTEX lock (same mutex as above)
    currentFilter = newFilter;
    // MUTEX unlock
}
JeremyP
+2  A: 

You could implement a configuration version variable. When your program starts it is set to 0. The macro can hold a static int that is the last config version it saw. Then a simple atomic comparison between the last seen and the current config version will tell you if you need to take the full lock and re-call updateTraceCallback().

That way, 99% of the time you'll only add an extra atomic op, memory barrier or something similar, which is very cheap. 1% of the time, just do the full mutex thing; it shouldn't affect your performance in any noticeable way if it's only 1% of the time.

Edit:

Some .h file:

extern long trace_version;

Some .cpp file:

long trace_version = 0;

The macro:

#define TRACE(msg)                                                                     \
{                                                                                      \
    static long __lastSeenVersion = -1;                                                \
    static TraceProc traceCallback = 0;                                                \
    if ( !traceCallback || __lastSeenVersion != trace_version )                        \
        updateTraceCallback( &traceCallback, &__lastSeenVersion, __FILE__, __LINE__ ); \
    traceCallback( msg );                                                              \
}

The functions for incrementing a version and updates:

static long oldVersionRefcount = 0;
static long curVersionRefCount = 0;

void updateTraceCallback( TraceProc *callback, long *version, const char *file, unsigned int lineno ) {
   if ( *version != trace_version ) {
      if ( InterlockedDecrement( &oldVersionRefcount ) == 0 ) {
        //....free resources.....
        //...no mutex needed, since no one is using this...
      }
      //....acquire mutex and do stuff....
      InterlockedIncrement( &curVersionRefCount );
      *version = trace_version;
      //...release mutex...
   }
 }

 void setNewTraceCallback( TraceProc *callback ) {
    //...acquire mutex...
    trace_version++;  // No locks needed; a write to an aligned long is atomic by itself.
    while ( oldVersionRefcount != 0 ) { /* ..sleep?.. */ }
    InterlockedExchange( &oldVersionRefcount, curVersionRefCount );
    curVersionRefCount = 0;
    //.... and so on...
    //...release mutex...
 }

Of course, this is very simplified: if you need to bump the version again while oldVersionRefcount > 0, you're in trouble; how to solve this is up to you, since it really depends on your problem. My guess is that in those situations you could simply wait until the ref count drops to zero, since a ref count should only be held for the time it takes to run the macro.

Gianni
Sounds interesting! What's the right way to perform an atomic integer comparison - `InterlockedCompareExchange`? Also, I guess I need to use an atomic function for changing the file version, too?
Frerich Raabe
Not even that. Usually all you need is a memory barrier: https://secure.wikimedia.org/wikipedia/en/wiki/Memory_barrier Maybe not even that. x86 int operations are thread safe by themselves, so in your case, if I'm reading right, only 1 thread will write and all others will read, so locking might not be needed for the versions, not even for updating them. Just `x++` and `if (x > y)` might well do, if you keep the old version around while other threads are still using it (i.e. haven't yet seen the new version number).
Gianni
Just to be clear: the reason I mentioned atomics, is if you add a ref counter on the old version, so that the last one to finish using it and notices that the version changed will free it. So, first step in your macro will be an `InterlockedCompareExchange` op to increment a ref counter. The last, another `InterlockedCompareExchange` to decrease, and if the version changed, free/delete the resources.
Gianni
+1  A: 

If I have one thread reading a pointer, and another thread which might write to the variable (but 99% of the time it doesn't), how can I avoid that the reading thread needs to lock all the time?

From your code, it is OK to use a mutex inside updateTraceCallback() since it is going to be called very rarely (once per location). After taking the mutex, check whether traceCallback is already initialized: if yes, then another thread just did it for you and there is nothing to be done.
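A sketch of this check-again-under-the-lock pattern, with C++11's `std::mutex` standing in for whatever lock primitive is used and `nullTrace` as a hypothetical no-op procedure:

```cpp
#include <mutex>

typedef void (*TraceProc)(const char *msg);

void nullTrace(const char *) {}  // hypothetical stand-in for the no-op procedure

std::mutex g_updateMutex;  // one global lock; contention is rare

void updateTraceCallback(TraceProc *callback, const char *file, unsigned int lineno) {
    (void)file; (void)lineno;  // the real code would consult the filter with these
    std::lock_guard<std::mutex> lock(g_updateMutex);
    if (*callback)   // check again under the lock: another thread may have
        return;      // initialized the slot while we were waiting
    // ...look up (file, lineno) in the filter configuration here...
    *callback = &nullTrace;  // or the real printing procedure, per the filter
}
```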

If updateTraceCallback() turns out to be a serious performance problem due to collisions on the global mutex, then you can simply make an array of mutexes instead and use a hashed value of the traceCallback pointer as an index into the mutex array. That would spread locking over many mutexes and minimize the number of collisions.
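The striped-lock idea could look something like this (the stripe count and hash are arbitrary choices of mine, not part of the answer):

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>

const std::size_t kNumStripes = 16;  // arbitrary; a power of two keeps the mask cheap
std::mutex g_stripes[kNumStripes];

// Map a callback slot's address to one of the mutexes. The addresses are
// stable (they point at static variables), so they make a usable hash key.
std::mutex &stripeFor(const void *p) {
    std::uintptr_t v = reinterpret_cast<std::uintptr_t>(p);
    return g_stripes[(v >> 4) & (kNumStripes - 1)];  // shift away alignment zero bits
}
```

As the comment below points out, though, this only helps if the per-location work is independent; accesses to the single shared registry still need one common lock.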

Dummy00001
+1 The idea of using an array of mutexes is interesting, thanks for bringing this up!
Frerich Raabe
From what I can see the problem is that all these different updateTraceCallback calls access a single shared data structure. Multiple mutexes don't help here and won't protect against that structure getting trashed. Mutex contention is a symptom of the problem rather than the problem itself.
Kylotan