views:

110

answers:

4

I have a suspicion that there might be a race condition in a certain C++ multithreading situation involving virtual method calls in a vtable dynamic dispatching implementation (for which a vtable pointer is stored as a hidden member in the object with virtual methods). I would like to confirm whether or not this is actually an issue, and I am specifying boost's threading library so we can assume some frame of reference.

Suppose an object "O" has a boost::mutex member for which the entirety of its constructor/destructor and methods are scope-locked on (similar to the Monitor concurrency pattern). A thread "A" constructs an object "O" on the heap without external synchronization (ie WITHOUT an shared mutex enclosing the "new" operation, for which it could synchronize with other threads; note, though, that there is still the "internal, Monitor" mutex locking the scope of its constructor). The thread A then passes a pointer to the "O" instance (which it just constructed) to another thread "B", by means of a synchronized mechanism--for instance, a synchronized readers-writers queue (note: only the pointer to the object is being passed--not the object itself). After construction, neither thread "A" or any other threads perform any writing operation on the "O" instance which "A" constructed.

The thread "B" reads the pointer value of the object "O" from the synchronized queue, after which it immediately leaves the critical section guarding the queue. Then the thread "B" performs a virtual method call on the object "O." Here is where I think an issue may arise.

Now, my understanding of virtual method calls in a [quite probable] vtable implementation of dynamic dispatching is that the calling thread "B" must dereference the pointer to "O" in order to obtain the vtable pointer stored as a hidden member of its object, and that this happens BEFORE the method body is entered (naturally because the method body to execute is not safely and accurately determined until vtable pointer stored in the object itself is accessed). Assuming the aforementioned statements are possibly true for such an implementation, is this not a race condition?

Since the vtable pointer is retrieved by thread "B" (by dereferencing the pointer to the object "O" located in the heap) prior to any memory visibility guaranteeing operations taking place (ie acquiring the member variable mutex in the object "O"), then it is not certain that "B" will perceive the vtable pointer value that "A" originally wrote on the object "O"'s construction, correct? (ie, it may instead perceive a garbage value, resulting in undefined behavior, correct?).

If the above is a valid possibility, does this not imply that making virtual method calls on exclusively internally synchronized objects that are shared between threads is undefined behavior?

And--likewise--since the standard is agnostic to a vtable implementation, how could one ever guarantee that the vtable pointer is safely visible to other threads prior to a virtual call? I suppose one could externally synchronize ("externally" as in, for instance, "surrounding with a shared mutex lock()/unlock() block") the constructor call and then at least the initial virtual method call in each of the threads, but this seems like some awfully discordant programming.

So, if my suspicions are true, then a possibly more elegant solution would be to use inlined, non-virtual member functions which lock the member mutex and then subsequently forward to a virtual call. But--even then--could we guarantee that the constructor initialized the vtable pointer within the confines of the lock() and unlock() guarding the constructor body itself?

If someone could help me clear this up and confirm/deny my suspicions, I would be very grateful.

EDIT: code demonstrating the above

class Interface
{
    public:
    virtual ~Interface() {}
    virtual void dynamicCall() = 0;
};

class Monitor : public Interface
{
    boost::mutex mutex;
    public:
    Monitor()
    {
        boost::unique_lock<boost::mutex> lock(mutex);
        // initialize
    }
    virtual ~Monitor()
    {
        boost::unique_lock<boost::mutex> lock(mutex);
        // destroy
    }
    virtual void dynamicCall()
    {
        boost::unique_lock<boost::mutex> lock(mutex);
        // do w/e
    }
};

// for simplicity, the numbers following each statement specify the order of execution, and these two functions are assumed
// void passMonitorToSharedQueue( Interface * monitor )
//        Thread A passes the 'monitor' pointer value to a 
//        synchronized queue, pushes it on the queue, and then 
//        notifies Thread B that a new entry exists
// Interface * getMonitorFromSharedQueue()
//        Thread B blocks until Thread A notifies Thread B
//        that a new 'Interface *' can be retrieved,at which
//        point it retrieves and returns it
void threadBFunc()
{
    Interface * if = getMonitorFromSharedQueue(); // (1)
    if->dynamicCall(); // (4) (ISSUE HERE?)
}
void threadAFunc()
{
    Interface * monitor = new Monitor; // (2)
    passMonitorToSharedQueue(monitor); // (3)
}

-- at point (4) I'm under the impression that the vtable pointer value which "Thread A" wrote to memory may not be visible by "Thread B", as I don't see any reason to assume that the compiler will generate code such that the vtable pointer is written within the constructor's locked mutex block.

For instance, consider the situation of multicore systems where each core has a dedicated cache. According to this article, caches are commonly aggressively optimized and--despite forcing cache coherence--do not enforce a strict ordering on cache coherence if there are no synchronization primitives involved.

Perhaps I am misunderstanding the implications of the article, but wouldn't that mean that "A"'s write of the vtable pointer to the constructed object (and there is no indication that this write occurs within a the constructor's locked mutex block) may not be perceived by "B" before "B" reads the vtable pointer? If both A and B are executed on different cores ("A" on core0 and "B" on core1), the cache coherence mechanism may re-order the update of the vtable pointer value in core1's cache (the update that would make it consistent with the value of the vtable pointer in core0's cache, which "A" wrote) such that it occurs after "B"'s read ... if I'm interpreting the article correctly.

A: 

I can't quite understand, but there are two possibilities I think you might be meaning:

A) "O" is fully constructed (constructor returned) before passing it into synchronized queue to "B". In this case there's no problem because the object is fully constructed, including the vtable pointer. The memory at that location will have the vtable because it's all inside one process.

B) "O" isn't fully constructed yet, but for example you're passing this from the constructor into the synchronized queue. In this case the vtable pointer still has to be set up before the body of the constructor is called in thread "A", because it is valid to call virtual functions from a constructor - it will just call the current Class's version of the method, not the most-derived one. Thus I would not expect to see a race condition in this case either. If you are in fact passing this to another thread from within its constructor, you might want to reconsider your approach as it does seem dangerous to possibly make calls on objects that aren't fully constructed.

Mark B
Thanks for your answer. Your interpretation for (A) is correct (the constructor returns), but I'm under the impression that writes by "thread A" are not guaranteed visible by another "thread B" unless there is some form of synchronization. Indeed, the correct pointer value is passed to "B" due to the synchronized queue, but what of what that pointer points to? Is the memory the pointer refers to in "B" guaranteed completely consistent with "A"--even without synchronization? I've been taught otherwise (not that I'm denying you)--that the caches between separate cores might be inconsistent.
A: 

If I try to understand your essay, I believe you are asking this:-

A thread "A" constructs an object "O" on the heap without external synchronization

// global namespace
SomeClass* pClass = new SomeClass;

At the same time you are saying that thread-'A' passes the above instance to thread-'B'. This means that the instance SomeClass is fully constructed Or are you trying to pass this pointer from the ctor of SomeClass to thread-'B'? If yes, you are definitely in trouble w.r.t virtual functions. But this has got nothing to do with the race conditions.

If you are accessing the global instance variable in thread-'B' without thread-'A' passing it then there is a possibility of race conditions. The 'new' instruction is laid out by most compilers like ....

pClass = // Step 3
operator new(sizeof(SomeClass)); // Step 1
new (pClass ) SomeClass; // Step 2

If only Step-1 is complete, or if only Step-1 and Step-2 are complete then accessing pClass is undefined.

HTH

Abhay
A: 

In a shared memory multiprocessor system with implicit caching, you need a memory barrier to make changes to main memory visible to other caches. Generally, you can assume that acquiring or releasing any OS synchronization primitive (and any built on top of them) have a full memory barrier such that all writes that occur before acquiring (or releasing) a synchronization primitive are visible to all processors after you acquire it (or release).

For your specific problem, you have a memory barrier inside Monitor::Monitor(), so by the time it returns the vtable will have been initialized to at least Monitor::vtable. There could be an issue if you derived from Monitor, but in the code you posted you don't so it's not an issue.

If you really wanted to ensure that you got the right vtable when calling getMonitorFromSharedQueue() you should have a read barrier before calling if->dynamicCall().

MSN
Thank you. This is exactly the sort of confidence I was looking for.Since all objects are associated with monitors in java, I'm wondering (granted it uses a vtable implementation) if this is ever an issue (ie perhaps the dynamic call is made after the acquire).
A: 

In the absence of synchronization, you are right that there may be a race condition on the vtable, since the writes to memory by the constructor in thread A may not be visible to thread B.

However, queues used for inter-thread communication usually contain synchronization to address precisely this issue. I would therefore expect the queue referenced by getMonitorFromSharedQueue and passMonitorToSharedQueue to handle this. If they do not, then you might think about using an alternative queue implementation, such as the one I wrote about on my blog at:

http://www.justsoftwaresolutions.co.uk/threading/implementing-a-thread-safe-queue-using-condition-variables.html

Anthony Williams