views:

153

answers:

4

In Java each object has a synchronisation monitor. So i guess the implementation is pretty condensed in term of memory usage and hopefully fast as well.

When porting this to C++ what whould be the best implementation for it. I think that there must be something better then "pthread_mutex_init" or is the object overhead in java really so high?

Edit: i just checked that pthread_mutex_t on Linux i386 is 24 bytes large. Thats huge if i have to reserve this space for each object.

+3  A: 

In a sense it's worse than pthread_mutex_init, actually. Because of Java's wait/notify you kind of need a paired mutex and condition variable to implement a monitor.

In practice, when implementing a JVM you hunt down and apply every single platform-specific optimisation in the book, and then invent some new ones, to make monitors as fast as possible. If you can't do a really fiendish job of that, you definitely aren't up to optimising garbage collection ;-)

One observation is that not every object needs to have its own monitor. An object which isn't currently synchronised doesn't need one. So the JVM can create a pool of monitors, and each object could just have a pointer field, which is filled in when a thread actually wants to synchronise on the object (with a platform-specific atomic compare and swap operation, for instance). So the cost of monitor initialisation doesn't have to add to the cost of object creation. Assuming the memory is pre-cleared, object creation can be: decrement a pointer (plus some kind of bounds check, with a predicted-false branch to the code that runs gc and so on); fill in the type; call the most derived constructor. I think you can arrange for the constructor of Object to do nothing, but obviously a lot depends on the implementation.

In practice, the average Java application isn't synchronising on very many objects at any one time, so monitor pools are potentially a huge optimisation in time and memory.

Steve Jessop
Unless you have a lot of bad programmers code which is adding synchronize to every single method. But i like your idea. Trading a hashtable lookup to find the associated object vs. memory overhead.
Lothar
In my (fairly old) experience of implementing a JVM, bad coders also guarantee that most of the objects created by their programs are Strings. So even if they synchronize all their own methods, and somehow avoid deadlocks, it's still only a minority of all objects that need a monitor ;-). I don't think our JVM used thin locking as in brianegge's link: an object either had a monitor, or it didn't. We were faster than Sun at the time (for PJava/J2ME), but that was before the clever stuff from HotSpot reached mobile, and the KVM became terrifying.
Steve Jessop
+2  A: 

The Sun Hotspot JVM implements thin locks using compare and swap. If an object is locked, then the waiting thread wait on the monitor of thread which locked the object. This means you only need one heavy lock per thread.

brianegge
That's what we like. All pointers should have the bottom three bits re-used for flags ;-)
Steve Jessop
This means that class would have to maintain a list of threads to awaken then hit each one when notify was called. This implementation could be as fast as any system but not with pthreads.
caspin
+2  A: 

I'm not sure how Java does it, but .NET doesn't keep the mutex (or analog - the structure that holds it is called "syncblk" there) directly in the object. Rather, it has a global table of syncblks, and object references its syncblk by index in that table. Furthermore, objects don't get a syncblk as soon as they're created - instead, it's created on demand on the first lock.

I assume (note, I do not know how it actually does that!) that it uses atomic compare-and-exchange to associate the object and its syncblk in a thread-safe way:

  1. Check the hidden syncblk_index field of our object for 0. If it's not 0, lock it and proceed, otherwise...
  2. Create a new syncblk in global table, get the index for it (global locks are acquired/released here as needed).
  3. Compare-and-exchange to write it into object itself.
  4. If previous value was 0 (assume that 0 is not a valid index, and is the initial value for the hidden syncblk_index field of our objects), our syncblk creation was not contested. Lock on it and proceed.
  5. If previous value was not 0, then someone else had already created a syncblk and associated it with the object while we were creating ours, and we have the index of that syncblk now. Dispose the one we've just created, and lock on the one that we've obtained.

Thus the overhead per-object is 4 bytes (assuming 32-bit indices into syncblk table) in best case, but larger for objects which actually have been locked. If you only rarely lock on your objects, then this scheme looks like a good way to cut down on resource usage. But if you need to lock on most or all your objects eventually, storing a mutex directly within the object might be faster.

Pavel Minaev
"if you need to lock on most or all your objects eventually" ... then you don't deserve for your program to run quickly ;-)
Steve Jessop
I should have probably be more specific: of course I meant objects of a specific class, not all objects in general. It's quite often that for objects of a particular class, it's certain that at some point of its life any instance of that class will have to be locked. If so, allocating the mutex directly within the object may make more sense.
Pavel Minaev
+1  A: 

Surely you don't need such a monitor for every object!

When porting from Java to C++, it strikes me as a bad idea to just copy everything blindly. The best structure for Java is not the same as the best for C++, not least because Java has garbage collection and C++ doesn't.

Add a monitor to only those objects that really need it. If only some instances of a type need synchronization then it's not that hard to create a wrapper class that contains the mutex (and possibly condition variable) necessary for synchronization. As others have already said, an alternative is to use a pool of synchronization objects with some means of choosing one for each object, such as using a hash of the object address to index the array.

I'd use the boost thread library or the new C++0x standard thread library for portability rather than relying on platform specifics at each turn. Boost.Thread supports Linux, MacOSX, win32, Solaris, HP-UX and others. My implementation of the C++0x thread library currently only supports Windows and Linux, but other implementations will become available in due course.

Anthony Williams