I'm trying to extend the Clojure language by turning ACI-guaranteed refs into ACID-guaranteed drefs (durable refs). The API is simply to call (dref key value), where key is a String naming the key to be used in the underlying data store (BDB JE in my current implementation), and value is the Object that the dref should be initialized to. If key already exists in the DB, the stored value is used instead.

Multiple drefs can be created with the same key, and they need to be synchronized, i.e. if one dref with key "A" participates in a transaction where it is written or read with an (ensure), all other drefs with key "A" must be transactionally synchronized: read-locks and write-locks must be used to impose ordering on transactions involving those drefs. In a larger sense, although there may be more than one in-memory dref with the same key, all of those drefs with that key are a single logical object.

For obvious reasons, it's much easier to simply ensure that this single logical dref is implemented with a single concrete in-memory dref. That way there's nothing to synchronize. How do I do this?

The obvious answer is to use an object pool keyed on key. Clojure would then call the static getInstance(key,value) method, which returns the pooled object if it exists and otherwise creates it and adds it to the pool. The problem with this approach is that there's no easy way to get Clojure to release the object when it's done. Memory-leak city. I have to ensure that any dref that is still strongly referenced is never collected and stays in the pool. It would be disastrous if the pool lost its reference to a logical dref that is still in use, since another process could then create a new dref with the same key, and it wouldn't be transactionally safe with respect to the existing dref with that key.

So I need some version of the WeakHashMap or something using not-strong references (I would prefer SoftReferences for a little more reluctance by the GC). So:

  1. If I use a HashMap<String,SoftReference<DRef>>, how do I ensure that the map will evict entries if the value of the entry (SoftReference) is collected? Some sort of daemon thread?
  2. How do I make the pool thread-safe for the GC? Or do I not have to worry about that since the GC is operating at the SoftReference level and my daemon thread would be the one operating at the Map level?
  3. On a related note, how do I make sure that the daemon thread is running? Is there any way that it can stop without throwing an exception that will crash the entire JVM if uncaught? If so, how do I monitor and start a new one if needed?
+1  A: 

Have you tried google-collections?

They have a MapMaker that gives variations of concurrent hash maps with soft/weak keys and values. One problem is that equality for weak/soft keys is identity, which is annoying but maybe not too much if the key is a String.

I believe other libraries do this too (org.apache.commons.collections, for example, though I have never used it).

Nicolas Oury
google-collections is now part of http://code.google.com/p/guava-libraries/.
Nicolas Oury
The thing I am pooling is the value, which is not a String, so making the value soft doesn't work, because of identity equality. BTW, does the map that MapMaker produces automatically remove the key-value mapping as well when the soft value gets collected? I assume I would use hard keys and soft values in my implementation...
entaroadun
I think that the map removes everything. It is made for this kind of use.
Nicolas Oury
By the way, the identity problem is only for Weak/Soft keys. If you use strong keys, then .equals is used.
Nicolas Oury
Thanks, I'll dig into this. Is there a good summary online of the implementation? My limited Google-fu isn't finding anything.
entaroadun
MapMaker is indeed made for exactly this kind of use. Thanks!
entaroadun
A: 

The easy answer is probably just Collections.synchronizedMap(new WeakHashMap()), though that doesn't give you thread-safe iteration by itself.

1) You could implement Map<K, V> yourself, and delegate to a ConcurrentHashMap<K, SoftReference<V>>. You can register your SoftReferences with a ReferenceQueue when you create them, and either use a daemon thread to remove the cleared references from your map or just check your ReferenceQueue before/after each operation (or each nth operation, etc).

2) The GC will only null out your references - you don't have to worry about it mucking with your map, so no threading concerns there.

3) You could look at how the AWT-EventQueue is managed. But:

-your daemon thread will probably be simple enough to not throw unexpected exceptions

-if you're concerned about it, you could wrap the meat of your daemon thread in

for (;;) {
  try {
    //daemon thread loop here
  } catch (Exception ex) {
    //log it, any other possible cleanup
  }
}

which will run forever unless you get an Error (in which case you've got bigger issues.)
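
To make points 1) and 3) a bit more concrete, here is a minimal sketch of a pool with strong String keys and soft values, assuming Java 8 or later. The names SoftValuePool, KeyedRef, getInstance, expungeStale and startCleaner are made up for illustration, and the Supplier-based signature is my assumption, not the asker's actual API. It drains the ReferenceQueue on every lookup, and it also shows the optional daemon thread with the catch-and-continue loop.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

/** Hypothetical pool: strong String keys, softly referenced values. */
final class SoftValuePool<V> {

    /** A SoftReference that remembers its key so the stale map entry can be found. */
    private static final class KeyedRef<V> extends SoftReference<V> {
        final String key;
        KeyedRef(String key, V value, ReferenceQueue<V> queue) {
            super(value, queue); // the GC enqueues this reference after clearing it
            this.key = key;
        }
    }

    private final ConcurrentMap<String, KeyedRef<V>> map = new ConcurrentHashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    /** Returns the pooled value for key, creating it with factory if absent or collected. */
    V getInstance(String key, Supplier<V> factory) {
        expungeStale();
        for (;;) {
            KeyedRef<V> ref = map.get(key);
            V value = (ref == null) ? null : ref.get();
            if (value != null) {
                return value;
            }
            V fresh = factory.get();
            KeyedRef<V> freshRef = new KeyedRef<>(key, fresh, queue);
            boolean installed = (ref == null)
                    ? map.putIfAbsent(key, freshRef) == null
                    : map.replace(key, ref, freshRef);
            if (installed) {
                return fresh;
            }
            // Lost a race with another thread: discard 'fresh' and retry,
            // so only one instance per key is ever handed out.
        }
    }

    /** Removes entries whose values the GC has already cleared. */
    void expungeStale() {
        KeyedRef<?> stale;
        while ((stale = (KeyedRef<?>) queue.poll()) != null) {
            // Two-argument remove: drop the entry only if it still holds this exact reference.
            map.remove(stale.key, stale);
        }
    }

    /** Optional: a daemon thread that blocks on the queue instead of polling. */
    Thread startCleaner() {
        Thread t = new Thread(() -> {
            for (;;) {
                try {
                    KeyedRef<?> stale = (KeyedRef<?>) queue.remove(); // blocks until a reference is enqueued
                    map.remove(stale.key, stale);
                } catch (InterruptedException e) {
                    return; // treat interruption as a shutdown request
                } catch (RuntimeException e) {
                    // log it and keep looping
                }
            }
        }, "soft-pool-cleaner");
        t.setDaemon(true);
        t.start();
        return t;
    }

    int size() {
        return map.size();
    }
}
```

Usage would look like pool.getInstance("A", () -> new DRef(...)), where DRef stands in for whatever class you are pooling. Note that a thread that loses the race still calls the factory once and throws the result away, so the factory should not have side effects that matter.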

Sbodd
WeakHashMap will give a memory-leak or a disaster, depending on the type of the key. It is weak in the key, not in the value.
Nicolas Oury
ITA. I was voting for the 2) GC observation.
entaroadun
+1  A: 

Psssst....you're recreating Terracotta's distributed shared objects. The internals of Terracotta look very similar to this, although they rely (in DSO) on using bytecode manipulation at load time to intercept all reads and writes to a field whereas in Clojure it's quite a bit easier.

If you want to look at the Terracotta implementation, the ClientObjectManager (http://svn.terracotta.org/svn/tc/dso/trunk/code/base/dso-l1/src/com/tc/object/) is the main client-side class that manages shared objects. Check out the pojoToManaged and look through some of the related code in TCObjectImpl.

1,2) You might find Bob Lee's talk "The Ghost in the Virtual Machine" to be helpful - it's the best reference I've found for this kind of stuff. SoftReferences and GC (and finalizers) can be kind of tricky.

3) Google for uncaught exception handlers...
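
As a rough illustration of point 3), here is a small sketch using Thread.setUncaughtExceptionHandler. An uncaught exception only terminates the thread it was thrown in, not the whole JVM, so the handler can log the failure and start a replacement. RestartingDaemon and its maxRestarts cap are hypothetical names I made up for this example, and it assumes Java 8 or later for the lambda syntax.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: runs a task on a daemon thread and restarts it if it dies. */
final class RestartingDaemon {
    private final Runnable task;
    private final int maxRestarts; // cap so a permanently broken task cannot spin forever
    private final AtomicInteger starts = new AtomicInteger();

    RestartingDaemon(Runnable task, int maxRestarts) {
        this.task = task;
        this.maxRestarts = maxRestarts;
    }

    void start() {
        Thread t = new Thread(task, "cleaner-" + starts.incrementAndGet());
        t.setDaemon(true);
        // Called on the dying thread when the task throws something nobody caught.
        t.setUncaughtExceptionHandler((thread, error) -> {
            System.err.println(thread.getName() + " died: " + error);
            if (starts.get() <= maxRestarts) {
                start(); // spin up a replacement thread
            }
        });
        t.start();
    }

    int startCount() {
        return starts.get();
    }
}
```

The handler is only invoked when the task ends with an exception or error. If the task returns normally, nothing restarts it.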

Alex Miller