We have a slow backend server that is getting crushed by load and we'd like the middle-tier Scala server to only have one outstanding request to the backend for each unique lookup.

The backend server only stores immutable data, but upon the addition of new data, the middle-tier servers request the newest data on behalf of the clients, and the backend server has a hard time with the load. The immutable data is cached in memcached using unique keys generated at write time, but the write rate is high, so we get a low memcached hit rate.

One idea I have is to use Google Guava's MapMaker#makeComputingMap() to wrap the actual lookup and, after ConcurrentMap#get() returns, have the middle tier save the result and just delete the key from the map.

This seems a little wasteful, although the code is very easy to write; see below for an example of what I'm thinking.

Is there a more natural data structure, library or part of Guava that would solve this problem?

import com.google.common.collect.MapMaker

object Test {
  // The expensive computation, standing in for the slow backend lookup.
  val computer: com.google.common.base.Function[Int, Long] =
    new com.google.common.base.Function[Int, Long] {
      override def apply(i: Int): Long = {
        val l = System.currentTimeMillis + i
        System.err.println("For " + i + " returning " + l)
        Thread.sleep(2000)
        l
      }
    }

  // Computing map: concurrent get()s for the same key share one computation.
  val map = new MapMaker().makeComputingMap[Int, Long](computer)

  def get(k: Int): Long = {
    val l = map.get(k) // blocks until the value is computed, once per key
    map.remove(k)      // discard immediately; we only want the de-duplication
    l
  }

  def main(args: Array[String]): Unit = {
    // Two threads asking for the same key: only one computation should run.
    val t1 = new Thread() {
      override def run(): Unit = System.err.println(get(123))
    }
    val t2 = new Thread() {
      override def run(): Unit = System.err.println(get(123))
    }

    t1.start()
    t2.start()
    t1.join()
    t2.join()

    // The key was removed above, so this triggers a fresh computation.
    System.err.println(get(123))
  }
}
+3  A: 

I'm not sure why you implement the removal yourself. Why not simply use weak or soft values and let the GC clean up for you?

new MapMaker().weakValues().makeComputingMap[Int, Long](computer)
Jed Wesley-Smith
Or an expiration time of 1 ms or some such.
ColinD
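(For illustration, ColinD's suggestion might look like the sketch below, reusing the question's computer function and assuming the MapMaker#expiration(long, TimeUnit) method discussed further down in this thread.)

import java.util.concurrent.TimeUnit

// Sketch: entries evaporate 1 ms after they are written, so concurrent
// callers share one computation but nothing lingers in the map.
val expiringMap =
  new MapMaker()
    .expiration(1, TimeUnit.MILLISECONDS)
    .makeComputingMap[Int, Long](computer)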
Yeah, I thought about that, but weak values keep the entry around as long as the value is in use. It basically works like a weak Interner.
Jed Wesley-Smith
I could use a 1 ms timeout, but wouldn't the overhead of the map maintaining timeouts be more than just calling remove() myself?
Blair Zajac
@Blair: Yeah, I suppose it probably would.
ColinD
Blair, the overhead is that a Timer is maintained and a TimerTask is created for each get. In practice it won't be all that large, but I'd profile before making a decision. That said, it doesn't seem very useful for such a small timeout. Wouldn't a soft/weak value be better?
Jed Wesley-Smith
@Jed, I saw a chart of MapMaker and the number of additional objects for each weak/soft type, and they add to the overhead. If you know you're going to delete the entry, it seems better just to delete it yourself.
Blair Zajac
Found the link: http://code.google.com/p/memory-measurer/wiki/ElementCostInDataStructures
Blair Zajac
Blair, I'd be very surprised if those additional objects add up to anything in your system, as things live for such a short time. The real question you need to answer is whether you are going to ask the backend for the same value repeatedly without overlap. When you explicitly delete, the only thing you are protecting against is actually-concurrent requests for the same value. Serial requests will _always_ re-request. As always with performance, only optimise when you actually see a problem, not when you think one might be present; you're usually wrong :-)
Jed Wesley-Smith
@Blair, this discussion reminded me that I forgot to include computing maps in that chart, so I added them, but only for the plain version of MapMaker, since it turns out that is the only case where there is a difference. If you add expiration/eviction or special kinds of references for keys or values, makeComputingMap() and makeMap() have the same cost.
Dimitris Andreou
@Jed, regarding Timer+TimerTask: you are referring to the old implementation, which, thankfully, was thrown away. Expiration via TimerTask would imply a priority queue, which would further imply that threads that are supposed to concurrently perform O(1) tasks would suddenly compete for a map-wide lock and, adding insult to injury, hold that lock for an O(log n) operation.
Dimitris Andreou
@Jed, we have measured and know for certain that we have a slow backend. I have many interactive users unable to work due to it. Also, I think using weak values could be dangerous: you don't know whether some caller has stored the value in an object that maintains a strong reference to it, so the only safe thing is to have it expire.
Blair Zajac
@Dimitris, I presume that specifying an expiration time of 0 to #expiration() would work? Never mind, it rejects that as an illegal argument.
Blair Zajac
+1  A: 

I think what you are doing is quite reasonable. You only use the structure to get lock striping on the key, to ensure that accesses to the same key conflict; no matter that you don't need a value mapping per key. ConcurrentHashMap and friends are the only structures in the Java libraries + Guava that offer you lock striping.

This does induce some minor runtime overhead, plus the size of the hashtable, which you don't need (and which might even grow if accesses to the same segment pile up and remove() doesn't keep up).

If you want to make this as cheap as possible, you could code simple lock striping yourself: basically an Object[] (or Array[AnyRef] :)) of N locks (N = the concurrency level), where you map the hash of the lookup key into the array and lock on the selected object. Another advantage is that you really don't have to do the hashcode tricks that CHM is forced to do: it has to split the hashcode into one part that selects the lock and another for the needs of the hashtable, whereas you can use the whole of it just for lock selection.

edit: Sketching my comment below:

val concurrencyLevel = 16
// `until`, not `to`, so we get exactly concurrencyLevel locks.
val locks = (for (i <- 0 until concurrencyLevel) yield new AnyRef).toArray

def access(key: K): V = {
  // Mask off the sign bit so a negative hashCode still indexes the array.
  val lock = locks((key.hashCode & Int.MaxValue) % locks.length)
  lock synchronized {
    cache.lookup(key) match {
      case Some(v) => v
      case None =>
        // Only the first thread per key reaches the backend; threads that
        // arrive late block on the lock and then hit the cache above.
        val valueFromBackend = backendServer.lookup(key)
        cache.put(key, valueFromBackend)
        valueFromBackend
    }
  }
}

(Btw, is the toArray call needed? Or is the returned IndexedSeq already fast to access by index?)

Dimitris Andreou
Just so I understand the third paragraph: are you suggesting locking on an Object[] just to ensure that only one thread makes a request at a time? I am interested in getting the result back so that the blocked thread doesn't have to make the call itself; presumably, within the N milliseconds the call takes, the backend would return the same result.
Blair Zajac
Hmm, right, I oversimplified a bit. How about checking whether the value you want is already in the cache _while holding the right lock for the key_? In other words, if a thread takes the lock and doesn't find the value in the cache (hopefully that part is fast), it goes to the backend server itself, blocking all the other threads that want the same key. When another thread that wanted that key takes the lock, it first checks whether it came late and another thread has already made the call. So the first thread gets the value from the backend server, and the rest get it from the cache.
Dimitris Andreou
I must admit that the current solution of using an expiring computing map is working fine without much additional coding and debugging, especially since I have unhappy users.
Blair Zajac
Completely agree, that was a good, simple, and reliable solution. (Though why did you switch to expiration? I thought doing remove() immediately was better. Are you also using the MapMaker as a kind of very short-term cache, to protect the real cache from a flood of requests?)
Dimitris Andreou
Yes, it is a short-term cache. I'm using a 1 ns expiration. See my new comments above, just under the question, and this sample code here, http://code.google.com/p/guava-libraries/issues/detail?id=462, for why #remove() doesn't work. Weak values don't work in case some code stores the result in an object with a longer lifetime, keeping a strong reference to it.
Blair Zajac
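(For reference, the setup Blair describes might look roughly like this sketch, again reusing the question's computer function and the MapMaker#expiration method mentioned in the comments; with a 1 ns expiration the computing map acts as a request de-duplicator in front of memcached rather than a cache.)

import java.util.concurrent.TimeUnit

// Sketch: a 1 ns expiration means concurrent get()s for the same key share
// one backend call, while any later request recomputes; no explicit
// remove() is needed.
val dedupMap =
  new MapMaker()
    .expiration(1, TimeUnit.NANOSECONDS)
    .makeComputingMap[Int, Long](computer)

def get(k: Int): Long = dedupMap.get(k)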
Right. Well, if you put code at the start of your function to check whether the real cache already has the value, it would work, wouldn't it? I guess you hesitate to check the cache again when the very reason you're doing this is that the cache didn't have the value moments ago.
Dimitris Andreou
Is the "cache" variable a normal hash? Do entries expire or remove if the value is weak? Need some way of not using old values.
Blair Zajac
"cache" is just memcached in this case; pseudocode, I'm not aware of the actual api. And I forgot to write to the cache the value from the backend server, fixing now.
Dimitris Andreou