We have a slow backend server that is getting crushed by load and we'd like the middle-tier Scala server to only have one outstanding request to the backend for each unique lookup.

The backend server only stores immutable data, but upon the addition of new data, the middle-tier servers request the newest data on behalf of the clients, and the backend server has a hard time with the load. The immutable data is cached in memcached using unique keys generated at write time, but the write rate is high, so we get a low memcached hit rate.

One idea I have is to use Google Guava's MapMaker#makeComputingMap() to wrap the actual lookup and, after ConcurrentMap#get() returns, have the middle tier save the result and just delete the key from the map.

This seems a little wasteful, although the code is very easy to write; see below for an example of what I'm thinking.

Is there a more natural data structure, library or part of Guava that would solve this problem?

import com.google.common.collect.MapMaker

object Test {
  // The expensive computation, standing in for the slow backend lookup.
  val computer: com.google.common.base.Function[Int, Long] =
    new com.google.common.base.Function[Int, Long] {
      override def apply(i: Int): Long = {
        val l = System.currentTimeMillis + i
        System.err.println("For " + i + " returning " + l)
        Thread.sleep(2000)
        l
      }
    }

  // Computing map: concurrent get()s for the same key share one computation.
  val map = new MapMaker().makeComputingMap[Int, Long](computer)

  def get(k: Int): Long = {
    val l = map.get(k) // blocks until the value is computed, once per key
    map.remove(k)      // discard immediately; we only want the de-duplication
    l
  }

  def main(args: Array[String]): Unit = {
    // Two threads asking for the same key: only one computation should run.
    val t1 = new Thread() {
      override def run(): Unit = System.err.println(get(123))
    }
    val t2 = new Thread() {
      override def run(): Unit = System.err.println(get(123))
    }

    t1.start()
    t2.start()
    t1.join()
    t2.join()

    // The key was removed above, so this triggers a fresh computation.
    System.err.println(get(123))
  }
}
+3  A: 

I'm not sure why you implement the removal yourself. Why not simply use weak or soft values and let the GC clean up for you?

new MapMaker().weakValues().makeComputingMap[Int, Long](computer)
Jed Wesley-Smith
Or an expiration time of 1 ms or some such.
ColinD
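(For illustration, ColinD's suggestion might look like the sketch below, reusing the question's computer function and assuming the MapMaker#expiration(long, TimeUnit) method discussed further down in this thread.)

import java.util.concurrent.TimeUnit

// Sketch: entries evaporate 1 ms after they are written, so concurrent
// callers share one computation but nothing lingers in the map.
val expiringMap =
  new MapMaker()
    .expiration(1, TimeUnit.MILLISECONDS)
    .makeComputingMap[Int, Long](computer)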
Yeah, I thought about that, but weak values keep the entry around as long as the value is in use. It basically works like a weak Interner.
Jed Wesley-Smith
I could use a 1 ms timeout, but wouldn't the overhead of the map maintaining timeouts be more than just calling remove() myself?
Blair Zajac
@Blair: Yeah, I suppose it probably would.
ColinD
Blair, the overhead is that a Timer is maintained and a TimerTask is created for each get. In practice it won't be all that large, but I'd profile before making a decision. That said, it doesn't seem very useful for such a small timeout. Wouldn't a soft/weak value be better?
Jed Wesley-Smith
@Jed, I saw a chart of MapMaker and the number of additional objects for each weak/soft type, and they add to the overhead. If you know you're going to delete the entry, it seems better just to delete it yourself.
Blair Zajac
Found the link: http://code.google.com/p/memory-measurer/wiki/ElementCostInDataStructures
Blair Zajac
Blair, I'd be very surprised if those additional objects add up to anything in your system, as things live for such a short time. The real question you need to answer is whether you are going to ask the backend for the same value repeatedly without overlap. When you explicitly delete, the only thing you are protecting against is actually-concurrent requests for the same value. Serial requests will _always_ re-request. As always with performance, only optimise when you actually see a problem, not when you think one might be present; you're usually wrong :-)
Jed Wesley-Smith
@Blair, this discussion reminded me that I forgot to include computing maps in that chart, so I added them, but only for the plain version of MapMaker, since it turns out that is the only case where there is a difference. If you add expiration/eviction or special kinds of references for keys or values, makeComputingMap() and makeMap() have the same cost.
Dimitris Andreou
@Jed, regarding Timer+TimerTask: you are referring to the old implementation, which, thankfully, was thrown away. Expiration via TimerTask would imply a priority queue, which would further imply that threads that are supposed to concurrently perform O(1) tasks would suddenly compete for a map-wide lock and, adding insult to injury, hold that lock for an O(log n) operation.
Dimitris Andreou
@Jed, we have measured and know for certain that we have a slow backend. I have many interactive users unable to work due to it. Also, I think using weak values could be dangerous: you don't know whether some caller has stored the value in an object that maintains a strong reference to it, so the only safe thing is to have it expire.
Blair Zajac
@Dimitris, I presume that specifying an expiration time of 0 to #expiration() would work? Never mind, it rejects that as an illegal argument.
Blair Zajac
+1  A: 

I think what you are doing is quite reasonable. You only use the structure to get lock striping on the key, to ensure that accesses to the same key conflict; no matter that you don't need a value mapping per key. ConcurrentHashMap and friends are the only structures in the Java libraries + Guava that offer you lock striping.

This does induce some minor runtime overhead, plus the size of the hashtable, which you don't need (and which might even grow if accesses to the same segment pile up and remove() doesn't keep up).

If you want to make this as cheap as possible, you could code simple lock striping yourself: basically an Object[] (or Array[AnyRef] :)) of N locks (N = the concurrency level), where you map the hash of the lookup key into the array and lock on the selected object. Another advantage is that you really don't have to do the hashcode tricks that CHM is forced to do: it has to split the hashcode into one part that selects the lock and another for the needs of the hashtable, whereas you can use the whole of it just for lock selection.

edit: Sketching my comment below:

val concurrencyLevel = 16
// `until`, not `to`, so we get exactly concurrencyLevel locks.
val locks = (for (i <- 0 until concurrencyLevel) yield new AnyRef).toArray

def access(key: K): V = {
  // Mask off the sign bit so a negative hashCode still indexes the array.
  val lock = locks((key.hashCode & Int.MaxValue) % locks.length)
  lock synchronized {
    cache.lookup(key) match {
      case Some(v) => v
      case None =>
        // Only the first thread per key reaches the backend; threads that
        // arrive late block on the lock and then hit the cache above.
        val valueFromBackend = backendServer.lookup(key)
        cache.put(key, valueFromBackend)
        valueFromBackend
    }
  }
}

(Btw, is the toArray call needed? Or is the returned IndexedSeq already fast to access by index?)

Dimitris Andreou
Just so I understand the third paragraph: are you suggesting locking on an Object[] just to ensure that only one thread makes a request at a time? I am interested in getting the result back so that the blocked thread doesn't have to make the call itself; presumably, within the N milliseconds the call takes, the backend would return the same result.
Blair Zajac
Hmm, right, I oversimplified a bit. How about checking whether the value you want is already in the cache _while holding the right lock for the key_? In other words, if a thread takes the lock and doesn't find the value in the cache (hopefully that part is fast), it goes to the backend server itself, blocking all the other threads that want the same key. When another thread that wanted that key takes the lock, it first checks whether it came late and another thread has already made the call. So the first thread gets the value from the backend server, and the rest get it from the cache.
Dimitris Andreou
I must admit that the current solution of using an expiring computing map is working fine without much additional coding and debugging, especially since I have unhappy users.
Blair Zajac
Completely agree, that was a good, simple, and reliable solution. (Though why did you switch to expiration? I thought doing remove() immediately was better. Are you also using the MapMaker as a kind of very short-term cache, to protect the real cache from a flood of requests?)
Dimitris Andreou
Yes, it is a short-term cache. I'm using a 1 ns expiration. See my new comments above, just under the question, and this sample code here, http://code.google.com/p/guava-libraries/issues/detail?id=462, for why #remove() doesn't work. Weak values don't work in case some code stores the result in an object with a longer lifetime, keeping a strong reference to it.
Blair Zajac
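(For reference, the setup Blair describes might look roughly like this sketch, again reusing the question's computer function and the MapMaker#expiration method mentioned in the comments; with a 1 ns expiration the computing map acts as a request de-duplicator in front of memcached rather than a cache.)

import java.util.concurrent.TimeUnit

// Sketch: a 1 ns expiration means concurrent get()s for the same key share
// one backend call, while any later request recomputes; no explicit
// remove() is needed.
val dedupMap =
  new MapMaker()
    .expiration(1, TimeUnit.NANOSECONDS)
    .makeComputingMap[Int, Long](computer)

def get(k: Int): Long = dedupMap.get(k)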
Right. Well, if you put code at the start of your function to check whether the real cache already has the value, it would work, wouldn't it? I guess you hesitate to check the cache again when the very reason you're doing this is that the cache didn't have the value moments ago.
Dimitris Andreou
Is the "cache" variable a normal hash? Do entries expire or remove if the value is weak? Need some way of not using old values.
Blair Zajac
"cache" is just memcached in this case; pseudocode, I'm not aware of the actual api. And I forgot to write to the cache the value from the backend server, fixing now.
Dimitris Andreou