Please don't say EHCache or OSCache, etc. Assume for the purposes of this question that I want to implement my own using just the SDK (learning by doing). Given that the cache will be used in a multithreaded environment, which data structures would you use? I've already implemented one using LinkedHashMap and Collections#synchronizedMap, but I'm curious whether any of the new concurrent collections would be better candidates.

UPDATE: I was just reading through Yegge's latest when I found this nugget:

If you need constant-time access and want to maintain the insertion order, you can't do better than a LinkedHashMap, a truly wonderful data structure. The only way it could possibly be more wonderful is if there were a concurrent version. But alas.

I was thinking almost exactly the same thing before I went with the LinkedHashMap + Collections#synchronizedMap implementation I mentioned above. Nice to know I hadn't just overlooked something.

Based on the answers so far, it sounds like my best bet for a highly concurrent LRU would be to extend ConcurrentHashMap using some of the same logic that LinkedHashMap uses.

+2  A: 

I would consider using java.util.concurrent.PriorityBlockingQueue, with priority determined by a "numberOfUses" counter in each element. I would be very, very careful to get all my synchronisation correct, as the "numberOfUses" counter implies that the element can't be immutable.

The element object would be a wrapper for the objects in the cache:

class CacheElement {
    private final Object obj;
    private int numberOfUses = 0;

    CacheElement(Object obj) {
        this.obj = obj;
    }

    ... etc.
}
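
A minimal sketch of how the queue itself might be constructed, assuming a hypothetical getNumberOfUses() accessor on CacheElement (not shown above). Least-used elements sort to the head, making them the first eviction candidates:

import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

// Order elements so that the least-used one sits at the head of the
// queue and is therefore the first candidate for eviction.
PriorityBlockingQueue<CacheElement> queue =
        new PriorityBlockingQueue<CacheElement>(16, new Comparator<CacheElement>() {
            public int compare(CacheElement a, CacheElement b) {
                return a.getNumberOfUses() - b.getNumberOfUses(); // hypothetical accessor
            }
        });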
Steve McLeod
Don't you mean it must be immutable?
shsteimer
No, it must be mutable, because numberOfUses changes.
Steve McLeod
James below points out an error I made, which I offer as evidence of just how bleeding difficult it is to write reliable, sturdy caches.
Steve McLeod
+1  A: 

Have a look at ConcurrentSkipListMap. It should give you log(n) time for testing for and removing an element if it is already contained in the cache, and log(n) time for re-adding it.

You'd just need a counter and a wrapper element to enforce the LRU ordering and ensure the least recently used entries are discarded when the cache is full.
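
A rough sketch of that counter-and-wrapper idea (my own illustration, not madlep's code, and not fully race-free): the skip list orders keys by a monotonically increasing tick, so the lowest tick is always the least recently used entry.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

// Rough sketch only: small races remain between touch() and eviction.
class SkipListLruSketch<K, V> {
    private final ConcurrentHashMap<K, V> values = new ConcurrentHashMap<K, V>();
    private final ConcurrentHashMap<K, Long> ticks = new ConcurrentHashMap<K, Long>();
    private final ConcurrentSkipListMap<Long, K> order = new ConcurrentSkipListMap<Long, K>();
    private final AtomicLong clock = new AtomicLong();
    private final int maxSize;

    SkipListLruSketch(int maxSize) {
        this.maxSize = maxSize;
    }

    public V get(K key) {
        V v = values.get(key);
        if (v != null) {
            touch(key); // O(log n) re-ordering
        }
        return v;
    }

    public void put(K key, V value) {
        values.put(key, value);
        touch(key);
        while (values.size() > maxSize && evictOldest()) { }
    }

    // Move the key to the "most recent" end of the skip list.
    private void touch(K key) {
        long now = clock.incrementAndGet();
        Long old = ticks.put(key, now);
        if (old != null) {
            order.remove(old); // discard the stale ordering entry
        }
        order.put(now, key);
    }

    private boolean evictOldest() {
        Map.Entry<Long, K> eldest = order.pollFirstEntry();
        if (eldest == null) {
            return false; // nothing left to evict
        }
        // Only evict if no newer tick has superseded this ordering entry.
        if (ticks.remove(eldest.getValue(), eldest.getKey())) {
            values.remove(eldest.getValue());
        }
        return true;
    }
}

Note that every ordering operation here is O(log n), in keeping with the skip list's characteristics, rather than the O(1) of a linked-list approach.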

madlep
Would `ConcurrentSkipListMap` provide some ease-of-implementation benefit over `ConcurrentHashMap`, or is it simply a case of avoiding pathological cases?
Hank Gay
It would make things simpler, as ConcurrentSkipListMap orders elements, which would allow you to manage the order things were used in. ConcurrentHashMap doesn't do this, so you'd basically have to iterate over the entire cache contents to update an element's 'last used' counter or whatever.
madlep
So with the `ConcurrentSkipListMap` implementation, I would create a new implementation of the `Map` interface that delegates to `ConcurrentSkipListMap` and performs some sort of wrapping so that the arbitrary key types are wrapped in a type that is easily sorted based on the last access?
Hank Gay
+3  A: 

Note that if you attempt to do the PriorityBlockingQueue version mentioned by Steve McLeod, you should make the element immutable, because modifying the element while it is in the queue will have no effect; you will need to remove the element and re-add it in order to re-prioritize it.
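
A sketch of that remove-and-re-add dance, assuming an immutable CacheElement with a hypothetical withIncrementedUseCount() copy method:

// Re-prioritize by removing and re-inserting: the queue only orders
// elements at insertion time, so an in-place mutation is never seen.
static void reprioritize(PriorityBlockingQueue<CacheElement> queue, CacheElement element) {
    if (queue.remove(element)) {
        queue.add(element.withIncrementedUseCount()); // hypothetical copy method
    }
}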

james
A: 

Well, for a cache you will generally be looking up some piece of data via a proxy object (a URL, a String, etc.), so interface-wise you are going to want a map. But to evict things you want a queue-like structure. Internally I would maintain two data structures: a queue and a HashMap. Here's an implementation that should be able to do everything in O(1) time.

Here's a class I whipped up pretty quickly:

import java.util.HashMap;
import java.util.Map;
public class LRUCache<K, V>
{
    int maxSize;
    int currentSize = 0;

    Map<K, ValueHolder<K, V>> map;
    LinkedList<K> queue;

    public LRUCache(int maxSize)
    {
        this.maxSize = maxSize;
        map = new HashMap<K, ValueHolder<K, V>>();
        queue = new LinkedList<K>();
    }

    private void freeSpace()
    {
        K k = queue.remove();
        map.remove(k);
        currentSize--;
    }

    public void put(K key, V val)
    {
        if(map.containsKey(key))
        {//update the value and just heat up that item
            map.get(key).value = val;
            get(key);
            return;
        }
        while(currentSize >= maxSize)
        {
            freeSpace();
        }
        ListNode<K> ln = queue.add(key);
        ValueHolder<K, V> rv = new ValueHolder<K, V>(val, ln);
        map.put(key, rv);
        currentSize++;
    }

    public V get(K key)
    {
        ValueHolder<K, V> rv = map.get(key);
        if(rv == null) return null;
        queue.remove(rv.queueLocation);
        rv.queueLocation = queue.add(key);//this ensures that each item has only one copy of the key in the queue
        return rv.value;
    }
}

class ListNode<K>
{
    ListNode<K> prev;
    ListNode<K> next;
    K value;
    public ListNode(K v)
    {
        value = v;
        prev = null;
        next = null;
    }
}

class ValueHolder<K,V>
{
    V value;
    ListNode<K> queueLocation;
    public ValueHolder(V value, ListNode<K> ql)
    {
        this.value = value;
        this.queueLocation = ql;
    }
}

class LinkedList<K>
{
    ListNode<K> head = null;
    ListNode<K> tail = null;

    public ListNode<K> add(K v)
    {
        if(head == null)
        {
            assert(tail == null);
            head = tail = new ListNode<K>(v);
        }
        else
        {
            tail.next = new ListNode<K>(v);
            tail.next.prev = tail;
            tail = tail.next;
            if(tail.prev == null)
            {
                tail.prev = head;
                head.next = tail;
            }
        }
        return tail;
    }

    public K remove()
    {
        if(head == null)
            return null;
        K val = head.value;
        if(head.next == null)
        {
            head = null;
            tail = null;
        }
        else
        {
            head = head.next;
            head.prev = null;
        }
        return val;
    }

    public void remove(ListNode<K> ln)
    {
        ListNode<K> prev = ln.prev;
        ListNode<K> next = ln.next;
        if(prev == null)
        {
            head = next;
        }
        else
        {
            prev.next = next;
        }
        if(next == null)
        {
            tail = prev;
        }
        else
        {
            next.prev = prev;
        }
    }
}

Here's how it works. Keys are stored in a linked list with the oldest keys at the front of the list (new keys go to the back), so when you need to 'eject' something you just pop it off the front of the queue and then use the key to remove the value from the map. When an item gets referenced, you grab the ValueHolder from the map and then use its queueLocation variable to remove the key from its current location in the queue and put it at the back of the queue (it's now the most recently used). Adding things is pretty much the same.

I'm sure there are a ton of errors here, and I haven't implemented any synchronization. But this class will provide O(1) adding to the cache, O(1) removal of old items, and O(1) retrieval of cache items. Even trivial synchronization (just synchronizing every public method) would still have little lock contention because each operation runs so quickly. If anyone has any clever synchronization tricks I would be very interested. Also, I'm sure there are some additional optimizations you could implement using the maxSize variable with respect to the map.

luke
Thanks for the level of detail, but where does this provide a win over the `LinkedHashMap` + `Collections.synchronizedMap()` implementation?
Hank Gay
Performance? I don't know for sure, but I don't think LinkedHashMap has O(1) insertion (it's probably O(log(n))). Actually, you could add a few methods to complete the Map interface in my implementation and then use Collections.synchronizedMap to add concurrency.
luke
In the LinkedList class above, in the add method, there is code in the else block: `if(tail.prev == null){ tail.prev = head; head.next = tail; }`. When will this code be executed? I ran a few dry runs and I think it will never be executed and should be removed.
Dipesh
+3  A: 

LinkedHashMap is O(1), but requires synchronization. No need to reinvent the wheel there.

Two options for increasing concurrency:

1. Create multiple LinkedHashMaps and hash into them: for example, an array of four LinkedHashMaps, indexed 0 through 3. On the key, do key % 4 (or, equivalently for a power-of-two count, a bitwise AND with 3) to pick which map to do a put/get/remove against (see the sketch after this list).

2. You could do an 'almost' LRU by extending ConcurrentHashMap and having a linked-hash-map-like structure in each of the regions inside of it. Locking would occur more granularly than in a synchronized LinkedHashMap. On a put or putIfAbsent, only a lock on the head and tail of the list is needed (per region). On a remove or get, the whole region needs to be locked. I'm curious whether atomic linked lists of some sort might help here -- probably so for the head of the list, maybe for more.

The structure would not keep the total order, but only the order per region. As long as the number of entries is much larger than the number of regions, this is good enough for most caches. Each region will have to have its own entry count; this would be used rather than the global count for the eviction trigger. The default number of regions in a ConcurrentHashMap is 16, which is plenty for most servers today.

  1. would be easier to write and faster under moderate concurrency.

  2. would be more difficult to write but would scale much better at very high concurrency. It would be slower for normal access (just as ConcurrentHashMap is slower than HashMap when there is no concurrency).
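
A minimal sketch of option 1, assuming four stripes and the standard removeEldestEntry override; the class and names are my own illustration, not a definitive implementation:

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class StripedLruCache<K, V> {
    private final Map<K, V>[] stripes;

    @SuppressWarnings("unchecked")
    StripedLruCache(final int maxEntriesPerStripe) {
        stripes = new Map[4];
        for (int i = 0; i < stripes.length; i++) {
            stripes[i] = Collections.synchronizedMap(
                    new LinkedHashMap<K, V>(maxEntriesPerStripe + 1, 1.0f, true) {
                        @Override
                        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                            return size() > maxEntriesPerStripe;
                        }
                    });
        }
    }

    // key.hashCode() & 3 is the bitwise equivalent of % 4 once the
    // sign bit has been masked off.
    private Map<K, V> stripeFor(K key) {
        return stripes[(key.hashCode() & 0x7fffffff) & 3];
    }

    public V get(K key) { return stripeFor(key).get(key); }
    public void put(K key, V value) { stripeFor(key).put(key, value); }
    public V remove(K key) { return stripeFor(key).remove(key); }
}

As with option 2, the LRU ordering here is only per stripe rather than global, which is usually acceptable when entries greatly outnumber stripes.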

Scott
+1  A: 

I like lots of these suggestions, but for now I think I'll stick with LinkedHashMap + Collections.synchronizedMap. If I do revisit this in the future, I'll probably work on extending ConcurrentHashMap in the same way LinkedHashMap extends HashMap.

UPDATE:

By request, here's the gist of my current implementation.

private class LruCache<A, B> extends LinkedHashMap<A, B> {
    private final int maxEntries;

    public LruCache(final int maxEntries) {
        super(maxEntries + 1, 1.0f, true);
        this.maxEntries = maxEntries;
    }

    /**
     * Returns <tt>true</tt> if this <code>LruCache</code> has more entries than the maximum specified when it was
     * created.
     *
     * <p>
     * This method <em>does not</em> modify the underlying <code>Map</code>; it relies on the implementation of
     * <code>LinkedHashMap</code> to do that, but that behavior is documented in the JavaDoc for
     * <code>LinkedHashMap</code>.
     * </p>
     *
     * @param eldest
     *            the <code>Entry</code> in question; this implementation doesn't care what it is, since the
     *            implementation is only dependent on the size of the cache
     * @return <tt>true</tt> if the eldest entry should be removed
     * @see java.util.LinkedHashMap#removeEldestEntry(Map.Entry)
     */
    @Override
    protected boolean removeEldestEntry(final Map.Entry<A, B> eldest) {
        return super.size() > maxEntries;
    }
}

Map<String, String> example = Collections.synchronizedMap(new LruCache<String, String>(CACHE_SIZE));
Hank Gay
+3  A: 

There are two open source implementations.

Apache Solr has ConcurrentLRUCache: http://svn.apache.org/viewvc/lucene/dev/trunk/solr/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java?view=log

There's an open source project for a ConcurrentLinkedHashMap: http://code.google.com/p/concurrentlinkedhashmap/

Ron
Solr's solution isn't actually LRU, but the `ConcurrentLinkedHashMap` is interesting. It claims to have been rolled into `MapMaker` from Guava, but I didn't spot it in the docs. Any idea what's going on with that effort?
Hank Gay
A simplified version was integrated, but the tests are not yet complete, so it is not yet public. I had a lot of problems doing a deeper integration, but I do hope to finish it, as there are some nice algorithmic properties. The ability to listen to an eviction (capacity, expiration, GC) was added and is based on CLHM's approach (listener queue). I would also like to contribute the idea of "weighted values" as well, since that is useful when caching collections. Unfortunately, due to other commitments, I have been too swamped to devote the time Guava deserves (and that I promised Kevin/Charles).
Ben Manes
Update: The integration was completed and made public in Guava r08. This is exposed through the #maximumSize() setting.
Ben Manes
A: 

Hi Hank, I'm looking for a better LRU cache in Java. Is it possible for you to share your LRU cache Java code using LinkedHashMap and Collections#synchronizedMap? Currently I'm using an LRUMap that implements Map, and the code works fine, but I'm getting an ArrayIndexOutOfBoundsException during load testing with 500 users on the method below. The method moves the most recent object to the front of the queue.

private void moveToFront(int index) {
    if (listHead != index) {
        int thisNext = nextElement[index];
        int thisPrev = prevElement[index];
        nextElement[thisPrev] = thisNext;
        if (thisNext >= 0) {
            prevElement[thisNext] = thisPrev;
        } else {
            listTail = thisPrev;
        }
        // Make this element the new head: it has no predecessor (-1),
        // its successor is the old head, and the old head points back to it.
        prevElement[index] = -1;
        nextElement[index] = listHead;
        prevElement[listHead] = index;
        listHead = index;
    }
}

The get(Object key) and put(Object key, Object value) methods call the moveToFront method above.

Raj Pandian