views: 190
answers: 3

Hello, I'm working in Java with a large HashMap (millions of entries) that is built with a capacity of 10,000,000 and a load factor of .75, and it's used to cache some values.

Since cached values become useless over time (they are not accessed anymore), but I can't remove individual stale entries along the way, I would like to empty the cache entirely when its performance starts to degrade. How can I decide when it's a good time to do it?

For example, with a capacity of 10 million and a load factor of .75, should I empty it when it reaches 7.5 million elements? I have tried various threshold values, but I would like an analytical one.

I've already verified that emptying it when it's quite full gives a performance boost (the first 2-3 algorithm iterations after the wipe just fill it back up; after that it runs faster than it did before the wipe).
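For reference, a minimal sketch of the wipe-on-threshold approach I mean (the class name and the threshold constant are just placeholders for illustration):

import java.util.HashMap;
import java.util.Map;

// Minimal sketch: clear the whole cache once it grows past a chosen threshold.
class CorrelationCache {
    private static final int WIPE_THRESHOLD = 7500000;   // placeholder; one of the values I tried
    private final Map<Long, Float> cache = new HashMap<Long, Float>(10000000, 0.75f);

    Float get(long key) {
        if (cache.size() >= WIPE_THRESHOLD) {
            cache.clear();   // wipe everything; the next algorithm iterations refill it
        }
        return cache.get(key);
    }

    void put(long key, float value) {
        cache.put(key, value);
    }
}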

EDIT: ADDITIONAL INFO

The HashMap has long keys and float values. It contains cached correlations between contents; since each correlation is a dot product of tag vectors, I wanted to cache them to improve performance.

So basically what I do is compute a long key from the hash codes of the two contents:

private static long computeKey(Object o1, Object o2)
{
    int h1 = o1.hashCode();
    int h2 = o2.hashCode();

    // order the hash codes so the key is the same regardless of argument order
    if (h1 < h2)
    {
        int swap = h1;
        h1 = h2;
        h2 = swap;
    }

    // mask h2 so sign extension doesn't clobber the upper 32 bits
    return ((long) h1) << 32 | (h2 & 0xFFFFFFFFL);
}

and use it to retrieve stored values. Since this is hierarchical clustering, contents get merged and their correlation values with other contents are no longer needed; that's why I want to wipe the HashMap from time to time, to avoid degradation caused by useless values inside it.

Using a WeakHashMap would unpredictably wipe out data even when it is still needed; I have no control over it.

Thanks

+1  A: 

Have you investigated WeakHashMap? The garbage collector determines when to remove entries, and it may give you an acceptable substitute rather than coding something yourself.

This article has more useful information.
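For example, a minimal sketch of how a WeakHashMap behaves (the types here are only illustrative):

import java.util.Map;
import java.util.WeakHashMap;

public class WeakCacheSketch {
    public static void main(String[] args) {
        // Entries are dropped once their keys are no longer strongly referenced elsewhere.
        Map<Object, Float> cache = new WeakHashMap<Object, Float>();

        Object content = new Object();      // stand-in for one of your content objects
        cache.put(content, 0.42f);          // kept only while 'content' stays strongly reachable

        content = null;                     // drop the last strong reference...
        System.gc();                        // ...and the entry becomes eligible for removal
        System.out.println(cache.size());   // usually 0 here, but the timing is up to the GC
    }
}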

Brian Agnew
+3  A: 

Why not use an LRU Cache? From Java's LinkedHashMap documentation:

A special constructor is provided to create a linked hash map whose order of iteration is the order in which its entries were last accessed, from least-recently accessed to most-recently (access-order). This kind of map is well-suited to building LRU caches. Invoking the put or get method results in an access to the corresponding entry (assuming it exists after the invocation completes). The putAll method generates one entry access for each mapping in the specified map, in the order that key-value mappings are provided by the specified map's entry set iterator. No other methods generate entry accesses. In particular, operations on collection-views do not affect the order of iteration of the backing map.

So basically, every once in a while when your map gets too big, just delete the first x entries that the iterator gives you; in access order they are the least recently accessed.
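Roughly, trimming by hand via that iterator could look like this sketch (the method name and batch-size parameter are just placeholders):

import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

class LruTrim {
    // Removes the x least-recently-accessed entries from an access-ordered
    // LinkedHashMap; its iteration starts at the eldest (least recently used) entry.
    static <K, V> void trim(LinkedHashMap<K, V> map, int x) {
        Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
        for (int i = 0; i < x && it.hasNext(); i++) {
            it.next();
            it.remove();   // Iterator.remove() is safe while iterating
        }
    }
}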

See documentation for removeEldestEntry to have this done for you automatically.

Here is code that demonstrates this:

import java.util.LinkedHashMap;
import java.util.Map;

public class LruCacheDemo {
  public static void main(String[] args) {
    // Access-ordered LinkedHashMap that evicts its least-recently-accessed
    // entry once it grows past maxCapacity.
    class CacheMap extends LinkedHashMap<Integer, Integer> {
      private final int maxCapacity;

      public CacheMap(int initialCapacity, int maxCapacity) {
        super(initialCapacity, 0.75f, true); // true = access order
        this.maxCapacity = maxCapacity;
      }

      @Override
      protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
        return size() > maxCapacity;
      }
    }

    int[] popular = {1, 2, 3, 4, 5};
    CacheMap myCache = new CacheMap(5, 10);
    for (int i = 0; i < 100; i++) {
      myCache.put(i, i);
      for (int p : popular) {
        myCache.get(p); // keep the popular keys recently accessed
      }
    }

    System.out.println(myCache);
    // {95=95, 96=96, 97=97, 98=98, 99=99, 1=1, 2=2, 3=3, 4=4, 5=5}
  }
}
z5h
Given the latest information from Jack, I consider your LRU solution far better than my WeakHashMap answer, so I removed mine and upvoted yours.
Riduidel
@Riduidel: Thanks.
z5h
+1  A: 

You may want to use Google Collections' MapMaker to make a map with soft references and a specific timeout.

Soft References "are cleared at the discretion of the garbage collector in response to memory demand."

Example:

ConcurrentMap<Long, ValueTypeHere> cacheMap = new MapMaker()
    .concurrencyLevel(32)
    .softValues()
    .expiration(30, TimeUnit.MINUTES)
    .makeMap();

You can also specify weakKeys if you want to make its keys act like ones in a WeakHashMap.
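For example (the same builder as above, just swapping in weakKeys; ValueTypeHere is still a placeholder):

ConcurrentMap<Long, ValueTypeHere> cacheMap = new MapMaker()
    .concurrencyLevel(32)
    .weakKeys()          // keys are held weakly and compared by identity (==)
    .makeMap();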

R. Bemrose