I use a large hashmap (millions of entries) to cache values needed by an algorithm; the key is a combination of two objects, encoded as a long. Since the map grows continuously (the keys change over time, so old entries are no longer needed), it would be nice to be able to wipe all the data it contains and start again during execution. Is there a way to do this effectively in Java?

I mean release the associated memory (the hashmap occupies about 1-1.5 GB) and restart from an empty hashmap.

A: 

Have you taken a look at WeakHashMap?

TofuBeer
WeakHashMap works with object identity. Not sure it would do any good with Long keys.
Thilo
Yes, I mentioned this doubt in my post too.
polygenelubricants
+2  A: 

You can call HashMap.clear(). That will discard all entries, but keep the internal array used to store them at its current size (rather than shrinking it back to the initial capacity). If you also need to eliminate that array, the easiest way is to discard the whole HashMap and replace it with a new instance. That of course only works if you control who has a reference to the map.
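For illustration, a quick sketch of the two options (`cache` is just a stand-in for your map field):

Map<Long, Long> cache = new HashMap<Long, Long>();
// ... fill with millions of entries ...

// Option 1: discard the entries, but keep the (large) internal array.
cache.clear();

// Option 2: drop the whole map; the GC can then reclaim the array as well.
// Only safe if nothing else holds a reference to the old instance.
cache = new HashMap<Long, Long>();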

As for reclaiming the memory, you will have to let the garbage collector do its work.

Are your values also Long? If so, you may want to look at a more memory-efficient implementation than the generic HashMap, such as the TLongLongHashMap found in the GNU Trove library. That should save a lot of memory.
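Something like this, for instance (a sketch; in Trove 2.x the class lives in gnu.trove, in 3.x under gnu.trove.map.hash, and the ids here are made up):

import gnu.trove.TLongLongHashMap;

// Pack two int ids into a single long key, as in the question.
int a = 12345, b = 67890;
long key = ((long) a << 32) | (b & 0xFFFFFFFFL);

TLongLongHashMap cache = new TLongLongHashMap();
cache.put(key, 42L);          // primitive long key and value, no boxing
long cached = cache.get(key);
cache.clear();                // wipe everything and start again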

Thilo
+3  A: 

It sounds like you need a WeakHashMap instead:

A hashtable-based Map implementation with weak keys. An entry in a WeakHashMap will automatically be removed when its key is no longer in ordinary use. More precisely, the presence of a mapping for a given key will not prevent the key from being discarded by the garbage collector, that is, made finalizable, finalized, and then reclaimed. When a key has been discarded its entry is effectively removed from the map, so this class behaves somewhat differently from other Map implementations.

I'm not sure how this works with Long as keys, though. Also, this might be of interest:

WeakHashMap is not a cache! Understanding WeakReference and SoftReference
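A quick way to see the potential problem with Long keys: if the boxed key is not strongly referenced anywhere outside the map, nothing keeps the entry alive. A rough sketch:

import java.util.Map;
import java.util.WeakHashMap;

public class WeakKeyDemo {
    public static void main(String[] args) {
        Map<Long, String> cache = new WeakHashMap<Long, String>();
        // The autoboxed key is referenced only by the map itself
        // (small values like 1L may be interned, so use a large one).
        cache.put(1000000L, "value");
        System.gc(); // only a hint, but usually sufficient here
        System.out.println(cache.size()); // quite possibly 0 already
    }
}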

polygenelubricants
I am not so sure that really works with Long keys. After all, the same Long instance may be used in completely unrelated code; more importantly, the key could also be stored elsewhere as a separate instance or as a primitive, so the reference in the map would not be kept alive even though the key is still in active use.
Thilo
A: 

Clear the hashmap:

hashmap.clear();

Then force a garbage collector run:

Runtime.getRuntime().gc();

This is the Javadoc page for Runtime.gc().

zneak
I think calling `gc()` explicitly is never recommended, but I could be wrong.
polygenelubricants
@polygenelubricants: the Javadoc says it's equivalent to `System.gc()` (although it's actually the other way around: `System.gc()` says it's equivalent to `Runtime.getRuntime().gc()`). There's no explicit note of either being unrecommended. Javadoc link posted along with my message.
zneak
@zneak: I think it is bad practice to call gc explicitly. You should leave that up to the JVM.
Thilo
@Thilo: obviously, running the garbage collector stops the world from moving, and because of that it's a bad idea, since the JVM is usually more aware than you are of when it needs more memory. That aside, under exceptional circumstances, if the goal is to reclaim 1.5GB of memory, I'd give it a go. Especially since it would be called in a very deterministic fashion (right after the hashmap is cleared).
zneak
"obviously, running the garbage collector stops the world from moving". That is not (always) true. Most of garbage collection can happen concurrently now. And there is no guarantee what kind of collection gc() will do (if any). I think on the current Hotspot JVM, you have to call gc() three or four times in a row to force a full collection. And again, why do you want to reclaim memory immediately? The JVM will reclaim it for you when you need the memory.
Thilo
Because, Thilo, the JVM may reclaim it at a time when it is bad for your program. All the intelligent algorithms in the world still don't know when your code is in a state to cheerfully recover all the RAM in the world and when it might be in something speed- or response-sensitive. In general, I'm in favour of having a way to force garbage collection in a language, no matter how "unrecommended" it may be, because the implementation knows neither my code nor my requirements.
JUST MY correct OPINION
It's worth noting that `gc()` doesn't run the GC. It merely gives a hint to the JVM saying "hey, you should run GC, mate"
Valentin Rocher
@Thilo: the JVM is aware of itself, but not of other processes. It might look perfectly all right to the JVM to take up 1.5GB, even though most of it is garbage, but that may not be acceptable to my system. That being said, saying that garbage collection doesn't stop the world anymore doesn't sound like an argument against direct calls to the garbage collector. I'm ready to believe that it's a bad practice, but can you explain why? Most of the articles I read said it was bad practice precisely because of that stop-the-world behaviour.
zneak
"It might look perfectly all right for the JVM to take up 1.5GB, but that may not be acceptable to my system". Does a garbage collection actually return memory to the OS (rather than just recycling it in the same JVM)? That would be cool.
Thilo
@Thilo: I sure hope it does give back some of the memory, especially if it collects chunks as large as 1.5GB. A garbage collector that returns resources to the OS only once the process exits is only half useful, is it not? But, truth be told, I have no clue. Now again, if garbage collection doesn't stop the world anymore, and if (according to Valentin) it's now just a hint, why is it a bad practice?
zneak
@zneak: Re: bad practice. You should open a question about that here :-)
Thilo
@Thilo: http://stackoverflow.com/questions/2414105/why-is-it-a-bad-practice-to-call-system-gc I'll be looking forward to your answer.
zneak
A: 

If you have a bit of spare memory, you could implement a timeout cache where each value in the hashmap holds your long value plus an insertion timestamp in milliseconds; then have a background thread iterate over the entries every X seconds and remove anything older than the timeout.
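A rough sketch of that idea (names and the 60-second timeout are made up; a ConcurrentHashMap lets the cleaner thread remove entries while the algorithm keeps reading):

import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TimeoutCache {
    private static final long TIMEOUT_MS = 60000L; // hypothetical

    private static class Entry {
        final long value;
        final long insertedAt = System.currentTimeMillis();
        Entry(long value) { this.value = value; }
    }

    private final Map<Long, Entry> map = new ConcurrentHashMap<Long, Entry>();

    public TimeoutCache() {
        Thread cleaner = new Thread(new Runnable() {
            public void run() {
                while (true) {
                    long cutoff = System.currentTimeMillis() - TIMEOUT_MS;
                    // Drop everything older than the timeout.
                    for (Iterator<Entry> it = map.values().iterator(); it.hasNext();) {
                        if (it.next().insertedAt < cutoff)
                            it.remove();
                    }
                    try { Thread.sleep(TIMEOUT_MS); } catch (InterruptedException e) { return; }
                }
            }
        });
        cleaner.setDaemon(true);
        cleaner.start();
    }

    public void put(long key, long value) { map.put(key, new Entry(value)); }

    public Long get(long key) {
        Entry e = map.get(key);
        return e == null ? null : e.value;
    }
}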

Just my 2 cents :)

simonlord
A: 

For a memory-aware cache, you may want to use Apache Commons Collections, in particular the org.apache.commons.collections.map.ReferenceMap class. The Java mechanism for this is the soft reference. Java provides WeakHashMap for weak references, but weak references are not what you want for a cache. Java does not provide a SoftHashMap, but ReferenceMap from Apache Commons can be a workable substitute.
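For instance (a sketch with commons-collections 3.x, which predates generics, hence the raw type; keys are held hard, values softly, so the GC can reclaim entries under memory pressure):

import java.util.Map;
import org.apache.commons.collections.map.ReferenceMap;

@SuppressWarnings("unchecked")
Map<Long, Long> cache = new ReferenceMap(ReferenceMap.HARD, ReferenceMap.SOFT);
cache.put(1L, 2L); // entries may silently vanish when memory runs low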

Memory awareness of soft references is somewhat crude and inflexible. You can play with some JVM options to tune it, especially -XX:SoftRefLRUPolicyMSPerMB, which expresses how long (in milliseconds per megabyte of free heap) softly referenced values are kept in memory once they cease to be strongly reachable. For instance, with this:

java -XX:SoftRefLRUPolicyMSPerMB=2500

then the JVM will try to keep softly referenced values alive for 2.5 seconds per megabyte of free heap after they were last strongly referenced; a WeakHashMap, by contrast, would let them go at the next collection.

If soft references do not provide what you are looking for, then you will have to implement your own cache strategy and, indeed, flush the map manually, which brings us back to your initial question. For flushing, you can use the clear() method, or simply create a new HashMap. The difference should be slight, and you may even have trouble measuring it.

Alternating between a "full cache" and an "empty cache" may also be considered a bit crude, so you could maintain several maps instead. For instance, maintain ten maps. When you look for a cached value, you look in all of them, but when you add a value, you put it in the first map only. When you want to flush, you rotate the maps: the first becomes the second, the second becomes the third, and so on, up to the tenth, which is discarded; a fresh first map is created. This would look like the following:

import java.util.HashMap;
import java.util.Map;

public class Cache<K, V> {

    private static final int MAX_SIZE = 500000;

    // backend[0] is the newest map; higher indexes are older generations.
    private final Map<K, V>[] backend;
    private int size = 0;

    @SuppressWarnings("unchecked")
    public Cache(int n)
    {
        backend = new Map[n];
        for (int i = 0; i < n; i++)
            backend[i] = new HashMap<K, V>();
    }

    public int size()
    {
        return size;
    }

    public V get(K key)
    {
        // Look in every generation, newest first.
        for (Map<K, V> m : backend) {
            if (m.containsKey(key))
                return m.get(key);
        }
        return null;
    }

    public V put(K key, V value)
    {
        // A key already in the newest map is simply updated in place.
        if (backend[0].containsKey(key))
            return backend[0].put(key, value);
        int n = backend.length;
        for (int i = 1; i < n; i++) {
            Map<K, V> m = backend[i];
            if (m.containsKey(key)) {
                // Known key in an older map: move it to the newest map.
                V old = m.remove(key);
                backend[0].put(key, value);
                return old;
            }
        }
        // Genuinely new key.
        backend[0].put(key, value);
        size++;
        // Over the limit: rotate, discarding the oldest map each time.
        while (size > MAX_SIZE) {
            size -= backend[n - 1].size();
            System.arraycopy(backend, 0, backend, 1, n - 1);
            backend[0] = new HashMap<K, V>();
        }
        return null;
    }
}

The code above is completely untested. It illustrates the main ideas: all maps are consulted when reading (get()), all new values go into the first map, the total size is tracked, and when the size exceeds a given limit the maps are rotated. Note that there is some special treatment when a new value is put for a known key. Also, in this version nothing special is done when a cached value is found, but accessed values could be "rejuvenated": upon get(), when a value is found in any map other than the first, it could be moved into the first map. Frequently accessed values would then remain cached indefinitely.
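For completeness, a rough sketch of that rejuvenating get() (untested, same caveats as the class above):

public V get(K key)
{
    for (int i = 0; i < backend.length; i++) {
        Map<K, V> m = backend[i];
        if (m.containsKey(key)) {
            V value = m.get(key);
            if (i > 0) {
                // Promote the entry to the newest map so that it
                // survives future rotations; the total size is unchanged.
                m.remove(key);
                backend[0].put(key, value);
            }
            return value;
        }
    }
    return null;
}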

Thomas Pornin
"The difference should be slight, and you may even have trouble simply measuring that difference." The difference is the array that holds the map entries (with is retained on clear()). With millions of entries, the array will be several MB (I think an Object[] takes roughly four bytes per entry)
Thilo
The array size is quite a bit smaller than the collective size of the pointed-to objects. Besides, if this is a cache that you regularly empty, then by definition you will fill it again, and the big array *will* be back at some point. Using `clear()` means that you keep the array around but avoid reallocating it later, and also avoid some rehashing as the internal array grows. I do not think it makes a measurable difference in the long term.
Thomas Pornin