I've got a class Foo with a property Id. My goal is to ensure that no two instances of Foo with the same Id exist at the same time.

So I created a factory method CreateFoo which uses a cache in order to return the same instance for the same Id.

// The cache field, shown here for context (the WeakDictionary class follows below):
static readonly WeakDictionary<int, Foo> cache = new WeakDictionary<int, Foo>();

static Foo CreateFoo(int id) {
    Foo foo;
    if (!cache.TryGetValue(id, out foo)) {
        foo = new Foo(id);
        foo.Initialize(...);
        cache.Put(id, foo);
    }
    return foo;
}

The cache is implemented as a Dictionary<TKey,WeakReference>, based on @JaredPar's Building a WeakReference Hashtable:

class WeakDictionary<TKey, TValue> where TValue : class {
    private readonly Dictionary<TKey, WeakReference> items;
    public WeakDictionary() {
        this.items = new Dictionary<TKey, WeakReference>();
    }
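    // Note: Put overwrites any existing WeakReference for the key, dead or alive.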
    public void Put(TKey key, TValue value) {
        this.items[key] = new WeakReference(value);
    }
    public bool TryGetValue(TKey key, out TValue value) {
        WeakReference weakRef;
        if (!this.items.TryGetValue(key, out weakRef)) {
            value = null;
            return false;
        } else {
            value = (TValue)weakRef.Target;
            return (value != null);
        }
    }
}

The problem is that the WeakReferences remain in the dictionary after their targets have been garbage collected. This implies the need for some strategy for manually "garbage collecting" the dead WeakReferences, as explained by @Pascal Cuoq in What happens to a WeakReference after GC of WeakReference.Target.


My question is: What's the best strategy to compact a WeakReference Dictionary?

The options that I see are:

  1. Don't remove WeakReferences from the Dictionary. IMO this is bad, because the cache is used for the full lifetime of my application, and a lot of dead WeakReferences will accumulate over time.

  2. Walk the entire dictionary on each Put and TryGetValue, and remove dead WeakReferences. This somewhat defeats the purpose of a dictionary, because both operations become O(n).

  3. Walk the entire dictionary periodically in a background thread. What would be a good interval, given that I don't know the usage pattern of CreateFoo?

  4. Append each inserted KeyValuePair to a double-ended linked list. Each call to Put and TryGetValue examines the head of the list. If the WeakReference is alive, move the pair to the end of the list. If it is dead, remove the pair from the list and the WeakReference from the Dictionary. (See the sketch below.)

  5. Implement a custom hash table with the minor difference that, when a bucket is full, dead WeakReferences are first removed from the bucket before proceeding as usual.

Are there other strategies?

The best strategy is probably an algorithm whose cleanup cost is amortized over the normal dictionary operations. Does such a strategy exist?
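For illustration, here is a minimal sketch of option 4 on top of the WeakDictionary above. All names are mine, and the ReferenceEquals check is an assumption about how entries overwritten by a later Put should be handled:

using System;
using System.Collections.Generic;

class AmortizedWeakDictionary<TKey, TValue> where TValue : class {
    private readonly Dictionary<TKey, WeakReference> items =
        new Dictionary<TKey, WeakReference>();
    private readonly LinkedList<KeyValuePair<TKey, WeakReference>> order =
        new LinkedList<KeyValuePair<TKey, WeakReference>>();

    public void Put(TKey key, TValue value) {
        var weakRef = new WeakReference(value);
        this.items[key] = weakRef;
        this.order.AddLast(new KeyValuePair<TKey, WeakReference>(key, weakRef));
        this.SweepHead();
    }

    public bool TryGetValue(TKey key, out TValue value) {
        this.SweepHead();
        WeakReference weakRef;
        if (!this.items.TryGetValue(key, out weakRef)) {
            value = null;
            return false;
        }
        value = (TValue)weakRef.Target;
        return (value != null);
    }

    // Examines only the head of the list, so each operation does O(1)
    // cleanup work, and every entry is revisited once per round trip
    // through the list.
    private void SweepHead() {
        LinkedListNode<KeyValuePair<TKey, WeakReference>> head = this.order.First;
        if (head == null)
            return;
        this.order.RemoveFirst();
        if (head.Value.Value.IsAlive) {
            this.order.AddLast(head);   // still alive: recycle to the tail
        } else {
            // Dead: remove from the dictionary, unless a newer Put has
            // already replaced the WeakReference stored under this key.
            WeakReference current;
            if (this.items.TryGetValue(head.Value.Key, out current)
                    && ReferenceEquals(current, head.Value.Value))
                this.items.Remove(head.Value.Key);
        }
    }
}

Note that a key overwritten by Put leaves a stale node in the list; that node is discarded harmlessly the next time it reaches the head.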

+1  A: 

You could remove the "invalid" WeakReference inside TryGetValue:

[Edit] My mistake, these solutions actually do nothing more than what you already suggested, since the Put method will swap the old object for the new one anyway. Just ignore this answer.

public bool TryGetValue(TKey key, out TValue value) {
    WeakReference weakRef;
    if (!this.items.TryGetValue(key, out weakRef)) {
        value = null;
        return false;
    } else {
        value = (TValue)weakRef.Target;
        if (value == null)
            this.items.Remove(key);
        return (value != null);
    }
}

Or, you can immediately create a new instance inside your dictionary whenever it is needed:

public TValue GetOrCreate(TKey key, Func<TKey, TValue> ctor) {

    WeakReference weakRef;
    if (!this.items.TryGetValue(key, out weakRef)) {
        TValue result = ctor(key);
        this.Put(key, result);
        return result;
    }

    TValue value = (TValue)weakRef.Target;
    if (value == null)
    {
        TValue result = ctor(key);
        this.Put(key, result);
        return result;
    }

    return value;
}

You would then use it like this:

static Foo CreateFoo(int id)
{
    // The lambda parameter cannot also be named "id"; C# does not allow it
    // to shadow the method parameter.
    return cache.GetOrCreate(id, i => new Foo(i));
}

[Edit]

According to windbg, a WeakReference instance alone occupies 16 bytes. For 100,000 collected objects that is only about 1.6 MB, which would not be such a serious burden, so you could easily let them live.

If this is a server app and you believe you could benefit from collecting, I would consider going for a background thread, but also implementing a simple algorithm that increases the waiting time whenever a sweep collects relatively few objects.
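A minimal sketch of that back-off idea, assuming a hypothetical RemoveDeadEntries() method that returns the number of entries removed (the WeakDictionary from the question would also need locking before another thread may scan it):

using System;
using System.Threading;

static void CleanupLoop(WeakDictionary<int, Foo> cache)
{
    TimeSpan wait = TimeSpan.FromSeconds(1);
    TimeSpan maxWait = TimeSpan.FromMinutes(5);
    while (true)
    {
        Thread.Sleep(wait);
        int removed = cache.RemoveDeadEntries();   // hypothetical method
        if (removed < 10)
            // Few dead entries found: sweep less often, up to a cap.
            wait = TimeSpan.FromTicks(Math.Min(wait.Ticks * 2, maxWait.Ticks));
        else
            // Plenty of garbage: return to frequent sweeps.
            wait = TimeSpan.FromSeconds(1);
    }
}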

Groo
In this case it will be re-inserted by `CreateFoo` immediately, so this is sadly no improvement.
dtb
You could do the scans to compact the dictionary when you remove a weak reference.
Kevin Brock
It's not only the dead WeakReferences that occupy space, but also the internal structures of the Dictionary. So even if the number of active Foo objects stays constant, the hash table has to grow constantly to accommodate all the dead WeakReferences.
dtb
Correct, I didn't take that into account. That would be an additional 4 bytes (on a 32-bit system) for each array member (on average, of course; there may be some small constant overhead).
Groo
It *is* an improvement, it doesn't grow.
Hans Passant
@nobugz: No, my mistake, actually the `Put` method will replace the invalid object after it's called inside `CreateFoo`.
Groo
+1  A: 

Your option 3 (a background thread) has the big disadvantage of making synchronization necessary on all Put/TryGetValue actions. If you do use it, the interval should be expressed not in milliseconds but as every N TryGet actions.

Option 2, scanning the dictionary, would incur serious overhead. You can improve it by scanning only once in every 1000 actions and/or by watching how often the GC has run.
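As a sketch, the counter approach could be added to the question's WeakDictionary like this (the member names and the threshold of 1000 are mine), with MaybeScan() called at the start of Put and TryGetValue:

private int opCount;

private void MaybeScan() {
    if (++this.opCount < 1000)
        return;
    this.opCount = 0;
    // Removing entries while enumerating a Dictionary is not allowed,
    // so collect the dead keys first.
    var deadKeys = new List<TKey>();
    foreach (KeyValuePair<TKey, WeakReference> pair in this.items)
        if (!pair.Value.IsAlive)
            deadKeys.Add(pair.Key);
    foreach (TKey key in deadKeys)
        this.items.Remove(key);
}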

But I would seriously consider option 1: do nothing. You may have "a lot" of dead entries, but on the other hand they are pretty small (and get recycled). It's probably not an option for a server app, but for a client application I would try to get a measure of how many entries (in kilobytes) per hour we are actually talking about.

After some discussion:

Does such a[n amortized] strategy exist?

I would guess not. Your problem is a miniature version of the GC: you will have to scan the whole thing once in a while. So only options 2) and 3) provide a real solution. They are both expensive, but they can be (heavily) optimized with some heuristics. Option 2) would still give you the occasional worst case, though.

Henk Holterman
I have added two more options. What do you think of those? ... The problem is that I'm building a library that could be used both in server and client apps. In both cases it's likely that my class is used for the entire lifetime of the app. It feels wrong to design a library that acquires memory with each method call and never releases it, even if it's just a couple of bytes per call.
dtb
You're right about the library angle. Both new options will probably work but have really bad worst-case behaviour. I vote for nr. 2, the periodic full scan.
Henk Holterman
Why do you think option 5 has really bad worst-case behaviour? A hash table implementation always has to scan the buckets for an empty spot anyway, so why not check for dead WeakReferences there as well?
dtb
It won't scan _all_ buckets. It's about the same as checking a few random entries. The underlying problem is that there is no correlation between what the GC cleans up and what you can easily access.
Henk Holterman
A: 

I had this same problem, and solved it like this (WeakDictionary is the class I was trying to clean up):

internal class CleanerRef
{
    ~CleanerRef()
    {
        if (handle.IsAllocated)
            handle.Free();
    }

    public CleanerRef(WeakDictionaryCleaner cleaner, WeakDictionary dictionary)
    {
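        // A resurrection-tracking handle still reports the cleaner as its
        // Target while the cleaner's finalizer runs, which is exactly when
        // ~WeakDictionaryCleaner looks itself up in the refs list.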
        handle = GCHandle.Alloc(cleaner, GCHandleType.WeakTrackResurrection);
        Dictionary = dictionary;
    }

    public bool IsAlive
    {
        get {return handle.IsAllocated && handle.Target != null;}
    }

    public object Target
    {
        get {return IsAlive ? handle.Target : null;}
    }

    GCHandle handle;
    public WeakDictionary Dictionary;
}


internal class WeakDictionaryCleaner
{
    public WeakDictionaryCleaner(WeakDictionary dict)
    {
        refs.Add(new CleanerRef(this, dict));
    }

    ~WeakDictionaryCleaner()
    {
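        // This runs on the next GC cycle, because no strong reference to the
        // cleaner is ever kept: compact the dictionary, then retire this
        // cleaner's entry.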
        foreach(var cleanerRef in refs)
        {
            if (cleanerRef.Target == this)
            {
                cleanerRef.Dictionary.ClearGcedEntries();
                refs.Remove(cleanerRef);
                break;
            }
        }
    }
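    // Strong references to the CleanerRefs (the handles), not to the
    // cleaners themselves, so the cleaners stay collectible.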
    private static readonly List<CleanerRef> refs = new List<CleanerRef>();
}

What these two classes try to achieve is to "hook" into the GC. You activate the mechanism by creating an instance of WeakDictionaryCleaner during the construction of the weak collection:

new WeakDictionaryCleaner(weakDictionary);

Notice that I don't keep any reference to the new instance, so the GC will collect it during the next cycle. In the ClearGcedEntries() method I create a new instance again, so that each GC cycle has a cleaner to finalize, which in turn executes the collection compaction. You can also make CleanerRef.Dictionary a weak reference so that it won't hold the dictionary in memory.
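ClearGcedEntries() itself is not shown in the answer; here is a minimal sketch of what it might look like, written against the question's generic WeakDictionary for illustration (the classes above take a non-generic WeakDictionary, so the types would have to be aligned):

public void ClearGcedEntries()
{
    // Collect the dead keys first; a Dictionary cannot be modified
    // while it is being enumerated.
    var deadKeys = new List<TKey>();
    foreach (KeyValuePair<TKey, WeakReference> pair in this.items)
        if (!pair.Value.IsAlive)
            deadKeys.Add(pair.Key);
    foreach (TKey key in deadKeys)
        this.items.Remove(key);

    // Re-arm: the new cleaner is unreferenced, so the next GC cycle
    // finalizes it and triggers the next compaction.
    new WeakDictionaryCleaner(this);
}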

Hope this helps

Juan Vidal Pich