views:

434

answers:

7

I have a question about improving the efficiency of my program. I have a Dictionary<string, Thingey> defined to hold named Thingeys. This is a web application that will create multiple named Thingey’s over time. Thingey’s are somewhat expensive to create (not prohibitively so) but I’d like to avoid it whenever possible. My logic for getting the right Thingey for the request looks a lot like this:

    private Dictionary<string, Thingey> Thingeys;
    public Thingey GetThingey(Request request)
    {
        string thingeyName = request.ThingeyName;
        if (!this.Thingeys.ContainsKey(thingeyName))
        {
            // create a new thingey on 1st reference
            Thingey newThingey = new Thingey(request);
            lock (this.Thingeys)
            {
                if (!this.Thingeys.ContainsKey(thingeyName))
                {
                    this.Thingeys.Add(thingeyName, newThingey);
                }
                // else - oops someone else beat us to it
                // newThingey will eventually get GCed
            }
        }

        return this. Thingeys[thingeyName];
    }

In this application, Thingeys live forever once created. We don’t know how to create them or which ones will be needed until the app starts and requests begin coming in. The question I have is in the above code is there are occasional instances where newThingey is created because we get multiple simultaneous requests for it before it’s been created. We end up creating 2 of them but only adding one to our collection. Is there a better way to get Thingeys created and added that doesn’t involve check/create/lock/check/add with the rare extraneous thingey that we created but end up never using? (And this code works and has been running for some time. This is just the nagging bit that has always bothered me.)

I'm trying to avoid locking the dictionary for the duration of creating a Thingey.

+2  A: 

If you're looking to avoid blocking unrelated threads, then additional work is needed (and should only be necessary if you've profiled and found that performance is unacceptable with the simpler code). I would recommend using a lightweight wrapper class that asynchronously creates a Thingey and using that in your dictionary.

Dictionary<string, ThingeyWrapper> thingeys = new Dictionary<string, ThingeyWrapper>();

private class ThingeyWrapper
{
    public Thingey Thing { get; private set; }

    private object creationLock;
    private Request request;

    public ThingeyWrapper(Request request)
    {
        creationFlag = new object();
        this.request = request;
    }

    public void WaitForCreation()
    {
        object flag = creationFlag;

        if(flag != null)
        {
            lock(flag)
            {
                if(request != null) Thing = new Thingey(request);

                creationFlag = null;

                request = null;
            }
        }
    }
}

public Thingey GetThingey(Request request)
{
    string thingeyName = request.ThingeyName;

    ThingeyWrapper output;

    lock (this.Thingeys)
    {
        if(!this.Thingeys.TryGetValue(thingeyName, out output))
        {
            output = new ThingeyWrapper(request);

            this.Thingeys.Add(thingeyName, output);
        }
    }

    output.WaitForCreation();

    return output.Thing;
}

While you are still locking on all calls, the creation process is much more lightweight.

Edit

This issue has stuck with me more than I expected it to, so I whipped together a somewhat more robust solution that follows this general pattern. You can find it here.

Adam Robinson
I don't want to lock while a new Thingey is being created.
No Refunds No Returns
We have very similar code - except I prefer to lock on a separate object :)
Jon Skeet
Any reason for that, Jon? I've always preferred locking on the object in question (assuming that I'm handling all locking internally, and that it's a reference type, of course) since it doesn't require that I "bundle" multiple objects together.
Adam Robinson
Since the collection variable is private and never given out, I felt locking it was safe.
No Refunds No Returns
@Adam - the reason is that it is a global lock. Global locks can cause performance problems under heavy load
mfeingold
@mfeingold: That may well be true, but I'm asking Jon why he prefers locking on a different instance. You're still locking in all of the same circumstances, it's just on a different instance.
Adam Robinson
@Adam: Uh, in my opinion anything with a WaitHandle (which is an OS handle) cannot be called lightweight at all.
Lucero
@Adam: I don't like locking on a reference that any other code could lock on. For example, the dictionary code might do locking on itself. That's *probably* not a problem - but I like to know that *nothing* else is going to take out locks on something I'm locking on. Btw, I like your implementation. Somewhat tidier than mine :)
Jon Skeet
@Lucero: The `WaitHandle` isn't lightweight when compared to, say, an `object` (and it may be a viable option to convert this to use a `Monitor`, I just didn't in this example), but it may well be "lightweight" compared to a `Thingey`
Adam Robinson
I think you want to make it a ManualResetEvent though - otherwise after the first return it'll block forever...
Jon Skeet
@Jon: You're right, thanks; I've corrected that and added an alternative `Monitor` based approach, though it's not as safe as the `WaitHandle` in this implementation.
Adam Robinson
@Adam: No, this implementation isn't thread-safe. A second thread could obtain the lock before the thread-pool obtains it. Why bother with the thread pool at all though?
Jon Skeet
@Jon: That was the caveat I mentioned in the post, though I suppose that it's not terribly helpful if it isn't thread safe. I've revised the answer again to forego the use of the `ThreadPool` (I suppose there isn't any need for it, you're right) and use only `Monitor` locks.
Adam Robinson
+2  A: 

Well, from my point of view simpler code is better, so I'd only use one lock:

private readonly object thingeysLock = new object();
private readonly Dictionary<string, Thingey> thingeys;

public Thingey GetThingey(Request request)
{
    string key = request.ThingeyName;
    lock (thingeysLock)
    {
        Thingey ret;
        if (!thingeys.TryGetValue(key, out ret))
        {
            ret = new Thingey(request);
            thingeys[key] = ret;
        }
        return ret;
    }
}

Locks are really cheap when they're not contended. The downside is that this means that occasionally you will block everyone for the whole duration of the time you're creating a new Thingey. Clearly to avoid creating redundant thingeys you'd have to at least block while multiple threads create the Thingey for the same key. Reducing it so that they only block in that situation is somewhat harder.

I would suggest you use the above code but profile it to see whether it's fast enough. If you really need "only block when another thread is already creating the same thingey" then let us know and we'll see what we can do...

EDIT: You've commented on Adam's answer that you "don't want to lock while a new Thingey is being created" - you do realise that there's no getting away from that if there's contention for the same key, right? If thread 1 starts creating a Thingey, then thread 2 asks for the same key, your alternatives for thread 2 are either waiting or creating another instance.

EDIT: Okay, this is generally interesting, so here's a first pass at the "only block other threads asking for the same item".

private readonly object dictionaryLock = new object();
private readonly object creationLocksLock = new object();
private readonly Dictionary<string, Thingey> thingeys;
private readonly Dictionary<string, object> creationLocks;

public Thingey GetThingey(Request request)
{
    string key = request.ThingeyName;
    Thingey ret;
    bool entryExists;
    lock (dictionaryLock)
    {
       entryExists = thingeys.TryGetValue(key, out ret);
       // Atomically mark the dictionary to say we're creating this item,
       // and also set an entry for others to lock on
       if (!entryExists)
       {
           thingeys[key] = null;
           lock (creationLocksLock)
           {
               creationLocks[key] = new object();          
           }
       }
    }
    // If we found something, great!
    if (ret != null)
    {
        return ret;
    }
    // Otherwise, see if we're going to create it or whether we need to wait.
    if (entryExists)
    {
        object creationLock;
        lock (creationLocksLock)
        {
            creationLocks.TryGetValue(key, out creationLock);
        }
        // If creationLock is null, it means the creating thread has finished
        // creating it and removed the creation lock, so we don't need to wait.
        if (creationLock != null)
        {
            lock (creationLock)
            {
                Monitor.Wait(creationLock);
            }
        }
        // We *know* it's in the dictionary now - so just return it.
        lock (dictionaryLock)
        {
           return thingeys[key];
        }
    }
    else // We said we'd create it
    {
        Thingey thingey = new Thingey(request);
        // Put it in the dictionary
        lock (dictionaryLock)
        {
           thingeys[key] = thingey;
        }
        // Tell anyone waiting that they can look now
        lock (creationLocksLock)
        {
            Monitor.PulseAll(creationLocks[key]);
            creationLocks.Remove(key);
        }
        return thingey;
    }
}

Phew!

That's completely untested, and in particular it isn't in any way, shape or form robust in the face of exceptions in the creating thread... but I think it's the generally right idea :)

Jon Skeet
I don't want to hold the lock while a Thingey is created
No Refunds No Returns
Then you'll need one lock per key while they're being created...
Jon Skeet
...or create them if the key is not (yet) used, with the risk of creating multiple instances and only storing one. Since NRNR's code does this too, I suppose that this is acceptable in this specific case (still while keeping in mind that this race condition is present and may bite somehow).
Lucero
@Lucero: His request specifically says: "Is there a better way [...] that doesn’t involve check/create/lock/check/add with the rare extraneous thingey that we created but end up never using?" - so my guess is he wants to avoid it.
Jon Skeet
@Jon, yeah, but as we both know he cannot get both "for free"; either by locking on the outside or by creating possibly unused instanced. Because even the one-lock-per-key solution comes with the problem that you have to keep the list of locks somewhere and look them up, and currently I think that this basically puts you back to square one.
Lucero
@Jon If 2 threads check for the presence of the same key and do not find any what prevents them both from proceeding to create a copy of the thingy each?
mfeingold
@mfeingold (I hope Jon is okay with me answering): the check is done in a locked region, therefore only one creation lock will be created, and they then all wait on the creation lock if they didn't create it themselfes.
Lucero
Lucero is quite right about how it's prevented. But I disagree with his assertion that having the extra dictionary "puts you back to square one". We've already got one dictionary - having another one in the same scope (and one which will probably be much smaller, as it only contains entries for objects currently being created) costs hardly anything. If constructing a Thingey really is a slow operation, this could make a significant difference.
Jon Skeet
@Jon with my "square one" I meant that you have to lock on the other dictionary. Therefore, it does not avoid locking after all, and that's what I meant. Sorry if I was unclear.
Lucero
@Lucero: Look carefully: while I create a new instance of Thingey, I'm not holding *any* locks, which was the goal. Only other threads wanting the *same* new Thingey are blocked. Definitely not back to square one.
Jon Skeet
@Jon, we're not talking of the same thing. The idea behind the double check locking "pattern" is to avoid even entering and leaving a lock in the first time, because many (usually wrongfully) assume that entering and leaving a lock is always an expensive operation. So while you code does handle the creation locking very nicely, there is still some locking needed (for a short time) to synchronize the dictionaries, which isn't anything like the double check locking "pattern" anymore. That said, I know that many people are not aware of the hidden issues with this "pattern", and why not to use it.
Lucero
+5  A: 

This is the standard double check locking problem. The way it is implemented here is unsafe and can cause various problems - potentially up to the point of a crash in the first check if the internal state of the dictionary is screwed up bad enough.

It is unsafe because you are checking it without synchronization and if your luck is bad enough you can hit it while some other thread is in the middle of updating internal state of the dictionary

A simple solution is to place the first check under a lock as well. A problem with this is that this becomes a global lock and in web environment under heavy load it can become a serious bottleneck.

If we are talking about .NET environment, there are ways to work around this issue by piggybacking on the ASP.NET synchronization mechanism.

Here is how I did it in NDjango rendering engine: I keep one global dictionary and one dictionary per rendering thread. When a request comes I check the local dictionary first - this check does not have to be synchronized and if the thingy is there I just take it

If it is not I synchronize on the global dictionary check if it is there and if it is add it to my thread dictionary and release the lock. If it is not in the global dictionary I add it there first while still under lock.

mfeingold
How is this implementation unsafe. Can you elaborate?
No Refunds No Returns
The call to .ContainsKey is occurring outside of the lock. If another thread is inside the lock partway through the call to .Add, the dictionary may be in an invalid intermediate state (for example if the hash buckets are being resized). This may cause an object that is in the dictionary to appear to be missing, an outright crash or other undefined behavior. The lock only protectes you if you if every access occurs inside the lock.
StarPacker
+1  A: 

IMHO, if this piece of code is called from many thread simultaneous, it is recommended to check it twice.

(But: I'm not sure that you can safely call ContainsKey while some other thread is call Add. So it might not be possible to avoid the lock at all.)

If you just want to avoid the Thingy is created but not used, just create it within the locking block:

private Dictionary<string, Thingey> Thingeys;
public Thingey GetThingey(Request request)
{
    string thingeyName = request.ThingeyName;
    if (!this.Thingeys.ContainsKey(thingeyName))
    {
        lock (this.Thingeys)
        {
            // only one can create the same Thingy
            Thingey newThingey = new Thingey(request);
            if (!this.Thingeys.ContainsKey(thingeyName))
            {
                this.Thingeys.Add(thingeyName, newThingey);
            }

        }
    }

    return this. Thingeys[thingeyName];
}
Stefan Steinegger
Thingey's will only ever be added once so once if I get a value, I'll never try adding it. The "add" call is inside the lock.
No Refunds No Returns
+1 for seeing the issue with thread safety of the code outside the lock.@NRNR: Yes, but since the other thread does not lock to do a `ContainsKey` and to get the value, it will not stop doing this wile the lock owning thread does a `Add`, therefore those will be done simultaneously and this may be a problem depending on th exact implementation and depending on the architecture and processor caches (the fields in the non-threadsafe implementations are not makred as being volatile).
Lucero
A: 

You have to ask yourself the question whether the specific ContainsKey operation and the getter are themselfes threadsafe (and will stay that way in newer versions), because those may and willbe invokes while another thread has the dictionary locked and is performing the Add.

Typically, .NET locks are fairly efficient if used correctly, and I believe that in this situation you're better of doing this:

bool exists;
lock (thingeys) {
    exists = thingeys.TryGetValue(thingeyName, out thingey);
}
if (!exists) {
    thingey = new Thingey();
}
lock (thingeys) {
    if (!thingeys.ContainsKey(thingeyName)) {
        thingeys.Add(thingeyName, thingey);
    }
}
return thingey;
Lucero
How can this be more efficient then double-check-locking if you apply lock every time?
The lock is cheap when there is no contention. Here, the locked code portions are very short and quick (object creation is not in the lock), so that contention will be minimized.
Lucero
Locks are cheap on cheap boxes. The beefier the box the more damage a lock can do. The problem is not the lock itself the problem is memory synchronization. If you have an 8 core box to implement the lock all of them have to stop doing what they do and synchronize their on dye memory cache with the main memory
mfeingold
@mfeingold: why should they sync their on-die memory? They just have to make sure that they access the lock variable as volatile, so that if the're doing anything else this doesn't concern them. Or am I completely wrong with that? Anyone?
Lucero
@Lucero accessing volatile memory requires memory barrier AKA memory synchronization
mfeingold
@mfeingold well it's not so easy. Depending on architecture the memory barrier semantics differ. You may have full fence, acquire/release etc., so that this depends very much on the underlying architecture. However, all synchronized operations (and you absolutely need those also to implement a working double-ckeck-locking, to be on topic again) require some sort of memory barriers to work correctly. For this specific code, a pattern like the one from Jon mixed with a spin lock instead of normal locking may be pretty performant. Still, you'd need to somehow profile the code before optimizing...
Lucero
A: 

You might be able to buy a little bit of speed efficiency at the expense of memory. If you create an immutable array that lists all of the created Thingys and reference the array with a static variable, then you could check the existance of a Thingy outside of any lock, since immutable arrays are always thread safe. Then when adding a new Thingy, you can create a new array with the additional Thingy and replace it (in the static variable) in one (atomic) set operation. Some new Thingys may be missed, because of race conditions, but the program shouldn't fail. It just means that on rare occasions extra duplicate Thingys will be made.

This will not replace the need for duplicate checking when creating a new Thingy, and it will use a lot of memory resources, but it will not require that the lock be taken or held while creating a Thingy.

I'm thinking of something along these lines, sorta:

private Dictionary<string, Thingey> Thingeys;
// An immutable list of (most of) the thingeys that have been created.
private string[] existingThingeys;

public Thingey GetThingey(Request request)
{
    string thingeyName = request.ThingeyName;
    // Reference the same list throughout the method, just in case another
    // thread replaces the global reference between operations.
    string[] localThingyList = existingThingeys;
    // Check to see if we already made this Thingey. (This might miss some, 
    // but it doesn't matter.
    // This operation on an immutable array is thread-safe.
    if (localThingyList.Contains(thingeyName))
    {
        // But referencing the dictionary is not thread-safe.
        lock (this.Thingeys)
        {
            if (this.Thingeys.ContainsKey(thingeyName))
                return this.Thingeys[thingeyName];
        }
    }
    Thingey newThingey = new Thingey(request);
    Thiney ret;
    // We haven't locked anything at this point, but we have created a new 
    // Thingey that we probably needed.
    lock (this.Thingeys)
    {
        // If it turns out that the Thingey was already there, then 
        // return the old one.
        if (!Thingeys.TryGetValue(thingeyName, out ret))
        {
            // Otherwise, add the new one.
            Thingeys.Add(thingeyName, newThingey);
            ret = newThingey;
        }
    }
    // Update our existingThingeys array atomically.
    string[] newThingyList = new string[localThingyList.Length + 1];
    Array.Copy(localThingyList, newThingey, localThingyList.Length);
    newThingey[localThingyList.Length] = thingeyName;
    existingThingeys = newThingyList; // Voila!
    return ret;
}
Jeffrey L Whitledge
A: 

Well I hope not being to naive at giving this answer. but what I would do, as Thingyes are expensive to create, would be to add the key with a null value. That is something like this

private Dictionary<string, Thingey> Thingeys;
public Thingey GetThingey(Request request)
{
    string thingeyName = request.ThingeyName;
    if (!this.Thingeys.ContainsKey(thingeyName))
    {
        lock (this.Thingeys)
        {
            this.Thingeys.Add(thingeyName, null);
            if (!this.Thingeys.ContainsKey(thingeyName))
            {
                // create a new thingey on 1st reference
                Thingey newThingey = new Thingey(request);
                Thingeys[thingeyName] = newThingey;
            }
            // else - oops someone else beat us to it
            // but it doesn't mather anymore since we only created one Thingey
        }
    }

    return this.Thingeys[thingeyName];
}

I modified your code in a rush so no testing was done.
Anyway, I hope my idea is not so naive. :D

Limo Wan Kenobi