views:

122

answers:

4

I am confused by a code listing in a book i am reading, C# 3 in a Nutshell, on threading. In the topic on Thread Safety in Application Servers, below code is given as an example of a UserCache:

static class UserCache
{
    static Dictionary< int,User> _users = new Dictionary< int, User>();

    internal static User GetUser(int id)
    {
        User u = null;

        lock (_users) // Why lock this???
            if (_users.TryGetValue(id, out u))
                return u;

        u = RetrieveUser(id); //Method to retrieve from databse

        lock (_users) _users[id] = u; //Why lock this???
            return u;
    }
}

The authors explain why the RetrieveUser method is not in a lock, this is to avoid locking the cache for a longer period.
I am confused as to why lock the TryGetValue and the update of the dictionary since even with the above the dictionary is being updated twice if 2 threads call simultaneously with the same unretrieved id.

What is being achieved by locking the dictionary read?
Many thanks in advance for all your comments and insights.

+8  A: 

The Dictionary<TKey, TValue> class is not threadsafe.

If one thread writes one key to the dictionary while a different thread reads the dictionary, it may get messed up. (For example, if the write operation triggers an array resize, or if the two keys are a hash collision)

Therefore, the code uses a lock to prevent concurrent writes.

SLaks
Thanks! That makes sense. I guess there is really no way of actually proving it deterministically. I tried removing the locks and called it in a loop for a few thousand times by creating new threads, after changing the code so that each thread will always write after the read, but it worked fine. But as with most threading the fact it worked for me does not mean it will always work. Cheers!
+1  A: 

That's a common practice to access any non thread safe structures like lists, dictionaries, common shared values, etc.

And answering main question: locking a read we guarantee that dictionary will not be changed by another thread while we are reading its value. This is not implemented in dictionary and that is why it’s called non thread safe :)

terR0Q
+2  A: 

There is a benign race condition when writing to the dictionary; it is possible, as you stated, for two threads to determine there is not a matching entry in the cache. In this case, both of them will read from the DB and then attempt to insert. Only the object inserted by the last thread is kept; the other object will be garbage collected when the first thread is done with it.

The read to the dictionary needs to be locked because another thread may be writing at the same time, and the read needs to search over a consistent structure.

Note that the ConcurrentDictionary introduced in .NET 4.0 pretty much replaces this kind of idiom.

Stephen Cleary
A: 

If two threads call in simultaneously and the id exists, then they will both return the correct User information. The first lock is to prevent errors like SLaks said - if someone is writing to the dictionary while you are trying to read it, you'll have issues. In this scenario, the second lock will never be reached.

If two threads call in simultaneously and the id does not exist, one thread will lock and enter TryGetValue, this will return false and set u to a default value. This first lock is again, to prevent the errors described by SLaks. At this point, that first thread will release the lock and the second thread will enter and do the same. Both will then set 'u' to information from 'RetrieveUser(id)'; this should be the same information. One thread will then lock the dictionary and assign _users[id] to the value of u. This second lock is so that two threads are trying to write values to the same memory locations simultaneously and corrupting that memory. I don't know what the second thread will do when it enters the assignment. It will either return early ignoring the update, or overwrite the existing data from the first thread. Regardless, the Dictionary will contain the same information because both threads should have recieved the same data in 'u' from RetrieveUser.

For performance, the auther compared two scenarios - the above scenario, which will be extremely rare and block while two threads try and write the same data, and second one where it is far more likely that two threads call in requesting data for an object that needs written, and one that exists. For example, threadA and threadB call in simultaneously and ThreadA locks for an id that doesn't exist. There is no reason to make threadB wait for a lookup while threadA is working on RetriveUser. This situation is probably far more likely than the duplicate ids described above, so for performance the author chose not to lock on the whole block.

Jess