This question led me to wonder about thread-local storage in high-level development frameworks like Java and .NET.

Java has a ThreadLocal<T> class (and perhaps other constructs), while .NET has data slots, and soon a ThreadLocal<T> class of its own. (It also has the ThreadStaticAttribute, but I'm particularly interested in thread-local storage for member data.) Most other modern development environments provide one or more mechanisms for it, either at the language or framework level.

What problems does thread-local storage solve, or what advantages does thread-local storage provide over the standard object-oriented idiom of creating separate object instances to contain thread-local data? In other words, how is this:

// Thread local storage approach - start 200 threads using the same object
// Each thread creates a copy of any thread-local data
ThreadLocalInstance instance = new ThreadLocalInstance();
for(int i=0; i < 200; i++) {
    ThreadStart threadStart = new ThreadStart(instance.DoSomething);
    new Thread(threadStart).Start();
}

Superior to this?

// Normal OO approach: create 200 objects, start a new thread on each
for(int i=0; i < 200; i++) {
    StandardInstance standardInstance = new StandardInstance();
    ThreadStart threadStart = new ThreadStart(standardInstance.DoSomething);      
    new Thread(threadStart).Start();
}

I can see that using a single object with thread-local storage could be slightly more memory-efficient and require fewer processor resources due to fewer allocations (and constructions). Are there other advantages?

+4  A: 

Just occasionally, it's helpful to have thread-local state. One example is for a log context - it can be useful to set the context of which request you're currently servicing, or something similar, so that you can collate all the logs to do with that request.

Another good example is System.Random in .NET. It's fairly common knowledge that you shouldn't create a new instance every time you want to use Random, so some people create a single instance and put it in a static variable... but that's awkward because Random isn't thread-safe. Instead, you really want one instance per thread, seeded appropriately. ThreadLocal<T> works great for this.
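As a rough illustration of the one-instance-per-thread pattern, here is a minimal Java sketch. The names PerThreadRandom and current() are made up for this example. Note the assumption baked into the analogy: unlike System.Random in .NET, java.util.Random is documented as thread-safe (though it can suffer contention), and the JDK already provides ThreadLocalRandom, so this is only meant to show the shape of the idiom.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: one Random per thread, each seeded differently.
class PerThreadRandom {
    // Shared counter so threads that start at the same instant still get distinct seeds.
    private static final AtomicLong SEED = new AtomicLong(System.nanoTime());

    // withInitial runs the supplier once per thread, on that thread's first get().
    private static final ThreadLocal<Random> LOCAL =
            ThreadLocal.withInitial(() -> new Random(SEED.getAndIncrement()));

    static Random current() {
        return LOCAL.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Random[] seen = new Random[2];
        Thread a = new Thread(() -> seen[0] = current());
        Thread b = new Thread(() -> seen[1] = current());
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(seen[0] != seen[1]);     // different threads, different instances
        System.out.println(current() == current()); // same thread, same instance
    }
}
```

Callers just write PerThreadRandom.current().nextInt(10) wherever they need a number; nothing has to be passed around and nothing has to be locked.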

Similar examples are the culture associated with a thread, or the security context.

In general, it's a case of not wanting to pass too much context around everywhere. You could make every single method call include a "RandomContext" or a "LogContext", but that would clutter your API, and the chain would be broken if you ever had to call into another API that calls back into yours through a virtual method or something similar.

In my view, thread-local data is something that should be avoided where possible - but just occasionally it can be really useful.

I would say that in most cases you can get away with it being static - but just occasionally you might want per-instance, per-thread information. Again, it's worth using your judgement to see where it's useful.

Jon Skeet
Can you give an example of where a property must have an independent value per-instance *and* per-thread?
finnw
+1  A: 

In Java, thread-local storage can be useful in a web application where a single request is typically processed by a given thread. Take Spring Security, for instance: the security filter performs the authentication and then stores the user's credentials in a thread-local variable.

This allows the actual request-processing code to access the current user's request/authentication information without having to inject anything else into the code.
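The idea can be sketched roughly as follows. This is not Spring Security's actual API: RequestDemo, CurrentUser, handle and businessLogic are hypothetical names, and the sketch assumes each request is handled entirely on one thread.

```java
// Hypothetical sketch of a per-thread "current user"; not Spring Security's real classes.
class RequestDemo {
    static final class CurrentUser {
        private static final ThreadLocal<String> USER = new ThreadLocal<>();

        static void set(String name) { USER.set(name); }
        static String get() { return USER.get(); } // null if nothing was set on this thread
        static void clear() { USER.remove(); }
    }

    // Stand-in for a filter: record the authenticated user, run the handler, always clean up.
    static String handle(String authenticatedUser) {
        CurrentUser.set(authenticatedUser);
        try {
            return businessLogic();
        } finally {
            CurrentUser.clear(); // pooled threads are reused, so stale values must not leak
        }
    }

    // Somewhere deep in the call stack: no user parameter is needed.
    static String businessLogic() {
        return "hello, " + CurrentUser.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> System.out.println(handle("alice")));
        Thread t2 = new Thread(() -> System.out.println(handle("bob")));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(CurrentUser.get()); // the main thread never set a value
    }
}
```

Each thread sees only the value it set itself, so concurrent requests do not interfere. The try/finally matters on pooled threads, because the same thread will serve a different request later.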

Kevin
I guess I will stop typing my answer now ;-)
Robin
Should point out that the thread locals are global variables in which the associated value is specific to the thread of execution. Thus they do not have to be passed via method arguments to be accessed.
Robin
@Robin: Many implementations, such as .NET's ThreadLocal<T>, hide the global, static nature of thread locals from the developer, and make them act like instance variables.
Reed Copsey
A: 

It helps with passing a value down the stack. It comes in handy when you need a value further down the call stack but there is no way (or no benefit) to pass it as a method parameter to the place where it is needed. The above example of storing the current HttpRequest in a ThreadLocal illustrates this: the alternative would be to pass the HttpRequest as a parameter down the stack to wherever it is needed.

julius
What you are describing are global variables, not thread locals.
Michael Borgwardt
@Michael, You are wrong there. Implicit parameters often need to be local to a thread. I have used this on a few occasions. You can not have a single, global "current HttpRequest" variable in a multithreaded server, because requests in different threads would interfere with each other, so this answer is a good example.
finnw
@finnw: No, I am right. The answer describes the "benefits" of global variables (and deserves a downvote for that alone) but explains nothing explicitly about multithreading and how ThreadLocal would be necessary.
Michael Borgwardt
What is meant by a "current HttpRequest" is a request being processed by any thread of a multi-threaded server. Of course this only holds when your server runtime guarantees that a request is processed in one thread only. If the server processes a single request in different threads, the request is only "current" in the thread where "ThreadLocal.set" was called.
julius
+1  A: 

What problems does thread-local storage solve, or what advantages does thread-local storage provide over the standard object-oriented idiom of creating separate object instances to contain thread-local data?

Thread local storage allows you to provide each running thread with a unique instance of a class, which is very valuable when trying to work with non-threadsafe classes, or when trying to avoid synchronization requirements that can occur due to shared state.

As for the advantage vs. your example - if you are spawning a single thread, there is little or no advantage to using thread local storage over passing in an instance. ThreadLocal<T> and similar constructs become incredibly valuable, however, when working (directly or indirectly) with a ThreadPool.

For example, I have a specific process I worked on recently, where we are doing some very heavy computation using the new Task Parallel Library in .NET. Certain portions of the computations performed can be cached, and if the cache contains a specific match, we can shave off quite a bit of time when processing one element. However, the cached info had a high memory requirement, so we didn't want to cache more than the last processing step.

However, trying to share this cache across threads is problematic. To do so, we'd have to synchronize access to it and add extra checks inside our class to make it thread safe.

Instead of doing this, I rewrote the algorithm so that each thread maintains its own private cache in a ThreadLocal<T>. Since the partitioning scheme the TPL uses tends to keep blocks of elements together, each thread's local cache tended to contain the values it required.

This eliminated the synchronization issues, but also allowed us to keep our caching in place. The overall benefit was quite large, in this situation.
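A minimal Java sketch of the same pattern might look like the following. It is not the original .NET code: it uses a plain ExecutorService instead of the TPL, and PerThreadCacheDemo, LastResult, expensive, compute and sumAll are all hypothetical names. The cache holds only the last computed step, and contiguous chunks stand in for the TPL's partitioning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerThreadCacheDemo {
    // One-entry cache: remembers only the most recent step, to keep memory use low.
    static final class LastResult {
        boolean filled;
        int key;
        long value;
    }

    // Each pool thread lazily gets its own LastResult, so no locking is needed.
    private static final ThreadLocal<LastResult> CACHE =
            ThreadLocal.withInitial(LastResult::new);

    // Hypothetical stand-in for the expensive, cacheable part: sum of squares 0..key.
    static long expensive(int key) {
        long sum = 0;
        for (int i = 0; i <= key; i++) sum += (long) i * i;
        return sum;
    }

    static long compute(int key) {
        LastResult c = CACHE.get(); // this thread's private cache
        if (!c.filled || c.key != key) {
            c.value = expensive(key);
            c.key = key;
            c.filled = true;
        }
        return c.value;
    }

    static long sumAll(int[] keys, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Contiguous chunks keep equal neighbouring keys within the same task.
            int chunk = (keys.length + threads - 1) / threads;
            List<Future<Long>> parts = new ArrayList<>();
            for (int start = 0; start < keys.length; start += chunk) {
                final int from = start;
                final int to = Math.min(start + chunk, keys.length);
                Callable<Long> task = () -> {
                    long total = 0;
                    for (int i = from; i < to; i++) total += compute(keys[i]);
                    return total;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : parts) total += f.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] keys = {3, 3, 3, 5, 5, 7, 7, 7};
        System.out.println(sumAll(keys, 2)); // prints 572
    }
}
```

Because pool threads are reused across tasks, a thread's cache survives from one task to the next. That is harmless here, since a cached value is only returned when its key matches.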

For a more concrete example, take a look at this blog post I wrote on aggregation using the TPL. Internally, the Parallel class uses a ThreadLocal<TLocal> whenever you use the ForEach overload that keeps local state (and the Parallel.For<TLocal> methods, too). This is how the local state is kept separate per thread to avoid locking.

Reed Copsey
I *think* I'm getting this now (after reading some more of your series on Parallelism in .NET 4.0). Would it be fair to say that instance-level thread-local storage facilitates the abstractions used by the Task Parallel Library, where delegates are the primary semantic unit rather than objects? (Ignoring for the moment the fact that delegates are themselves objects.)
Jeff Sternal
Thread local storage provides useful alternatives to avoid locking in cases when things aren't necessarily thread safe, by providing each unique thread a unique copy. That's where I've used it the most. The TPL usage pattern makes this easy to see where it's useful, but it's useful even in other cases. Typically, if a thread will be used for more than one item, there ~may~ be an opportunity to use thread local storage. (When a thread is processing only one item, you can just use a local directly...)
Reed Copsey