views: 787

answers: 6

I was recently made aware that thread local storage is limited on some platforms. For example, the docs for the C++ library boost::thread read:

"Note: There is an implementation specific limit to the number of thread specific storage objects that can be created, and this limit may be small."

I've been searching to try to find the limits for different platforms, but I haven't been able to find an authoritative table. This is an important question if you're writing a cross-platform app that uses TLS. Linux was the only platform I found information for, in the form of a patch Ingo Molnar sent to the kernel list in 2002 adding TLS support, where he mentions: "The number of TLS areas is unlimited, and there is no additional allocation overhead associated with TLS support." Which, if still true in 2009 (is it?), is pretty nifty.

But what about Linux today? OS X? Windows? Solaris? Embedded OSes? For OSes that run on multiple architectures, does the limit vary across architectures?

Edit: If you're curious why there might be a limit, consider that the space for thread local storage will be preallocated, so you'll be paying a cost for it on every single thread. Even a small amount in the face of lots of threads can be a problem.

A: 

It may be that the boost documentation is simply talking about a general configurable limit, not necessarily a hard limit of the platform. On Linux, the ulimit command limits the resources a process can have (number of threads, stack size, memory, and a bunch of other things), which will indirectly impact your thread local storage. On my system there doesn't seem to be a ulimit entry specific to thread local storage; other platforms may have a separate way to specify it. Also, I think that on many multiprocessor systems the thread local storage will live in memory local to that CPU, so you may run into the limits of that physical memory long before the system as a whole has exhausted its memory. I would assume there is some kind of fallback behavior that places the data in main memory in that situation, but I don't know. As you can tell, I'm conjecturing a lot. Hopefully it still leads you in the right direction...

rmeador
+2  A: 

I have only used TLS on Windows, and there are slight differences between versions in how many TLS indexes are available: http://msdn.microsoft.com/en-us/library/ms686749%28VS.85%29.aspx
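
For reference, here is a minimal sketch of the explicit slot API those per-version limits apply to; TLS_MINIMUM_AVAILABLE (64) is the guaranteed minimum number of indexes per process, and the linked page gives the actual figures for each Windows version:

#include <windows.h>
#include <cstdio>

int main()
{
    DWORD slot = TlsAlloc();              // reserve one TLS index
    if (slot == TLS_OUT_OF_INDEXES) {     // returned when the per-process limit is hit
        std::printf("no TLS indexes left\n");
        return 1;
    }

    int perThreadValue = 42;
    TlsSetValue(slot, &perThreadValue);   // store a per-thread pointer in the slot
    int *p = static_cast<int *>(TlsGetValue(slot));
    std::printf("value in this thread: %d\n", *p);

    TlsFree(slot);                        // release the index when done
    return 0;
}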

I assume that your code is only targeting operating systems that support threads - in the past I have worked with embedded and desktop OSes that do not support threading, so do not support TLS.

+5  A: 

On Linux, if you are using __thread TLS data, the only limit is set by your available address space, as this data is simply allocated as regular RAM referenced by the gs (on x86) or fs (on x86-64) segment descriptors. Note that, in some cases, allocation of TLS data used by dynamically loaded libraries can be elided in threads that do not use that TLS data.
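
For example, a minimal sketch of __thread usage (a GCC/Clang extension; compile with -pthread): each thread simply gets its own copy of the variable, with no explicit API and no per-slot limit to worry about.

#include <pthread.h>
#include <cstdio>

static __thread int tls_counter = 0;   // one instance of this variable per thread

static void *worker(void *)
{
    for (int i = 0; i < 3; ++i)
        ++tls_counter;                 // touches only this thread's copy
    std::printf("worker sees %d\n", tls_counter);
    return NULL;
}

int main()
{
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);
    std::printf("main still sees %d\n", tls_counter);  // main's copy is untouched
    return 0;
}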

TLS allocated by pthread_key_create and friends, however, is limited to PTHREAD_KEYS_MAX slots (this applies to all conforming pthreads implementations).
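
A quick sketch of what that limit looks like in practice: POSIX only guarantees _POSIX_THREAD_KEYS_MAX (128) keys, the actual ceiling is PTHREAD_KEYS_MAX (1024 on glibc, if I recall correctly), and pthread_key_create reports EAGAIN once you hit it.

#include <pthread.h>
#include <limits.h>
#include <cstdio>

int main()
{
    std::printf("PTHREAD_KEYS_MAX = %d\n", PTHREAD_KEYS_MAX);

    int created = 0;
    for (;;) {
        pthread_key_t key;
        int rc = pthread_key_create(&key, NULL);  // NULL: no destructor
        if (rc != 0)
            break;                                // EAGAIN once the keys run out
        ++created;
    }
    std::printf("created %d keys before running out\n", created);
    return 0;
}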

For more information on the TLS implementation on Linux, see ELF Handling For Thread-Local Storage and The Native POSIX Thread Library for Linux.

That said, if you need portability, your best bet is to minimize TLS use - put a single pointer in TLS, and put everything you need in a data structure hung off that pointer.
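
A sketch of that pattern using pthread keys (the struct and names here are just placeholders): one key holds a pointer to a per-thread context, everything else hangs off that pointer in ordinary heap memory, and you only ever consume a single TLS slot.

#include <pthread.h>
#include <string>

struct ThreadContext {              // hypothetical per-thread state
    int requestCount;
    std::string lastError;
    ThreadContext() : requestCount(0) {}
};

static pthread_key_t g_ctxKey;
static pthread_once_t g_once = PTHREAD_ONCE_INIT;

static void destroyContext(void *p) { delete static_cast<ThreadContext *>(p); }
static void createKey() { pthread_key_create(&g_ctxKey, destroyContext); }

// Lazily allocate this thread's context on first use; the destructor
// registered above frees it automatically when the thread exits.
ThreadContext &currentContext()
{
    pthread_once(&g_once, createKey);
    ThreadContext *ctx = static_cast<ThreadContext *>(pthread_getspecific(g_ctxKey));
    if (ctx == NULL) {
        ctx = new ThreadContext;
        pthread_setspecific(g_ctxKey, ctx);
    }
    return *ctx;
}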

bdonlan
A: 

The thread-local storage declspec (__declspec(thread)) on Windows can only be applied to variables with static storage duration, which means you are out of luck if you want to use it in more creative ways.

There is a low-level API on Windows, but it has broken semantics that make it very awkward to initialise: you can't tell whether or not the variable has already been seen by your thread, so you need to explicitly initialise it when you create the thread.
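
One common workaround is to rely on the fact that a freshly allocated slot reads back as NULL in every thread, and lazily initialise on first access. A sketch (PerThreadState and g_slot are just placeholders):

#include <windows.h>

static DWORD g_slot;                   // assumed to be set once at startup via TlsAlloc()

struct PerThreadState { int callCount; PerThreadState() : callCount(0) {} };

PerThreadState *getPerThreadState()
{
    void *p = TlsGetValue(g_slot);
    if (p == NULL) {                   // first touch from this thread
        p = new PerThreadState;
        TlsSetValue(g_slot, p);        // note: something must delete this at thread exit
    }
    return static_cast<PerThreadState *>(p);
}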

On the other hand, the pthread API for thread-local storage is well thought-out and flexible.

alex tingle
A: 

I use a simple template class to provide thread local storage. It simply wraps a std::map protected by a critical section, so it doesn't suffer from any platform-specific thread-local limits; the only platform requirement is a way to get the current thread id as an integer. It might be a little slower than native thread local storage, but it can store any data type.

Below is a cut down version of my code. I have removed the default value logic to simplify it. As it can store any data type, the increment and decrement operators are only available if T supports them. The critical section is only required to protect looking up and inserting into the map. Once a reference is returned it is safe to use it unprotected, as only the current thread will use that value.

#include <map>

template <class T>
class ThreadLocal
{
public:
    operator T()
    {
        return value();
    }

    T & operator++()
    {
        return ++value();
    }

    T operator++(int)
    {
        return value()++;
    }

    T & operator--()
    {
        return --value();
    }

    T operator--(int)
    {
        return value()--;
    }

    T & operator=(const T& v)
    {
        return (value() = v);
    }

private:
    T & value()
    {
        LockGuard<CriticalSection> lock(m_cs);
        return m_threadMap[Thread::getThreadID()];
    }

    CriticalSection  m_cs;
    std::map<int, T> m_threadMap;
};

To use this class I generally declare a static member inside a class, e.g.

class DBConnection {
public:
    DBConnection() {
        ++m_connectionCount;
    }

    ~DBConnection() {
        --m_connectionCount;
    }

    // ...
    static ThreadLocal<unsigned int> m_connectionCount;
};

ThreadLocal<unsigned int> DBConnection::m_connectionCount;

It might not be perfect for every situation, but it covers my needs, and I can easily add any missing features as I discover them.

bdonlan is correct that this example doesn't clean up after threads exit. However, it is easy to add manual cleanup.

template <class T>
class ThreadLocal
{
public:
    static void cleanup(ThreadLocal<T> & tl)
    {
        LockGuard<CriticalSection> lock(tl.m_cs);
        tl.m_threadMap.erase(Thread::getThreadID());
    }

    class AutoCleanup {
    public:
        AutoCleanup(ThreadLocal<T> & tl) : m_tl(tl) {}
        ~AutoCleanup() {
            cleanup(m_tl);
        }

    private:
        ThreadLocal<T> & m_tl;
    };

    // ...
};

Then a thread that knows it makes explicit use of the ThreadLocal can declare a ThreadLocal<T>::AutoCleanup in its main function to clean up the variable when the thread exits.
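
For example, a sketch of what that might look like (workerThreadMain is just a placeholder name):

void workerThreadMain()
{
    // The guard erases this thread's entry in the map when the function returns.
    ThreadLocal<unsigned int>::AutoCleanup guard(DBConnection::m_connectionCount);

    // ... the thread's real work, creating and destroying DBConnection objects ...
}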

Or in the case of DBConnection

~DBConnection() {
    if (--m_connectionCount == 0)
        ThreadLocal<unsigned int>::cleanup(m_connectionCount);
}

The cleanup() method is static so that it does not interfere with operator T(). A free function could be used to call it, which would deduce the template parameters.
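
Something like this (a sketch), so callers never have to spell out T:

template <class T>
void cleanupThreadLocal(ThreadLocal<T> & tl)
{
    ThreadLocal<T>::cleanup(tl);     // T is deduced from the argument
}

// e.g. in ~DBConnection():  cleanupThreadLocal(m_connectionCount);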

iain
That doesn't clean up after a thread dies...
bdonlan
You are correct, currently cleanup has to be done manually
iain
+1  A: 

On the Mac, I know of Task-Specific Storage in the Multiprocessing Services API:

MPAllocateTaskStorageIndex
MPDeallocateTaskStorageIndex
MPGetTaskStorageValue
MPSetTaskStorageValue

This looks very similar to Windows thread local storage.

I'm not sure if this API is currently recommended for thread local storage on the Mac. Perhaps there is something newer.

jwfearn