Threads each have their own stack, but they share a common heap.

It's clear to everyone that the stack is for local/method variables and the heap is for instance/class variables.

What is the benefit of sharing the heap among threads?

Several threads run simultaneously, so sharing memory can lead to issues such as concurrent modification and the overhead of mutual exclusion. What contents do threads share in the heap?

Why is this the case? Why not have each thread own its own heap as well? Can anyone provide a real-world example of how threads use shared memory?

+3  A: 

Processes don't generally share heap space. There are APIs to permit this, but the default is that processes are separate.

Threads share heap space.

That's the "practical idea": two ways to use memory, shared and not shared.

S.Lott
processes can share heap space - shared memory APIs provide that. Oh, and Windows 3.1 -> Windows Me shared heaps :)
gbjbaanb
Requires special APIs to accomplish; it's not the default.
S.Lott
On Linux you can share whatever you like using `clone()`.
Matt Joiner
+1  A: 

In many languages/runtimes the stack is used for, among other things, keeping function/method parameters and local variables. If threads shared a stack, things would get really messy.

void MyFunc(int a) // Stored on the stack
{
   int b; // Stored on the stack
}

When the call to 'MyFunc' is done, the stack is popped and a and b are no longer on the stack. Because threads don't share stacks, there is no threading issue for the variables a and b.
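
To make that concrete, here is a minimal sketch using POSIX threads. The names `struct job`, `sum_worker` and `run_two_workers` are made up for this illustration. Two threads run the same function at the same time, and each gets its own `a` and `b` on its own stack, so the loop needs no locking.

```c
#include <pthread.h>

/* Hypothetical job record: each one is handed to exactly one worker thread. */
struct job {
    int a;        /* input for the worker */
    long result;  /* output written by the worker */
};

static void *sum_worker(void *arg)
{
    struct job *job = arg;
    int a = job->a;  /* lives on this thread's stack */
    long b = 0;      /* lives on this thread's stack */
    for (int i = 0; i < 100000; i++)
        b += a;      /* no lock needed: no other thread has access to b */
    job->result = b;
    return NULL;
}

/* Runs sum_worker on two threads at once; returns 1 if both results are right. */
static int run_two_workers(void)
{
    struct job jobs[2] = { { 1, 0 }, { 2, 0 } };
    pthread_t threads[2];
    int created = 0;

    for (; created < 2; created++)
        if (pthread_create(&threads[created], NULL, sum_worker, &jobs[created]) != 0)
            break;
    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);  /* results are only read after the join */

    return created == 2 && jobs[0].result == 100000 && jobs[1].result == 200000;
}
```

Calling `run_two_workers()` should return 1 every time, however the two threads happen to be scheduled.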

Because of the nature of the stack (pushing/popping), it's not really suited for keeping 'long-term' state or shared state across function calls. Like this:

int globalValue; // global: static storage rather than the stack; one copy shared by all threads, just like heap data

void Foo() 
{
   int b = globalValue; // Gets the current value of globalValue

   globalValue = 10;
}

void Bar()
{
   int b = globalValue; // Gets the current value of globalValue

   globalValue = 20;
}


int main()
{
   globalValue = 0;
   Foo();
   // globalValue is now 10
   Bar();
   // globalValue is now 20
}
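
If Foo and Bar ran on different threads, the final value of globalValue would depend on timing. Here is a rough sketch of the usual fix with POSIX threads: guard the shared variable with a mutex. The names `shared_counter`, `increment_worker` and `count_with_threads` are invented for this example.

```c
#include <pthread.h>

static long shared_counter = 0;  /* one copy, visible to every thread */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *increment_worker(void *arg)
{
    long n = *(const long *)arg;  /* n is a private copy on this thread's stack */
    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&counter_lock);
        shared_counter++;         /* read-modify-write on shared data */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts up to 16 workers that each add n; returns the final counter value. */
static long count_with_threads(int nthreads, long n)
{
    pthread_t threads[16];
    int created = 0;

    if (nthreads > 16)
        nthreads = 16;
    shared_counter = 0;
    for (; created < nthreads; created++)
        if (pthread_create(&threads[created], NULL, increment_worker, &n) != 0)
            break;
    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);
    return shared_counter;
}
```

With the lock in place, `count_with_threads(4, 50000)` should give 200000 as long as all four threads start. Without the lock, increments can be lost and the total may come out lower.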
Martin Ingvar Kofoed Jensen
A: 

That's because the idea of threads is "share everything". Of course, there are some things you cannot share, like processor context and stack, but everything else is shared.

ninjalj
+11  A: 

What do you do when you want to pass data from one thread to another? (If you never did that you'd be writing separate programs, not one multi-threaded program.) There are two major approaches:

  • The approach you seem to take for granted is shared memory: except for data that has a compelling reason to be thread-specific (such as the stack), all data is accessible to all threads. Basically, there is a shared heap. That gives you speed: any time a thread changes some data, other threads can see it. (Limitation: when the threads are executing on different processors, a change is not necessarily visible to the other threads right away; there the programmer needs to work especially hard to use shared memory correctly and efficiently.) Most major imperative languages, in particular Java and C#, favor this model.

    It is possible to have one heap per thread, plus a shared heap. This requires the programmer to decide which data to put where, and that often doesn't mesh well with existing programming languages.

  • The dual approach is message passing: each thread has its own data space; when a thread wants to communicate with another thread it needs to explicitly send a message to the other thread, so as to copy the data from the sender's heap to the recipient's heap. In this setting many communities prefer to call the threads processes. That gives you safety: since a thread can't overwrite some other thread's memory on a whim, a lot of bugs are avoided. Another benefit is distribution: you can make your threads run on separate machines without having to change a single line in your program. You can find message passing libraries for most languages but integration tends to be less good. Good languages to understand message passing in are Erlang and JoCaml.

    In fact, message passing environments usually use shared memory behind the scenes, at least as long as the threads are running on the same machine/processor. This saves a lot of time and memory, since passing a message from one thread to another then doesn't require making a copy of the data. But since the shared memory is not exposed to the programmer, its inherent complexity is confined to the language/library implementation.
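
As a rough illustration of that last point, here is a one-slot mailbox built on POSIX threads. It is only a sketch: `struct mailbox`, `mailbox_send`, `mailbox_receive`, `producer` and `sum_messages` are names invented here, not any real library's API. Callers only ever send and receive, while the shared memory and locking stay inside the mailbox. Only a pointer is handed over, so the payload itself is never copied.

```c
#include <pthread.h>
#include <stddef.h>

/* One-slot mailbox: the shared state is hidden behind send/receive. */
struct mailbox {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    const int *message;  /* NULL means the slot is empty */
};

static void mailbox_send(struct mailbox *mb, const int *message)
{
    pthread_mutex_lock(&mb->lock);
    while (mb->message != NULL)          /* wait until the slot is free */
        pthread_cond_wait(&mb->changed, &mb->lock);
    mb->message = message;               /* hand over the pointer, no copy */
    pthread_cond_broadcast(&mb->changed);
    pthread_mutex_unlock(&mb->lock);
}

static const int *mailbox_receive(struct mailbox *mb)
{
    pthread_mutex_lock(&mb->lock);
    while (mb->message == NULL)          /* wait until something arrives */
        pthread_cond_wait(&mb->changed, &mb->lock);
    const int *message = mb->message;
    mb->message = NULL;
    pthread_cond_broadcast(&mb->changed);
    pthread_mutex_unlock(&mb->lock);
    return message;
}

static void *producer(void *arg)
{
    static const int payloads[5] = { 1, 2, 3, 4, 5 };
    for (int i = 0; i < 5; i++)
        mailbox_send(arg, &payloads[i]);
    return NULL;
}

/* Receives five messages from a producer thread; returns their sum, or -1. */
static int sum_messages(void)
{
    struct mailbox mb;
    pthread_t thread;
    int sum = 0;

    pthread_mutex_init(&mb.lock, NULL);
    pthread_cond_init(&mb.changed, NULL);
    mb.message = NULL;
    if (pthread_create(&thread, NULL, producer, &mb) != 0)
        sum = -1;
    else {
        for (int i = 0; i < 5; i++)
            sum += *mailbox_receive(&mb);
        pthread_join(thread, NULL);
    }
    pthread_cond_destroy(&mb.changed);
    pthread_mutex_destroy(&mb.lock);
    return sum;
}
```

If the producer thread starts, `sum_messages()` returns 15. Languages like Erlang do far more than this (queues, selective receive, distribution), so treat it purely as a picture of where the shared memory hides.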

Gilles
Excellent answer. In fact, some older operating systems treated all programs in the system essentially as threads in one big system process (I think System/360 did this?). The philosophical difference between shared memory and message passing is at the heart of the design differences between Windows and Unix even today.
Daniel Pryden
@Daniel: many embedded systems still do, because enforcing process separation is expensive when you count your memory in kB, and it requires hardware support (typically via an MMU). I don't understand where Windows and Unix differ in their treatment of concurrency, could you elaborate a little?
Gilles
@Gilles: What I mean is that the Windows platform favors shared memory solutions, with OS-level support for threading. On the other hand, Unix has traditionally preferred communication through pipes and sockets over shared memory solutions. It's by no means a hard and fast distinction, since both solutions are available on both platforms, but each has its "preferred" way, and that leads to the "philosophical difference" I described in my comment.
Daniel Pryden
+1  A: 

The heap is just all the dynamically allocated memory outside of the stack. Since the OS provides a single address space per process, the heap is by definition shared by all threads in the process. As for why stacks are not shared, that's because an execution thread has to have its own stack to be able to manage its call tree (it contains information about what to do when you leave a function, for instance!).

Now you could of course write a memory manager that allocates data from different areas in your address space depending on the calling thread, but other threads would still be able to see that data (just as, if you somehow leak a pointer to something on your thread's stack to another thread, that other thread can read it, despite this being a horrible idea).
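
A small sketch of the "one address space" point, using POSIX threads (the function names `make_greeting` and `read_other_threads_allocation` are made up for this example): a worker thread allocates a block, and the main thread reads and frees it, because both see the same heap.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

static void *make_greeting(void *arg)
{
    (void)arg;
    char *text = malloc(16);  /* heap block allocated by the worker thread */
    if (text != NULL)
        strcpy(text, "from worker");
    return text;              /* pointer handed back through pthread_join */
}

/* Returns 1 if the calling thread can read the block the worker allocated. */
static int read_other_threads_allocation(void)
{
    pthread_t thread;
    void *result = NULL;

    if (pthread_create(&thread, NULL, make_greeting, NULL) != 0)
        return 0;
    pthread_join(thread, &result);

    int ok = result != NULL && strcmp(result, "from worker") == 0;
    free(result);             /* a different thread may free it: same heap */
    return ok;
}
```

Assuming thread creation and the allocation succeed, `read_other_threads_allocation()` returns 1. The same pointer would be meaningless in another process, which is the difference the answers above keep coming back to.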

Yuliy
To be pedantic, many memory managers _do indeed_ allocate memory from different areas (arenas), but they do so to improve performance. Of course, the resulting memory is still shared.
ninjalj
+1  A: 

Because otherwise they would be processes. That is the whole idea of threads, to share memory.

EJP
A: 

The problem is that having local heaps adds significant complexity for very little value.

There is a small performance advantage, and this is handled well by the JVM's TLAB (Thread Local Allocation Buffer), which gives you most of the advantage transparently.
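
To give a feel for the idea (this is not how the JVM implements it), here is a toy version in C11. The names `CHUNK_SIZE` and `tlab_alloc` are invented for this sketch. Each thread takes a chunk from the shared heap and then hands out pieces of it by bumping an offset, so the common path takes no lock. The chunk still lives in the shared heap, so other threads can read the objects if they get a pointer.

```c
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_SIZE 4096  /* arbitrary size chosen for this sketch */

/* Per-thread state: every thread sees its own copy of these two variables. */
static _Thread_local unsigned char *chunk = NULL;
static _Thread_local size_t used = 0;

/* Hypothetical allocator: bump-allocates from the calling thread's chunk. */
static void *tlab_alloc(size_t size)
{
    size_t aligned = (size + 15) & ~(size_t)15;  /* round up to 16 bytes */

    if (aligned < size || aligned > CHUNK_SIZE)
        return NULL;                 /* overflow, or too big for this sketch */
    if (chunk == NULL || aligned > CHUNK_SIZE - used) {
        /* Slow path: get a fresh chunk from the shared heap. Old chunks are
           simply abandoned here; a real runtime relies on its garbage
           collector to reclaim them. */
        chunk = malloc(CHUNK_SIZE);
        if (chunk == NULL)
            return NULL;
        used = 0;
    }
    void *p = chunk + used;          /* fast path: no lock, just a bump */
    used += aligned;
    return p;
}
```

Two back-to-back calls such as `tlab_alloc(10)` on the same thread return addresses 16 bytes apart, since both come from that thread's chunk.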

Peter Lawrey