views: 803
answers: 6
I recently asked a question about functional programming, and received (good!) answers that prompted more questions (as seems to be the case with learning, sometimes). Here are a couple examples:

  1. One answer made reference to an advantage of immutable data structures: each thread can have its own copy. Now, to me, this sounds rather like a version control system (to use an analogy), where instead of locking code that someone has checked out so that it can't be modified by anyone else, everyone can check out their own copies. Sounds good. However, in VCS you have the concept of "merging" changes, in the case that two people changed the same stuff. It seems like this issue could certainly come up in a multithreaded scenario... so how is "merging" done when it's important that threads see the most recent data?

  2. This answer talked about the case where operations were being performed in a loop on an object, and how you can use a new object each time through instead of updating an old one. However, let's say the bankAccount is being updated in a non-loop scenario--for example a GUI banking system. The operator clicks the "Change Interest Rate" button, which fires an event that would (in C#, for example) do something like bankAccount.InterestRate = newRateFromUser. I feel like I'm being dense here, but hopefully my example makes sense: there has to be some way that the object is updated, right? Several other things may depend on the new data.

Anyway, if you can help me get my head around the paradigm shift, I'd be appreciative. I remember my brain going through similar "stupid phases" when learning OOP after a background of the simple procedural imperative approach to coding.

+3  A: 
  1. Immutable data structures are not like a VCS. Think of an immutable data structure as a read-only file. If it's read-only, it doesn't matter who is reading which part of the file at any given time; everybody will read the correct information.

  2. That answer is talking about monads: http://en.wikipedia.org/wiki/Monad_(functional_programming)

Pyrolistical
But let's say the data gets updated in a concurrent scenario; e.g., the read-only file gets replaced by a new read-only file. Then don't you still have the problem of someone maybe reading an older file? (This is what I don't get)
J Cooper
That's part of the concept: if you can't update it, nothing is ever out of date. Immutable means you can't change it.
Pyrolistical
But what if it needs updating? :( Clearly I'm too stupid to reason about this, I'll have to find some code of it happening in action. Thanks for answering
J Cooper
+4  A: 

Think about the String class in .Net (which is an immutable object). If you call a method on a string, you get a new copy:

String s1 = "there";
String s2 = s1.Insert(0, "hello ");

Console.WriteLine("string 1: " + s1);
Console.WriteLine("string 2: " + s2);

This will output:

string 1: there

string 2: hello there

Compare this behaviour to StringBuilder, which has basically the same method signature:

StringBuilder sb  = new StringBuilder("there");
StringBuilder sb2 = sb.Insert(0, "hi ");

Console.WriteLine("sb 1: " + sb.ToString());
Console.WriteLine("sb 2: " + sb2.ToString());

Because StringBuilder is mutable, both variables point to the same object. The output will be:

sb 1: hi there

sb 2: hi there

So, you absolutely cannot change a string once you've created it. s1 will always be "there" until the end of time (or until it's garbage collected). That's important in threading because you can always step through each character and print its value, knowing that it will always print 'there'. If you started printing the StringBuilder after it was created, you might print the first two characters of "there" and get 'th'. Now, imagine another thread comes along and inserts 'hi '. The value is now different! When you print the third character, it's the space from 'hi '. So you print: 'th there'.

Travis
A: 

"Immutable" means exactly that: it doesn't change.

The way functional programs do updates is by passing around new things. An existing value never changes: you just build a new value and pass that instead. Very often the new value shares state with the old one; good examples of the technique are lists composed of cons cells, and the zipper.
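The cons-cell idea above can be sketched in a few lines. This is a hypothetical minimal example (the `Cons` class and field names are illustrative, not from any library), showing how the "new" list reuses the old one instead of copying it:

```java
// A minimal immutable cons list: building a "new" value shares
// structure with the old one instead of copying it.
final class Cons {
    final int head;
    final Cons tail; // null marks the empty list

    Cons(int head, Cons tail) { this.head = head; this.tail = tail; }
}

public class Sharing {
    public static void main(String[] args) {
        Cons old = new Cons(2, new Cons(3, null)); // the list [2, 3]
        Cons updated = new Cons(1, old);           // the list [1, 2, 3]

        // No copying happened: the new list reuses the old cells,
        // and code still holding `old` sees [2, 3], unchanged.
        System.out.println(updated.tail == old); // true
    }
}
```

Prepending is O(1) here precisely because nothing is mutated: it is always safe for both lists to point at the same cells.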

Rich
+3  A: 

Answer to part 1: immutable objects don't by themselves support anything like "merging" to combine the results of two threads' updates. There are two major strategies for that: the pessimistic and the optimistic.

If you're pessimistic, you assume it's quite likely that two threads will want to update the same piece of data at the same time, so you employ locking: the second thread freezes up until the first thread says it has finished.

If you're optimistic that this will happen only rarely, you let both threads work on their own logical copy of the data. The one that finishes first supplies the new version, and the other has to start again from the beginning - only now it starts from the results of the first thread's changes. This expensive restarting only happens occasionally, so overall it performs better due to the lack of locking (although this is only true if your optimism about how rarely a clash occurs was well placed).
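The optimistic strategy is essentially a compare-and-swap retry loop. A minimal Java sketch (the class and method names are illustrative; only `AtomicReference` is a real library type):

```java
import java.util.concurrent.atomic.AtomicReference;

public class OptimisticUpdate {
    // The shared state: a mutable reference to an immutable value.
    static final AtomicReference<Integer> balance = new AtomicReference<>(100);

    static void deposit(int amount) {
        while (true) {
            Integer current = balance.get();    // take a logical copy
            Integer updated = current + amount; // compute a new version
            // Publish it only if no other thread won the race;
            // otherwise retry, starting from the other thread's result.
            if (balance.compareAndSet(current, updated)) {
                return;
            }
        }
    }

    public static void main(String[] args) {
        deposit(50);
        System.out.println(balance.get()); // 150
    }
}
```

The `compareAndSet` call is the "finishes first supplies the new version" step: it succeeds only if the value is still the one this thread started from.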

Part 2: pure, stateless functional languages do not really eliminate that problem. Even a pure Haskell program can have state associated with it. The difference is that stateful code has a different return type: a function that manipulates state is expressed as a sequence of operations on an object representing that state.

In an absurd example, consider the computer's file system. Every time a program modifies the contents of a file (even by a single byte) it has created a new "version" of the entire file system - and, by extension, a new version of the entire universe. But let's focus on the file system for now. Any other part of the program that examines the file system may now be affected by that modified byte. So Haskell says that functions operating on the file system must effectively pass around an object representing a version of the file system.

Because this would be tedious to deal with manually, Haskell turns the requirement inside out and says that if a function wants to be able to do IO, it must return a sort of container object. Inside the container is the value the function wants to return, but the container serves as evidence that the function either has side effects or can see side effects. This means Haskell's type system is able to distinguish between functions-with-side-effects and "pure" functions. So it helps to contain and manage the statefulness of code, without really eliminating it.
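The "stateful code has a different return type" idea can be sketched outside Haskell, too: a pure operation takes the old state as an argument and returns both its result and the next state, mutating nothing. A hypothetical Java sketch (all names invented for illustration):

```java
// A stand-in for "the state of the world" - here, a toy file system
// holding a single string. It is immutable.
final class FileSystem {
    final String contents;
    FileSystem(String contents) { this.contents = contents; }
}

// A stateful operation's return type carries BOTH the produced value
// and the new version of the state.
final class WriteResult {
    final int bytesWritten;
    final FileSystem newWorld;
    WriteResult(int bytesWritten, FileSystem newWorld) {
        this.bytesWritten = bytesWritten;
        this.newWorld = newWorld;
    }
}

public class PureState {
    // A pure function: same inputs always give the same outputs,
    // and the old FileSystem is never touched.
    static WriteResult append(FileSystem fs, String data) {
        return new WriteResult(data.length(), new FileSystem(fs.contents + data));
    }

    public static void main(String[] args) {
        FileSystem v1 = new FileSystem("");
        WriteResult r = append(v1, "hello");
        // Code still holding v1 sees the old, empty version.
        System.out.println(v1.contents.isEmpty()); // true
        System.out.println(r.newWorld.contents);   // hello
    }
}
```

Haskell's IO container automates exactly this threading of the "world" value, so you never pass it by hand.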

Daniel Earwicker
+2  A: 

With regards to #2...

Several other things may depend on the new data.

This is what the purists call an "effect". The notion of multiple object references to the same mutable object is the essence of mutable state and the crux of the issue. In OOP, you may have an object "a" of type BankAccount, and if you read a.Balance or whatnot at different times you may see different values. In contrast, in pure FP, if "a" has type BankAccount, then it's immutable and has the same value regardless of time.

Since, however, a BankAccount is presumably an object we want to model whose state does vary with time, in FP we would encode that information in the type. So "a" might have type "IO BankAccount", or some other monadic type, which essentially boils down to making "a" actually be a function that takes "the previous state of the world" (or the previous state of bank interest rates, or whatever) as input and returns a new state of the world. Updating the interest rate would be another operation whose type represents the effect (e.g. another IO operation), and thus would return a new 'world'; everything that may depend on the interest rate (world state) would be data with a type that knows it needs to take that world as input.

As a result, the only possible way to call "a.Balance" or whatnot is to use code that, thanks to the static types, enforces that some 'world history that got us up until now' has been properly plumbed to the point of the call, and whatever world history is the input affects what result we'll get from a.Balance.

Reading up on the State Monad may be useful to get a sense of how you model 'shared mutable state' purely.

Brian
+2  A: 

MVCC (MultiVersion Concurrency Control)

The solution for the problem you are referring to is described by Rich Hickey in his video presentations.

In short: instead of passing the data by reference directly to clients, you add one more level of indirection and pass a reference to the reference to the data. (Actually you would like to have at least one more level of indirection, but let's assume that the data structure is very simple, like an array.)

Since the data is immutable, each time the data should be changed you create a copy of the changed part (in the case of an array you have to create another array!) plus another reference to all the "changed" data.

So all clients that used the 1st version of the array keep using the reference to the first version, while every client trying to access the 2nd version uses the second reference.

The "array" data structure is not very interesting for this method, because you cannot split the data and you are forced to copy everything. But for more sophisticated data structures like trees, parts of the data structure can be shared, so you are not forced to copy everything each time.
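That extra level of indirection can be sketched as an atomic reference holding an immutable list: writers install a whole new version, while readers keep whatever version they already dereferenced. A minimal Java sketch (`AtomicReference` and `List.of` are real JDK APIs; the rest is illustrative):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class Mvcc {
    public static void main(String[] args) {
        // One extra level of indirection:
        // a mutable reference to an immutable list.
        AtomicReference<List<String>> current =
                new AtomicReference<>(List.of("a", "b"));

        List<String> v1 = current.get(); // a reader grabs version 1

        // A writer installs version 2; v1 itself is untouched.
        current.set(List.of("a", "b", "c"));

        System.out.println(v1);            // [a, b]  - the old reader's view
        System.out.println(current.get()); // [a, b, c]
    }
}
```

With an array-like structure the writer copies everything; with a persistent tree it would rebuild only the path to the changed node and share the rest, which is exactly the trade-off the paragraph above describes.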

For details please have a look at this paper: "Purely Functional Data Structures" by Chris Okasaki.

avp