views:

188

answers:

5

Possible Duplicate:
Functional programming and multicore architecture

I hear a lot about functional languages and how well they scale, because there is no state hanging around a function, and therefore that function can be massively parallelized.

However, this makes little sense to me because almost all real-world practical programs need/have state to take care of. I also find it interesting that most major scaling libraries, e.g. MapReduce, are typically written in imperative languages like C or C++.

I'd like to hear from the functional camp where this hype I'm hearing is coming from.

+4  A: 

Functional languages such as Haskell, Scheme and others have what are called "pure functions". A pure function is a function with no side effects. It doesn't modify any other state in the program. This is by definition threadsafe.
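To make the distinction concrete, here is a minimal sketch in C++ (a pure function next to an impure one); the function names are illustrative, not from any particular library:

```cpp
#include <vector>

// Pure: the result depends only on the arguments, and nothing outside the
// function is read or written. Any number of threads can call it concurrently.
int sum_squares(const std::vector<int>& xs) {
    int total = 0;
    for (int x : xs) total += x * x;
    return total;
}

// Impure: reads and writes a global, so concurrent calls race on `call_count`.
int call_count = 0;
int counted_sum_squares(const std::vector<int>& xs) {
    ++call_count;  // side effect: mutates state shared by all callers
    return sum_squares(xs);
}
```

The first version is thread-safe for free; the second needs synchronization the moment two threads touch it.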

Of course you can write pure functions in imperative languages. You also find multi-paradigm languages like Python, Ruby and even C# where you can do imperative programming, functional programming or both.

But the point of Haskell (etc.) is that you can't write an impure function. Well, that's not strictly true, but it's mostly true.

Similarly, many imperative languages have immutable objects for much the same reason. An immutable object is one whose state doesn't change once created. Again, by definition, an immutable object is thread-safe.

cletus
Yes, but you could do that with imperative languages too ... hence my confusion. It's not like the static analysis for such a thing is difficult.
Billy ONeal
+3  A: 

You're talking about two different things and don't realize it.

Yes, most real-world programs have state somewhere, but if you want to do multithreading, that state should not be everywhere, and in fact, the fewer places it's in, the better. In functional programs, the default is not to have state, and you can introduce state exactly where you need it and nowhere else. The parts that deal with state will not be as easily multithreaded, but since the rest of your program is free of side effects, it doesn't matter what order those parts are executed in, which removes a huge barrier to parallelization.

Chuck
So you're saying it's more about programming style than about the language itself?
Billy ONeal
@Billy ONeal--it's a lot more than just programming style. FP flips the programming model over. In imperative languages, almost everything is mutable by default and you have to make an extra effort to avoid mutation. In FP, almost everything is immutable by default and you have to make an effort to make something mutable. It's a 180-degree change in thinking and approach.
Onorio Catenacci
@Billy ONeal: To a large degree, it's about programming style, but a language can really affect your programming style. The benefit of functional languages is that they support this style of programming very well. Things that are natural in functional languages are more awkward in imperative languages, and vice-versa. There are also some more benefits to functional languages that branch out from this, such as STM being dead easy when certain parts of your program are *guaranteed* not to have side effects.
Chuck
@Chuck: What's STM? @Onorio Catenacci: Compilers already take into account such things when doing things like making inlining decisions. I don't think such things are hugely important from a performance standpoint.
Billy ONeal
@Billy: [Software Transactional Memory](http://en.wikipedia.org/wiki/Software_transactional_memory)
James McNellis
@Billy ONeal I think you're making a possibly dangerous assumption about how compilers optimize code in those cases. And Chuck is right--writing immutable code in an imperative language can be much more complex and therefore hard for another developer to follow later on. Ever read C or C++ code that uses function pointers to any great extent without typedefs? It can be pretty hairy to follow.
Onorio Catenacci
@Onorio Catenacci: Err.. yeah. That's why we have typedefs.
Billy ONeal
@Billy--I pointed to function pointers as an example of something that FP makes a lot simpler. At bottom it's all translated to assembly regardless; it's just that FP makes it a bit easier for the human beings who have to read the code later on. At some point it's not a question of what's theoretically possible; it's a question of what actually happens in practice. A developer could write C/C++ code in a functional way and thereby avoid all sorts of thread-safety issues. But that's not what happens in practice, for a number of reasons: deadlines, accepted coding idioms, etc.
Onorio Catenacci
As an example of how rare a C++ program written in functional style is: How many C++ programs have you seen that declared almost all variables `const`? Just about never, I'd bet. Yet this sort of thing is done all the time in functional languages.
Chuck
@Chuck: What? `const` correctness mandates that all variables that can be `const` are `const`. A C++ developer who is not doing that is a bad C++ developer.
Billy ONeal
@Billy: That's not my understanding, and it's not how most C++ code I've seen works. Const-correctness mandates that variables you *want* to be constant are `const`, to encode that constraint into your program. It doesn't mandate that you design your program to avoid mutable state.
Chuck
@Chuck: You should *want* anything to be constant that can be constant. Why make it mutable when it can be constant?
Billy ONeal
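For concreteness, the "everything `const` that can be" style under discussion looks roughly like this sketch (the function is made up for illustration): every local binding is a one-time assignment, much as it would be in a functional language.

```cpp
#include <numeric>
#include <vector>

// Functional-flavoured C++: every binding that can be const is const, so a
// reader knows at a glance that no value is reassigned after initialization.
int mean_of_doubled(const std::vector<int>& xs) {
    const std::size_t n = xs.size();
    const int total = std::accumulate(xs.begin(), xs.end(), 0);
    const int doubled = 2 * total;
    return n == 0 ? 0 : doubled / static_cast<int>(n);
}
```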
+1  A: 

Higher-order functions. Consider a simple reduction operation: summing the elements of an array. In an imperative language, programmers typically write themselves a loop and perform the reduction one element at a time.

But that code isn't easy to make multi-threaded. When you write a loop, you're assuming an order of operations, and you have to spell out how to get from one element to the next. You'd really like to just say "sum the array" and have the compiler, or runtime, or whatever, decide how to work through the array, dividing up the task as necessary between multiple cores and combining the results together.

So instead of writing a loop with the addition code embedded inside it, an alternative is to pass something representing "addition" into a function that can do the divvying. As soon as you do that, you're writing functionally: you're passing a function (addition) into another function (the reducer). Writing this way not only makes the code more readable, but when you change architecture, or want to target a heterogeneous architecture, you don't have to change the summing code, just the reducer. In practice you might have many different algorithms that all share one reducer, so this is a big payoff.

This is just a simple example, and you can build on it: functions that apply other functions to 2D arrays, functions that apply functions to tree structures, functions that combine functions to apply functions (e.g. if you have a hierarchical structure with trees above and arrays below), and so on.
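The reducer idea can be sketched in a few lines of C++ (`reduce_with` is a made-up name, not a standard function): the caller hands over the combining operation, and only the reducer knows how the data is walked.

```cpp
#include <functional>
#include <vector>

// Higher-order reducer: the combining operation is a parameter. Callers say
// *what* to combine with; the reducer decides *how* to traverse the data,
// and could be swapped for a parallel implementation without touching callers.
template <typename T, typename Op>
T reduce_with(const std::vector<T>& xs, T init, Op combine) {
    T acc = init;
    for (const T& x : xs) acc = combine(acc, x);
    return acc;
}
```

With C++17, the body could be replaced by `std::reduce(std::execution::par, ...)` to divide the work across cores, and no caller would change; passing `std::plus<>{}` sums the array, `std::multiplies<>{}` takes its product.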

Yes, I understand that. But there are plenty of ways to handle this problem in imperative languages as well, e.g. thread pools. Yes, you're writing functionally, but I don't see how that's different in functional languages other than "someone already wrote that particular bit of functionality for me".
Billy ONeal
No, this is completely orthogonal to thread pools. I'm not talking about the code to manage threads, but the interface. The way you launch a `reduce` operation is to pass it a function as an argument. That's functional programming. MapReduce is functional. It takes two arguments, a map function, and a reduce function. Functions that act on functions are called higher order functions. This is what functional programming is all about.
+2  A: 

However, this makes little sense to me because almost all real-world practical programs need/have state to take care of.

You'd be surprised! Yes, all programs need some state (I/O in particular), but often you don't need much more. Just because most programs have heaps of state doesn't mean they need it.

Programming in a functional language encourages you to use less state, and thus your programs become easier to parallelise.

Many functional languages are "impure", which means they allow some state. Haskell doesn't, but it has monads, which basically let you get something from nothing: you get state using stateless constructs. Monads are a bit fiddly to work with, which is why Haskell gives you a strong incentive to restrict state to as small a part of your program as possible.

I also find it interesting that most major scaling libraries, i.e. MapReduce, are typically written in imperative languages like C or C++.

Programming concurrent applications is "hard" in C/C++. That's why it's best to do all the dangerous stuff in a library that is heavily tested and inspected, while you still get the flexibility and performance of C/C++.

Artelius
I'm thinking of things like http://en.wikipedia.org/wiki/Dijkstra's_algorithm . Let's say I want to write a massively parallel application to find the shortest way to route someone from point A to point B. Dijkstra's algorithm relies on a queue. While in theory you can parallelize some bits of the algorithm, there is still a piece of state -- the queue -- which is shared and required for the algorithm to function. Such is the case for every dynamic programming algorithm, and such algorithms are necessary to make some things efficient.
Billy ONeal
Certainly - functional languages aren't a silver bullet. Sometimes the answer is to use an algorithm that is nominally less efficient, but easier to parallelise (e.g. an algorithm that can easily be expressed in a functional language), thus letting you solve a problem faster. Other times you have to construct the concurrent stuff manually [in C/C++/whatever]. But it can be a good strategy to use functional languages where possible and resort to other options when it's necessary.
Artelius
+6  A: 

It's important to add one word: "there's no *shared* state".

Any meaningful program (in any language) changes the state of the world. But (some) functional languages make it impossible to mutate the same resource from multiple threads simultaneously. The absence of shared mutable state makes multithreading safe.
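A small sketch of why this works, in C++ (the function name is illustrative): two threads read the same never-mutated vector at once, and because neither writes to shared state, no lock is needed.

```cpp
#include <numeric>
#include <thread>
#include <vector>

// Two threads traverse the same immutable data concurrently. Each writes
// only to its own local result, so there is nothing to synchronize.
long parallel_sum(const std::vector<int>& data) {
    const std::size_t mid = data.size() / 2;
    long lo = 0, hi = 0;
    std::thread t1([&] { lo = std::accumulate(data.begin(), data.begin() + mid, 0L); });
    std::thread t2([&] { hi = std::accumulate(data.begin() + mid, data.end(), 0L); });
    t1.join();
    t2.join();
    return lo + hi;
}
```

The moment either thread mutated `data`, this would become a data race; immutability is what makes the sharing safe.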

Igor Krivokon
+1 -- first answer here that addresses the root of my conundrum.
Billy ONeal