I've been reading a bit lately about functional languages. Coming from 10+ years of OO development, I'm finding it difficult to get my head around how on earth one can point the pure functional approach (i.e. the same method called with the same parameters does the same thing) at a problem where typically (in an OO program) I would need to cache data.

  • Do we just admit that there might need to be an actor in the program which is not immutable (i.e. the cache)? I just watched a presentation by Joe Armstrong on InfoQ and he seemed pretty dogmatic in this regard!

  • Do we just admit that looking up data might be expensive (because we can never cache it)? If so, how can we control, for example, the load on some shared resource (e.g. a database)?

  • Is there some magic fairy dust, which I don't know about yet, which solves the whole problem and then makes a nice cup of tea afterwards?

Certainly a Google search for "Erlang Cache" seems to return a fair few results...

+3  A: 

Memoize the function. A cache is just a list/dictionary, and hence can be implemented in a purely functional way.
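
For illustration, here is a minimal sketch in Erlang of purely functional memoization (the module name memo_fib and the choice of Fibonacci are invented for this example): the cache is an ordinary map threaded through the calls, so the same argument always produces the same result and nothing is ever mutated.

%% Purely functional memoization: the cache is passed in and the
%% (possibly extended) cache is passed back out with the result.
-module(memo_fib).
-export([fib/1]).

fib(N) ->
    {Value, _Cache} = fib(N, #{0 => 0, 1 => 1}),
    Value.

fib(N, Cache) ->
    case maps:find(N, Cache) of
        {ok, Value} ->
            {Value, Cache};                      %% already computed
        error ->
            {A, Cache1} = fib(N - 1, Cache),
            {B, Cache2} = fib(N - 2, Cache1),
            Value = A + B,
            {Value, maps:put(N, Value, Cache2)}  %% extend the cache
    end.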

Craig Stuntz
I should have been clearer (see my comment to Jeremy above). Joe Armstrong said that an actor in an Erlang program should not have state. "If you have state, you do not have scalability". If my functional program (which includes a cache) is therefore not horizontally scalable, was there no point in using the functional language to begin with?
oxbow_lakes
Where Armstrong says "state", I hear "shared state." Shared state really does kill scalability. State itself (whether shared or not) can be a problem from a reliability POV, but very local state doesn't necessarily hurt scalability.
Craig Stuntz
How is a value from a cache not "shared state"? Not much of a cache if it can't share its contents. Also, the current contents of the cache are implicitly shared (as is its lock), even if it is private to the cache actor itself.
oxbow_lakes
It's shared if you share it. :) In Erlang, scaling is granular at the process level. Consider a process which caches the result of a web service call. That process will not need to call the WS again. Another process will need to call the WS, because sharing the cache would make the two processes coupled, killing scalability.
Craig Stuntz
There is "shared" in the sense that everyone can modify it, and there is "shared" in the sense that everyone can read it. I think perhaps the point that needs to be emphasized is that querying values in a cache should not result in two calls with the same arguments returning different data. Also, any "dependency" on state in a different actor reduces your fault tolerance across the board.
Jeremy Wall
+2  A: 

There is no reason a cache and a functional language can't live together. To be functional, you just have to obey the constraint that calling the same function with the same arguments always yields the same answer.

For instance: get_data(Query, CacheCriteria)

Just because get_data uses a cache doesn't mean it's not functional. As long as calling get_data with the same Query and CacheCriteria arguments always returns the same value, the function can still be considered functional.
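
One possible shape for such a get_data/2 (a sketch only: the query_cache ETS table and the run_query/1 helper are invented names, and the cache could just as well live in a separate process). As long as the underlying lookup is deterministic, callers always get the same value for the same arguments:

%% Sketch: cache query results in an ETS table.
-module(cached_query).
-export([init/0, get_data/2]).

init() ->
    ets:new(query_cache, [named_table, set, public]).

get_data(Query, CacheCriteria) ->
    Key = {Query, CacheCriteria},
    case ets:lookup(query_cache, Key) of
        [{Key, Value}] ->
            Value;                             %% hit: reuse the stored result
        [] ->
            Value = run_query(Query),          %% miss: do the expensive lookup
            ets:insert(query_cache, {Key, Value}),
            Value
    end.

run_query(Query) ->
    %% stand-in for the real (expensive, but deterministic) lookup
    {result_for, Query}.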

Jeremy Wall
I agree that we can use a stateful "actor" within a program. Joe Armstrong in his talk was *very* clear. "If you have state, you do not have scalability". I can barely think of a single program I've written where I didn't need to cache some data. In which case, is there no point in using a functional language because it cannot give me scalability?
oxbow_lakes
I think maybe it's a difference between your and his definitions of state. In the functional paradigm, state must be passed in, modified, and passed out. This can be kept consistent within the confines of the language itself, but it's impossible at the language's borders. Any IO operation by default means that that particular operation is not functional. A separate cache fits the IO problem. Memoization, however, would not break this, since it stays within the confines of the language's borders.
Jeremy Wall
+2  A: 

It is data which must be immutable in Erlang, not actors.

Long-lived actors normally live in a tail-recursive function, the arguments of which serve as their state and certainly can change between calls.

-module(cache).
-export([start/0, get_c/1, put_c/2, clear/1]).

start() -> register(cache, spawn(fun () -> loop(dict:new()) end)).

%% The dict passed to loop/1 is the actor's state: each update recurses
%% with a new dict, so no data is ever mutated in place.
loop(Dict) ->
    receive
        {get, From, Key} ->
            From ! {cache_result, Key, dict:fetch(Key, Dict)},
            loop(Dict);
        {set, Key, Value} ->
            NewDict = dict:store(Key, Value, Dict),
            loop(NewDict)
        %% etc.
    end.

put_c(Key, Value) -> cache ! {set, Key, Value}.
%% etc.

When you call put_c, the actor's "state" changes even though all data involved is immutable.
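
For completeness, one guess at how the elided get_c/1 client function might look (it is not spelled out above): it tags the request with its own pid and blocks until the matching reply arrives.

get_c(Key) ->
    cache ! {get, self(), Key},
    receive
        {cache_result, Key, Value} -> Value
    end.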

Alexey Romanov
Thanks for the expansion on the other answers. However, I'd be interested if you could "reply" to my comments on the other answers. That is, if we have shared state in this cache actor, we lose scalability.
oxbow_lakes
Could you say precisely when he says this? I don't like watching tech videos; too low information density for me.
Alexey Romanov
This one is quite nice, though I can't get the video and slides to show up on one screen.
Alexey Romanov
@alexey_r. I'm not watching 65 minutes of presentation again just to say at which precise moment he says it! He talks about sharing data across a million-node CPU (in 10 years, say) and how expensive that might be if the state resides in an actor. He explicitly presents the functional paradigm as one in which a message does not cause the state of an actor to change: modifications are encapsulated entirely in the "return message". I suspect that the dynamics of a cache are largely separate from what he was talking about.
oxbow_lakes
*Sigh*. 24 minutes and 18 seconds in: "If you have a functional programming language, you don't have mutable state". Previously he talks about mutable state being incompatible with a fault-tolerant, distributable and concurrent system (19.15, 20.15, 20.37, 22.45). He also talks about "no need to lock a shared resource as there is no shared resource" and that an actor should be "replicable". Obviously with a cache you need to lock *at some point* and replicability isn't immediately obvious either.
oxbow_lakes
At 29.30 he talks about having "no side effects" to the passing of a message.
oxbow_lakes
Yes. Well, he isn't talking about functional languages being any worse than imperative languages at providing scalability: this applies to all languages. He pretty much defines scalability as replicability. A cache _is_ a shared resource, so of course having a cache can kill scalability in this sense! If you scale far enough, you either recalculate things most of the time anyway or spend most of your time and traffic synchronizing caches. Or, if you have just a single cache, it itself becomes the bottleneck.
Alexey Romanov
I don't know enough about Erlang to know how I would replicate a cache in such a way as to have a node request an item from one of the (possibly many) cache instances. I've been programming in Scala and I'm not sure how to do this.
oxbow_lakes