tags:

views:

132

answers:

4

I've got a coding problem in Erlang that is probably a common design pattern, but I can't find any info on how to resolve it.

I've got a list L. I want to apply a function f to every element in L, and have it run across all elements in L concurrently. Each call to f(Element) will either succeed or fail; in the majority of cases it will fail, but occasionally it will succeed for a specific Element within L.

If/when a f(Element) succeeds, I want to return "success" and terminate all invocations of f for other elements in L - the first "success" is all I'm interested in. On the other hand, if f(Element) fails for every element in L, then I want to return "fail".

As a trivial example, suppose L is a list of integers, and F returns {success} if an element in L is 3, or {fail} for any other value. I want to find as quickly as possible if there are any 3s in L; I don't care how many 3s there are, just whether at least one 3 exists or not. f could look like this:

f(Int) ->
  case Int of
    3 -> {success};
    _ -> {fail}
  end.

How can I iterate through a list of Ints to find out if the list contains at least one 3, and return as quickly as possible?

Surely this is a common functional design pattern, and I'm just not using the right search terms within Google...

+4  A: 

There basically two different ways of doing this. Either write your own function which iterates over the list returning true or false depending on whether it finds a 3:

contains_3([3|_]) -> true;
contains_3([_|T]) -> contains_3(T);
contains_3([]) -> false.

The second is use an a already defined function to do the actual iteration until a test on the list elements is true and provide it with the test. lists:any returns true or false depending on whether the test succeeds for at least one element:

contains_3(List) -> lists:any(fun (E) -> E =:= 3 end, List).

will do the same thing. Which you choose is up to you. The second one would probably be closer to a design pattern but I feel that even if you use it you should have an idea of how it works internally. In this case it is trivial and very close to the explicit case.

It is a very common thing to do, but whether it would classify as a design pattern I don't know. It seems so basic and in a sense "trivial" that I would hesitate to call it a design pattern.

rvirding
Thanks rvirding, it looks like my example was too trivial... I've got a quite complex and time-consuming function that needs to run against each element in the list. I want to run it against each element on the list concurrently, so I can get a {success} result back ASAP if there is one. However, once I've got that 1st {success}, there's no point in having the function continue to run against every other list element - it'll take too long and chew up too many resources for no benefit. Finally, if there's no {success} for any of the list elements, then I need to recognise that as well.
monch1962
monch1962
+3  A: 

It has been a while since I did any erlang, so I'm not going to attempt to provide you with syntax, however erlang and the OTP have the solution waiting for you.

Spawn one process representing the function; have it iterate over the list, spawning off as many processes as you feel is appropriate to perform the per-element calculation efficiently.

Link every process to the function-process, and have the function process terminate after it returns the first result.

Let erlang/otp to clean up the rest of the processes.

Recurse
That makes a lot of sense - I'll give it a try later today. Thanks for the suggestion
monch1962
A: 

You might want to look at the plists module: http://code.google.com/p/plists/ Though I don't know if plists:any handles

(a) on the 1st {success} received, tell the other sub-processes to stop processing & exit ASAP

Alexey Romanov
+2  A: 

As has already been answered your solution is to use lists:any/2.

Seeing that you want a concurrent version of it:

any(F, List) ->
   Parent = self(),
   Pid = spawn(fun() -> spawner(Parent, F, List) end),
   receive {Pid, Result} -> Result
   end,
   Result.

spawner(Parent, F, List) ->
   Spawner = self(),
   S = spawn_link(fun() -> wait_for_result(Spawner, Parent, length(List)) end),
   [spawn_link(fun() -> run(S, F) end) || X <- List],
   receive after infinity -> ok end.

wait_for_result(Spawner, Parent, 0) ->
   Parent ! {Spawner, false},
   exit(have_result);
wait_for_result(Spawner, Parent, Children) ->
   receive
     true -> Parent ! {Spawner, true}, exit(have_result);
     false -> wait_for_result(Spawner, Parent, Children -1)
   end.

run(S, F) ->
  case catch(F()) of
    true -> S ! true;
    _ -> S ! false
  end.

Note that all the children (the "run" processes) will die when the "wait_for_children" process does an exit(have_result).

Completely untested... Ah, what the heck. I'll do an example:

4> play:any(fun(A) -> A == a end, [b,b,b,b,b,b,b,b]).
false
5> play:any(fun(A) -> A == a end, [b,b,b,b,b,b,a,b]).
true

There could still be bugs (and there probably are).

Daniel Luna
You can optimise this by realising that in `wait_for_result/2` we are not really interested in which worker returns `false`, just how many which have done so. So it is enough to remove the first element of the list each time.
rvirding
You should also mention that doing `exit(have_result)` will kill all remaining worker processes as they are linked (started with `spawn_link`) and `have_result` is not `normal` so it is treated as an error exit.
rvirding
You are correct of course. Updated the answer with your comments.
Daniel Luna
This depends on the function's complexity and the number of samples, but without overlapping the spawns and receives, your code might keep on spawning processes even though the positive answer is already sitting in the mailbox.
Zed
That's true. It was something that I pulled up very quickly and it crossed my mind, but not really long enough to actually handle that case. I'll update my answer.
Daniel Luna
The solutions should handle all the special cases now (I hope). I would do some refactoring before usage in production code though. At least clean up the function/variable names. On another note it's a bit ugly to have the spawner process stay around, but I can't really see a way around that.
Daniel Luna