views:

119

answers:

3

I'm trying to learn some Erlang and having trouble figuring out the best approach for a particular problem. Here's what I'm doing:

I have a big 2-D array of data (list of lists) whose values are calculated by independent processes. The processes send a message back to an aggregator with the result when they're done calculating. My problem is how to get the aggregated data into a new big array. I could do something like this:

aggregator(Array) ->
    receive
        {Value1_1, {1, 1}} ->
            Value1_1 = Value1_1
    end,
    receive
        {Value1_2, {1, 2}} ->
            Value1_2 = Value1_2
    end,
    % ...
    {{Value1_1, Value2_1},
     {Value2_1, Value2_2}}

but that's terribly lame and scales poorly. What I think I want to do is more like this:

aggregator(Array) when is_full(Array) -> %% I think I know how to write is_full.
    Array;
aggregator(Array) ->
    receive
        {Value, {X, Y}} ->
            aggregator(replace_2d(Array, X, Y)) %% Is this the way to go?
    end.

but I don't see any built-in facilities for replacing values based on an array index in the built-ins. This leads me to suspect that I'm missing some idiomatic way of achieving my goals that would be more obvious to someone experienced with functional programming. Am I? Is my approach all wrong? Should I just write myself a replace_2d function and call it good?

+2  A: 

Do you have to work with the incomplete list while it's being built up - i.e. the elements have to be at the correct positions?

If you really have to, binary trees might help to make index-based updates more efficient,

but if not, just collect the values as they come in and sort afterwards.

Dario
That is an excellent point. Thanks.
Nathon
... or collect values in sorted order.
Hynek -Pichi- Vychodil
+2  A: 

There is an array module that may have the semantics you are looking for. But it is a functional container and may not have the performance characteristics you are looking for, esp. it you are dealing with a large array.

However, your first attempt may have some merit, just do it recursively. (If you know the list sizes going in.)

Completely untested, but start with somthing like this:

aggregator(X, Y) ->
    aggregator(X, Y, []).

aggregator(_, 0, Accum) ->
    Accum;
aggregator(X, Y, Accum) ->
    aggregator(X, Y-1, [aggr_x(X, Y, [])|Accum]).

aggr_x(0, _, AccX) ->
    AccX;
aggr_x(X, Y AccX) ->
    receive
      {Value, {X, Y}} ->
        aggr_x(X-1, Y, [Value|AccX]) ->
    end.

Keep in mind that this will receive high indexed elements first, and that for large data sets, your inbox may get very full, and you will need to consider the performance of matching on a receive with a deep inbox.

dsmith
I'm looking into the array module. I don't see how to do the first approach recursively though.
Nathon
Sorry, this was two separate suggestions-- use the array module or do the receive recursively based on pattern matching. The code that I give here is essentially that same as your first snip-it of code, but with a single receive that is called recursively. And it returns arrays instead of tuples-of-tuples.
dsmith
+3  A: 

At end is an explanation of how I am wrong.

Arrays are implemented so that when functional semantics allow, they are updated in O(1) time. That is, if it only is referenced once and the reference is dropped as soon as the new array is computed, the new array will be computed by overwriting the changed entry. As this is all handled by the compiler/interpreter, you still have the safety of functional programming.

Because Erlang arrays are only one-dimensional, you need to do the index computation yourself.. Perhaps having one aggregator per row would be reasonable and help when running on many processors?

Example code assumes 0 is Origo for Row and Column and that every position will be received exactly once.

aggregator(Rows, Cols)->
  Size = Rows*Cols,
  aggregator(Cols, array:new(Size), Size).

aggregator(Cols, Array, 0)->Array;
aggregator(Cols, Array, N)->
  receive
    {Value, {Col, Row}}->
      aggregator(Cols, array:set(Col+Row*Cols, Value, Array), N-1)
  end.

I have tested the fixed array structure by running:

test_array(Size, Times)->
  test_array(Size, array:new(Size, {default, 0}), Times).
test_array(_Size, Array, 0)->
  Array;
test_array(Size, Array, Times) ->
  X = random:uniform(Size)-1,
  V = array:get(X, Array),
  test_array(Size, array:set(X, V+1, Array), Times-1).

with Times=100000000 and Size at [10, 100, 1000, 10000, 100000]. Wallclock times in seconds were: [101,129,140,182,241] This is very roughly about O(log(Size)) where I would have expected about O(1). Thus I have showed I was wrong, but perhaps in the future array will behave more like I thought it did.

Koistinen
Are you referring to the array() data type, or are you referring to lists? The array() data type is simply a tuple and I wouldn't think that the compiler knows anything specific about it. Does this optimization you are referring to apply generally to all tuples? Do you know of any reading materials explaining this? Thanks
dsmith
I have tested a fixed array and it seems that you are right and I remembered wrongly. I'll update my answer reflect this.
Koistinen