views:

526

answers:

1

Clever People,

In Excel, it appears that wrapping an array formula in a SUM does not produce the sum of the contents of the array in all cases. (That is, it appears to me; clearly, I could well be confused.)

This came up when trying to write the equivalent of an inner join in Excel. For reasons involving Excel for Mac, I'm doing this without using Visual Basic or MS Query: only array formulas and INDEX/MATCH. I have, I think, succeeded, apart from this bizarre behaviour that I am at a loss to understand.

Currently I am using a kludge, but it would be nice not to have to. Any help or advice would be very greatly appreciated.

The following reproduces the issue:

Table 1

key | fkey
----+------
  a |   x
  a |   y
  b |   x
  b |   y

Table 2

key | value
----+------
  x |   1
  y |  10

The layout above is supposed to represent how these two little tables appear in the spreadsheet: nothing clever is going on. The four obvious ranges are named as table1.key, table1.fkey, table2.key, and table2.value.

So, then, the inner join. I'd like those values from Table 2 that correspond to a specific key in Table 1, with the join on `table1.fkey=table2.key'. Entering the formula:

{=("a"=table1.key)*INDEX(table2.value, MATCH(table1.fkey, table2.key, FALSE))}

into two vertical cells gives the required result; namely:

1
10

(Cool, or what?) So far, so fine. But now I actually want the sum of the two numbers above. So I simply wrapped the formula above in a SUM, and entered the result in a single cell:

{=SUM(("a"=table1.key)*INDEX(table2.value, MATCH(table1.fkey, table2.key, FALSE)))}

Result?

2

Does anyone have any idea what on earth is going on?

James

+1  A: 

The row_num argument of INDEX cannot be an array when used in a single cell array formula. The INDEX function is always returning 1. Try this

=SUM(INDEX(table2.value,{2,1}))

It will always return 10, because INDEX will only accept the first element of the array. I don't know why they work differently for single cell array formulas vs multicell.

Dick Kusleika
Huh! Well, many thanks for this example -- it's nice to have the problem isolated. Apropos Tim Robinson's comment: my kludge is to leave the SUM in but to demand an array answer over two cells. This gives the *desired* answer in *both* cells.
James G
It would be nice if we just had an SQL function.
Dick Kusleika
Indeed! (Or the "official spec" for array formulas, if such exist. I don't suppose you know of such?)
James G
No, I don't know of any published spec.
Dick Kusleika