Folks,

I'm performing array calculations that take a long time to complete, and I'd like to optimize my formulas further. All of the formulas are of the same nature: they apply some high-level function (AVERAGE, SLOPE, MIN, MAX) to a column of values. However, not every cell in the column is included in the array; I use multiple IF criteria to choose which cells get included, and all comparisons are made against the current row. Here's an example of the data:

     A             B                C            D          E
1    Company       Generation       Date         Value      ToCalculate
2    Abc           1                1/1/2010     5.6          
3    ...           ...              ...          ...        ...

Column E would look something like this (entered as an array formula with Ctrl+Shift+Enter):

{=Average(If(A2=A2:A1000, If(B2=B2:B1000, If(C2 > C2:C1000, D2:D1000))))}

Once E2 is calculated, I autofill it down column E. Columns F, G, H, ... use the same approach, either selecting different values to operate on or performing a different function. My dataset is quite large, and with only a few of these columns the spreadsheet takes over an hour to compute. Every so often I'll add a fourth criterion, with all the other criteria the same.
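To see why this gets slow, here is a minimal Python sketch (not Excel; the rows and the function name are hypothetical) of what each column-E cell computes. It assumes the intent is "every row with the same company and generation and a strictly earlier date"; note that the formula's ranges start at the current row, so as written it only looks at rows at or below it.

```python
from statistics import mean

# Hypothetical rows: (company, generation, date, value). ISO date strings
# compare in chronological order, standing in for Excel date serials.
rows = [
    ("Abc", 1, "2010-01-01", 5.6),
    ("Abc", 1, "2010-02-01", 6.0),
    ("Abc", 1, "2010-03-01", 7.0),
    ("Abc", 2, "2010-02-01", 9.9),
    ("Xyz", 1, "2010-02-01", 1.0),
]


def cell_average(rows, i):
    """What one column-E cell computes: the average value over rows with the
    same company and generation and a strictly earlier date than row i.
    Returns None where Excel would show #DIV/0! (no matching rows)."""
    company, generation, date, _ = rows[i]
    matches = [
        value
        for (c, g, d, value) in rows
        if c == company and g == generation and d < date
    ]
    return mean(matches) if matches else None


# Filling the formula down means one full scan per cell:
# n cells x n rows = O(n^2) comparisons per column, repeated for F, G, H, ...
column_e = [cell_average(rows, i) for i in range(len(rows))]
```

Every extra criterion adds another comparison inside that inner scan, which is why adding a fourth one hurts across the whole sheet.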

Is there a more efficient approach? Some thoughts:

  1. Can I use a single array formula per column instead of thousands per column?
  2. Can I condense the first three criteria so that the output is row numbers? Then perhaps subsequent formulas wouldn't have to test multiple criteria but could just perform the function.
  3. Or can I somehow build the criteria up? For example, one column returns all rows where the company is the same, another returns those rows from the first column where the generation is also the same, and so on.
A:

For the average, you can do without array formulas:

 =AVERAGEIFS(D2:D$1000,A2:A$1000,A2,B2:B$1000,B2,C2:C$1000,"<"&C2)

Using "<"&C2 mirrors the strict C2 > C2:C1000 test in your formula.

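Since COUNTIFS and SUMIFS also exist, I think your slopes could be calculated the same way: the least-squares slope only needs the count and the sums of x, y, x*y and x^2. Because SUMIFS can only sum an existing range, you would need helper columns holding x*y and x^2.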
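A minimal Python sketch of that arithmetic (function names and sample data are hypothetical). The slope is (nΣxy − ΣxΣy) / (nΣx² − (Σx)²), so the five aggregates that COUNTIFS and SUMIFS (over the assumed x*y and x^2 helper columns) would return are enough; the sketch checks this against a slope computed from deviations about the means. With dates as x (serial numbers in the tens of thousands), subtracting a fixed reference date first reduces rounding error in nΣx² − (Σx)².

```python
def slope_from_sums(n, sx, sy, sxy, sxx):
    """Least-squares slope from the five aggregates a sheet could get from
    COUNTIFS (n) and SUMIFS (sx, sy, sxy, sxx); sxy and sxx assume
    hypothetical helper columns holding x*y and x^2."""
    denominator = n * sxx - sx * sx
    if denominator == 0:
        return None  # all x values equal: the slope is undefined
    return (n * sxy - sx * sy) / denominator


def slope_direct(xs, ys):
    """Reference slope from deviations about the means, for comparison."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Hypothetical (x, y) pairs for the rows that satisfy the criteria.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.0]
sums = (
    len(xs),                             # COUNTIFS
    sum(xs),                             # SUMIFS on x
    sum(ys),                             # SUMIFS on y
    sum(x * y for x, y in zip(xs, ys)),  # SUMIFS on an x*y helper column
    sum(x * x for x in xs),              # SUMIFS on an x^2 helper column
)
from_sums = slope_from_sums(*sums)
direct = slope_direct(xs, ys)
```

Both routes agree, which is the point: the per-row work reduces to a handful of *IFS aggregates rather than one array evaluation per cell.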

For the remaining functions (MAX, MIN, etc.), we would need to analyze them case by case.

I did a quick performance test and this appears to be faster, but of course my datasets are just mock data.

HTH!

Note: Excel 2007 and up only!

Edit: answering your comment.

Without knowing the dimensions of the problem it is difficult to give advice, but I'll risk some anyway:

You could write a VBA function that:

1) Generates a new sheet for each company-generation pair
2) Sorts the data in those sheets by date
3) Adds the formulas to those sheets (no conditionals needed in this context)
4) Recalculates, reads the results of those formulas, and populates the original sheet with them
5) Deletes the auxiliary sheets
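The steps above, sketched in Python rather than VBA purely to show the logic (a dictionary of groups stands in for the auxiliary sheets; names and data are hypothetical, and it assumes the goal is the average of strictly earlier dates within each company-generation pair). Once a group is sorted by date, a running sum and count replace the per-cell IF tests.

```python
from collections import defaultdict

# Hypothetical rows: (company, generation, date, value), ISO date strings.
rows = [
    ("Abc", 1, "2010-01-01", 5.6),
    ("Abc", 1, "2010-02-01", 6.0),
    ("Abc", 1, "2010-03-01", 7.0),
    ("Abc", 1, "2010-03-01", 8.0),
    ("Abc", 2, "2010-02-01", 9.9),
    ("Xyz", 1, "2010-02-01", 1.0),
]


def earlier_date_averages(rows):
    """For each row, average the values of rows in the same company-generation
    pair with a strictly earlier date. Results keep the original row order;
    None marks rows with no earlier date (Excel's #DIV/0!)."""
    # Step 1: one bucket per pair, standing in for one auxiliary sheet per pair.
    groups = defaultdict(list)
    for index, (company, generation, date, value) in enumerate(rows):
        groups[(company, generation)].append((date, value, index))

    results = [None] * len(rows)
    for members in groups.values():
        # Step 2: sort the bucket by date.
        members.sort()
        # Step 3: no IF tests needed; a running sum/count over the sorted
        # bucket gives the average of everything dated earlier.
        running_sum, running_count = 0.0, 0
        start = 0
        while start < len(members):
            # Rows sharing a date are not "earlier" than each other, so
            # process each block of equal dates together.
            end = start
            while end < len(members) and members[end][0] == members[start][0]:
                end += 1
            average = running_sum / running_count if running_count else None
            for _, value, index in members[start:end]:
                results[index] = average  # Step 4: write back to the original row
                running_sum += value
                running_count += 1
            start = end
    return results


averages = earlier_date_averages(rows)
```

Each pair costs one sort plus one pass, instead of every cell rescanning the whole column; step 5 (deleting the sheets) has no counterpart here.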

belisarius
Hi belisarius, great suggestion! I use quite a few COUNTs and AVERAGEs, both of which have "IFS" equivalents. This still requires much of the same logic to be recalculated each time, though. Is there any way to do #2 and #3 suggested above? If I'm always looking for the same company and then the same generation, is there a way to capture the rows and use them as input to the other functions? That would help with SLOPE, RSQ, and other functions that don't have an "IFS" version. Thoughts?
SFun28
A:

To capture the rows and reuse them, try this approach:
Sort the data by Company and Generation.
Make a unique list of Company-Generation pairs (use Advanced Filter, Unique records only, Copy to another location).
For each Company-Generation pair in the list, build 2 columns of formulas.
The first column gives the count of rows in the data for this pair (use COUNTIFS); the second gives the first row in the data for this pair (= first row for the previous pair + count of rows for the previous pair).
Then you can use a function like OFFSET to return only the rows of data for the Company-Generation pair, and embed this inside the final function or array formula (AVERAGEIFS etc.).
You could extend this sort-and-count approach to include dates if you wanted.
One drawback: if the list of companies and generations changes, you have to update the list of unique pairs and the associated formulas.
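A rough Python sketch of the sort-and-count idea (hypothetical data and names). A list slice plays the role that something like OFFSET(first_data_cell, start, 0, count, 1) would play in the sheet, where first_data_cell is a hypothetical reference to the first data row and start is a 0-based offset rather than a sheet row number. The company/generation lookup happens once per pair; afterwards any function, including ones without an IFS version such as SLOPE or RSQ, can run on the small block.

```python
from itertools import groupby
from statistics import mean

# Hypothetical data: (company, generation, date, value).
rows = [
    ("Xyz", 1, "2010-02-01", 1.0),
    ("Abc", 1, "2010-01-01", 5.6),
    ("Abc", 2, "2010-01-15", 9.9),
    ("Xyz", 1, "2010-03-01", 3.0),
    ("Abc", 1, "2010-02-01", 6.0),
]

# Sort by company and generation so each pair's rows are contiguous.
rows.sort(key=lambda r: (r[0], r[1]))

# For each unique pair: its row count (the COUNTIFS column) and its start
# offset (start of previous pair + count of previous pair).
index = {}
start = 0
for pair, group in groupby(rows, key=lambda r: (r[0], r[1])):
    count = sum(1 for _ in group)
    index[pair] = (start, count)
    start += count


def pair_block(pair):
    """Only this pair's rows, like an OFFSET range of height `count`."""
    start, count = index[pair]
    return rows[start:start + count]


# Any function now runs on the small block without re-testing the criteria.
xyz_average = mean(value for (_, _, _, value) in pair_block(("Xyz", 1)))
```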
There are examples of this approach on my website at
http://www.decisionmodels.com/optspeedk.htm
http://www.decisionmodels.com/optspeedj.htm

Charles Williams