views:

248

answers:

5

I'm at the debugging/optimization phase with an iPhone app. I have one bottleneck left - the only place where the program has a noticeable lag, and it's in the following loop: (By the way, I've renamed the vars with letters and types. (The real names are much more human-readable in the actual app, but make little sense out of context, so I hope this is clear enough.) Here's the loop:

for(i=0;i<xLong; i+=yFloat*zShort){
  aFloat=0.0;
  for(int j=i;j<i+yFloat*zShort;j++){
    aFloat=hArray[j]/kFloat;
  }
  bNSNumber = [NSNumber numberWithFloat:aFloat]; 
  [cNSMutableArray addObject:bNSNumber];
}

All objection creation and clean-up is outside of this loop.

(It should be pretty straight forward what's happening here, but basically I have a very large array (in the millions) and I'm going through that array at chunks of yFloat*zShort length, adding all of the elements in that chunk, and inserting that final sum in another array. So if hArray is a million elements long, and my chunk length is 200, I'll sum the first 200 elements, insert that total in cNSMutableArray, and move on to the next 200 elements in hArray. In the end, cNSMutableArray will be 5000 elements long.)

When the outer loop is around 25k and the inner loop is around 200, this code takes about 4 seconds to run. I would sure like to get that down as much as possible, as in the real world, the outer loop might be quite a bit larger.

Any ideas how to quicken this up?

Thanks for any ideas you have!

+8  A: 

Have you tried to make a C style float array instead of using a NSMutableArray? the overhead of creating that many wrappers (NSNumber) can add up.

Matt Greer
Thanks. Originally the data length wasn't known, but now it is, so you're right, it makes sense to convert it back to a C style array. Will do.
Eric Christensen
A: 

This seems like the kind of calculation that should be spun off in a background thread.

You have several options- NSOperation is a viable alternative, but depending on your data structures it might be easier to use detachNewThreadSelector:toTarget:withObject:

sbooth
Unfortunately, there's nothing really else for the user to do until the loop is done, so it doesn't make sense to me to spin it off. There's no speed benefit to background threads, is there? It's just that you can do other things at the same time, right? Thanks.
Eric Christensen
But if objective-c is anything like windows forms then your GUI will lock up, so doing that will allow you to make a progress bar or something...
RCIX
Yes, that is what I'm getting at. Locking up the GUI is very bad form.- it is much better to display a (cancelable if possible) progress bar or some indication that work is proceeding.
sbooth
Yep, I get that. This loop is inside a much longer script that loads a file. That entire chunk of code is in its own thread and I have a loading screen that is up in the mean time. I didn't see the sense in spawning a third thread, unless there are performance benefits. Thanks though.
Eric Christensen
In that case you're right, spawning another thread would be worthless.
sbooth
+6  A: 

First off, from your description it sounds like the inner loop should read:

for(int j=i;j<i+yFloat*zShort;j++){
    aFloat+=hArray[j]/kFloat;
}

Anyway, since kFloat is not changing, you can move that out of the loop and do the division once:

for(int j=i;j<i+yFloat*zShort;j++){
    aFloat+=hArray[j];
}
aFloat/=kFloat;

That said, this can affect the accuracy of the final value. Without knowing exactly what you are doing, I don't know if that will matter.

R Samuel Klatchko
Ah hah! You found a bug! I wasn't actually summing the block. Clearly that was the original intention, but it looks like at some point, I stopped doing that, which means that I was just resetting aFloat each step, which of course I don't need to do. So I completely removed the inner loop, just setting the value to the first value of that chunk and the time is now a quarter of what it used to be. Thanks!
Eric Christensen
@Eric Christensen: Optimization by removing features is an interesting approach. I'm curious as to why the use of the first element is just as good as the average of the block?
e.James
The code you posted is actually using the last value of the chunk, not the first if that matters.
R Samuel Klatchko
@e.James: There are many times when a point sample is sufficient. One drop of apple juice tastes the same as averaging the taste of every drop of apple juice in the entire bottle. @RSam: Yep, you're right, but the first drop tastes the same as the last drop, so it doesn't matter. Thanks.
Eric Christensen
A: 

You really want to avoid creating objects inside a tight loop. Every time you do that, you're allocating a new object on the heap, which involves a hash insert.

NSResponder
Maybe I'm missing something. As far as I understand, the above code is not creating any new objects within the loop, right?
Eric Christensen
+1  A: 

I see that you already got a nice speedup, but here's my two cents: Floating-point division is notoriously expensive; you could precompute

float invKFloat = 1.0f / kFloat;

and then mulitply by this instead of dividing by kFloat. This means you only have to do the division once, instead of every time in the outer loop.

celion