I'm graphing accelerometer data here, and I'm trying to correct for gravity. To do this, I convert the acceleration vector to spherical coordinates, decrease the radius by 1g, and convert back to Cartesian. This method is called on a timer every 0.03 seconds:

//poll acceleration
ThreeAxisAcceleration current = self.accelerationData;

//math to correct for gravity:
float radius = sqrt(pow(current.x, 2) + pow(current.y, 2) + pow(current.z, 2));
float theta = atan2(current.y, current.x);
float phi = acos(current.z/radius);

//NSLog(@"SPHERICAL--- RADIUS: %.2f -- THETA: %.2f -- PHI: %.2f", radius, theta, phi);

radius = radius - 1.0;

float newX = radius * cos(theta) * sin(phi);
float newY = radius * sin(theta) * sin(phi);
float newZ = radius * cos(phi);

current = (ThreeAxisAcceleration){newX, newY, newZ};

//end math
NSValue *arrayVal = [NSValue value:&current withObjCType:@encode(ThreeAxisAcceleration)];

if ([_dataHistoryBuffer count] > self.bounds.size.width) {
 [_dataHistoryBuffer removeObjectAtIndex:0];
}

[_dataHistoryBuffer addObject:arrayVal];

[self setNeedsDisplay];

Somehow, the addition of the gravity correction is gradually slowing my code horrendously. I find it hard to believe that this amount of math can slow the program down, yet without it the code runs through my entire (quite lengthy) display method just fine. Are there any options I can consider here to avoid this? Am I missing something, or is the math just that slow? If I comment out everything between the //math and //end math tags, performance is fine.

Thanks for any help.

P.S. In case it may matter, to whom it may interest: I'm programming in Cocoa, and this method belongs to a subclass of CALayer, with -drawInContext: implemented.

A: 

Those math lines look fine. I don't know enough Objective-C to know what the current = ... line is doing, though. Could it be allocating memory on the heap which isn't being reclaimed? What happens if you just comment it out? Have you watched the process's execution with top to see if it starts slurping more CPU or memory?

Jay Kominek
“I don't know enough Objective C to know what the `current = ...` line is doing though. Could it be allocating memory on the heap which isn't being reclaimed?” No. It's a C literal expression, not anything specific to Objective-C and not a heap allocation.
Peter Hosey
+8  A: 

Are you on iPhone? Try using the float variants of these functions: powf, sqrtf, etc.

There's more info in point #4 of Kendall Helmstetter Gelner's answer to this SO question.

nall
Actually, the iPhone *does* have hardware double-precision. However, on the 3gs, single-precision arithmetic is substantially faster because it is executed on the NEON unit instead of VFP. Additionally, the single-precision entry points into the math library are generally fairly well optimized, and faster than the corresponding double-precision functions.
Stephen Canon
Thanks, I removed that incorrect information from the answer.
nall
The iPhone CPU may have support for those, but by default everything is compiled for Thumb and using soft float, you need to explicitly switch to ARM or everything is emulated.
Louis Gerbarg
It uses the soft-float ABI and calls shims for the floating-point operations, but those shims merely switch to ARM mode and use the floating-point hardware. That said, you're absolutely correct that floating-point in thumb mode is a performance hazard on the ARMv6, and that floating-point heavy code should be compiled with thumb turned off on that platform.
Stephen Canon
ARM floating point hardware does not directly execute trigonometric functions (the math library computes them using basic FPU operations). A trade-off between performance and accuracy can be made using lookup tables (a lot faster, but less accurate).
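For what it's worth, a table-based sinf replacement might look roughly like the sketch below; the table size, range reduction, and linear interpolation are all arbitrary choices for illustration, not anything measured on the device.

#include <math.h>

#define SIN_LUT_SIZE 256

static float sinTable[SIN_LUT_SIZE + 1];

// Fill the table once at startup, covering one full period [0, 2*pi].
static void initSinTable(void) {
    for (int i = 0; i <= SIN_LUT_SIZE; i++) {
        sinTable[i] = sinf((float)i * (2.0f * M_PI / SIN_LUT_SIZE));
    }
}

// Approximate sinf(x) with a table lookup plus linear interpolation.
static float fastSinf(float x) {
    float t = x * (SIN_LUT_SIZE / (2.0f * M_PI));  // map x into table-index space
    t -= SIN_LUT_SIZE * floorf(t / SIN_LUT_SIZE);  // wrap into [0, SIN_LUT_SIZE)
    int i = (int)t;
    float frac = t - (float)i;
    return sinTable[i] + frac * (sinTable[i + 1] - sinTable[i]);
}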
Adriaan
+5  A: 

The normal way to shorten a vector would be along the lines of:

float originalMagnitude = sqrtf(current.x * current.x + current.y * current.y + current.z * current.z);
float desiredMagnitude = originalMagnitude - 1.0f;
float scaleFactor = (originalMagnitude != 0) ? desiredMagnitude / originalMagnitude : 0.0f; // avoid divide-by-zero

current.x *= scaleFactor;
current.y *= scaleFactor;
current.z *= scaleFactor;

That said, no, calling a few trig functions 33 times a second shouldn’t be slowing you down much. On the other hand, -[NSMutableArray removeObjectAtIndex:] could potentially be slow for a big array. A ring buffer (either using NSMutableArray or a C array of structs) would be more efficient.
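A minimal sketch of such a ring buffer of structs (the capacity and the names here are invented for the example; the real capacity just needs to cover the width you plot):

#define kHistoryCapacity 512 // assumed capacity; anything >= the plotted width works

typedef struct {
    ThreeAxisAcceleration samples[kHistoryCapacity];
    NSUInteger head;  // index of the oldest sample
    NSUInteger count; // number of valid samples
} AccelerationHistory;

// Appends a sample, overwriting the oldest one once the buffer is full.
// No per-sample allocations and no O(n) removal from the front of an array.
static void HistoryAppend(AccelerationHistory *history, ThreeAxisAcceleration sample) {
    NSUInteger tail = (history->head + history->count) % kHistoryCapacity;
    history->samples[tail] = sample;
    if (history->count < kHistoryCapacity) {
        history->count++;
    } else {
        history->head = (history->head + 1) % kHistoryCapacity;
    }
}

Drawing then walks count entries starting at head, wrapping modulo the capacity.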

Ahruman
OP should provide timings, the array operation is well spotted.
kaizer.se
I'm not sure exactly what you mean. _dataHistoryBuffer IS an NSMutableArray. The width of my area, and thus its maximum before I start replacing, is about 475. Once it gets to that, I take one out at index 0 and then add.
From the CFArray header (also applies to NSArray): "Insertion or deletion operations will typically be linear *in the number of values in the array*"; i.e. even though you're only deleting a single value, the array implementation may need to copy all the data in the array as a result of that operation.
Stephen Canon
Removing an object at index 0 is potentially an O(n) operation. (NSMutableArray may be faster than that, but it isn’t guaranteed.) The faster approach is to use replaceObjectAtIndex: at a moving index. See the linked article on ring buffers for more information. All that said, a 475-item array shouldn’t slow you noticeably. Like James Jones and kaizer.se said, to find your real problem, profile. (In fact, try it without any maths at all – I suspect it’s the drawing that’s costing you.)
Ahruman
A: 

Other than the other commenters' point about using the float operators (as opposed to the double ones), all that _dataHistoryBuffer work is what's killing your app. It churns up memory like there's no tomorrow, and since you are using NSValue, all those objects get added to the autorelease pool, making memory consumption spike. You're much better off avoiding keeping a list of values unless you really, really need it - and if so, figuring out a more appropriate (read: fixed-size, non-object) mechanism to store them in. Even a circular buffer of structs (e.g. an array of 10 structs, and a counter which does i++ % 10 to index into it) would be better.

AlBlue
A: 

Profile it to see exactly where the problem is. If necessary, comment out one subset of the "math" part at a time. Performance is something people usually guess wrong about, even smart, thoughtful, experienced people.

Emilio M Bumachar
+6  A: 

Besides the fact that it's theoretically impossible to simply factor out Earth's gravity, the first step I would take would be to benchmark each of the operations that you're performing (multiplication, division, sin, atan2, etc) and then engineer a way around the operations that take significantly longer to compute (or avoid computing the problematic operations). Make sure to use the same data types in your benchmarking as you will in your finished product.
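Something along these lines would do as a rough timing harness (the iteration count and the use of clock() are arbitrary choices for the sketch; on the device you'd want to run it in a release build):

#include <math.h>
#include <stdio.h>
#include <time.h>

// Times one single-argument float operation; the volatile accumulator keeps
// the compiler from optimizing the loop away.
static void timeOp(float (*op)(float), const char *name) {
    const int iterations = 1000000;
    volatile float sink = 0.0f;
    clock_t start = clock();
    for (int i = 0; i < iterations; i++) {
        float arg = (float)(i % 1000) * 0.001f; // keep the argument in [0, 1)
        sink += op(arg);
    }
    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    printf("%s: %f s for %d calls\n", name, seconds, iterations);
}

// Example: compare the operations used in the question.
// timeOp(sinf,  "sinf");
// timeOp(cosf,  "cosf");
// timeOp(acosf, "acosf");
// timeOp(sqrtf, "sqrtf");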

This is a classic example of the time/accuracy trade-off. There are usually multiple algorithms for performing the same computation and you also have LUTs/interpolation at your disposal.

I ran into the same issues when I made my own Wii-style remote controller. If you identify the expensive operation and are having trouble engineering around it then start another question. :)

James Jones
Bravo for observing that what the questioner wants to do isn't even possible.
Stephen Canon
I learned it the hard way. :P This was validated by my professors.
James Jones
The only way to reliably determine the axis of gravity would be to enforce it by an external constraint... For example, if you were to keep the axis of gravity parallel to the y-axis, then you could reliably determine that gravity is affecting the y-axis. You cannot simply deduct 1g from the net magnitude of gravitation.
James Jones
After changing my code to the better solution shown in answer 4 by Ahruman, the lagging still hasn't improved. I'll try LUTs later and report back. As for it not being possible: I actually AM graphing the accelerometer output from a Wii remote, and I want to correct for the phantom 1g background force from gravity, although I do understand your point about relativity.
What Einstein's theory states is that you will not be able to "correct for the phantom 1g background force" unless you were to keep the Wii remote perpendicular to the field of gravity and hard code the correction into one of the axes. If you plan on using the Wii remote like most people use it, this will not work for you. And make sure you pin down the bottleneck before you go writing LUT methods. You don't want to decrease your precision or waste your time if you don't have to.
James Jones
+4  A: 
  • Profile, don't speculate. Don't change a damn thing until you know what to change.

Assuming that you get a profile that shows that all the math really is slowing you down:

  • Don't ever write pow(someFloat, 2). The compiler should be able to optimize this away for you, but oftentimes, on newer hardware, those optimizations may not yet be in place. This should always be written someFloat*someFloat. The pow( ) function is generally the most expensive function in the math library. Simple multiplication will always be at least as fast as calling pow( ), and will always be at least as accurate (assuming IEEE-754 compliant arithmetic). Plus, it's easier for the compiler to optimize.
  • When working with floats in C, use the suffixed forms of the math library functions: sinf is faster than sin, sqrtf is faster than sqrt. Beyond the functions themselves being faster, you avoid unnecessary conversions to and from double. (See the sketch after this list.)
  • If you're seeing the slowdown on an ARMv6 processor (not the 3GS or the new iPod Touch), make sure you are not compiling to thumb code when you are doing a lot of floating-point computation. The thumb instruction set (prior to thumb2) cannot access the VFP registers, and thus needs a shim for every floating point operation. This can be quite expensive.
  • If you just want to decrease the length of the acceleration vector by 1.0 (hint: this doesn't do what you want), there are more efficient algorithms to do so.
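Applied to the math in the question, the pow and suffixed-function points above would look something like this (same computation, just single-precision calls and plain multiplication):

float radius = sqrtf(current.x * current.x
                     + current.y * current.y
                     + current.z * current.z); // x*x instead of pow(x, 2)
float theta = atan2f(current.y, current.x);    // atan2f instead of atan2
float phi = acosf(current.z / radius);         // acosf instead of acos

radius = radius - 1.0f;

float newX = radius * cosf(theta) * sinf(phi);
float newY = radius * sinf(theta) * sinf(phi);
float newZ = radius * cosf(phi);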
Stephen Canon
A: 

Just out of interest - do you know how the math library's sqrt function is implemented? If it is using an inefficient approximation algorithm, then it might be your culprit. Your best option is to create some sort of test harness that can get an average performance figure for each of the instructions that you are using.

Another question - does increasing or reducing the precision of the operations (i.e. by using doubles rather than single-precision floats) change the performance in any way?

Andrew Matthews
It uses the VFP hardware sqrt instruction. However, even an iterative method would compute millions of square roots per second on an iPhone's CPU.
Stephen Canon
A: 

As others have said, you should profile to be sure. Having said that, yes, it is quite likely that adding the extra calculations did slow it down.

By default, all code for iPhone is compiled for the Thumb-1 instruction set. Thumb-1 does not have native floating point support, so it ends up calling out to a SOFTWARE floating point implementation. There are 2 ways to handle this:

  1. Compile the code for ARM. The processor in the iPhone can freely intermix Thumb and ARM code, so you can just compile the necessary pieces as ARM. You should note that GCC (and by proxy Xcode) cannot compile an individual function as ARM; you will need to isolate all the relevant code into its own compilation unit. It is probably easiest just to set the entire project to compile for ARM to see if it fixes things (uncheck "Build Options" > "Compile for Thumb"; see the build-setting sketch below). You should note that while ARM will speed up floating point, it reduces instruction density, thereby hurting cache efficiency and degrading all of your other code, so try to avoid it where you can.
  2. Compile for Thumb-2. Thumb-2 is an enhanced version of Thumb that adds support for some floating point operations. It is only available on the iPhone 3GS and the new iPod Touch, so this may not be an option for you. You can do that by switching your architecture to "Optimized," which will build a fat binary with the current slow version for older devices and the faster version for the ones that support it.

You can also combine both of these options, if that seems like the best choice.
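If it helps, the "Compile for Thumb" checkbox corresponds (as far as I know) to the GCC_THUMB_SUPPORT build setting, so option 1 at the project level might look like this hypothetical .xcconfig fragment:

// Hypothetical .xcconfig fragment: build as ARM rather than Thumb.
// Equivalent to unchecking "Compile for Thumb" in the target's build settings.
GCC_THUMB_SUPPORT = NO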

Louis Gerbarg
A: 

Unless I misunderstand your code, you basically scale your point by some factor. I think the following code should be equivalent to what you do.

double radius = sqrt(current.x * current.x 
                     + current.y * current.y 
                     + current.z * current.z);
double newRadius = radius - 1.0;
double scale = (radius != 0.0) ? newRadius/radius : 0.0; // guard against a zero-length vector
current.x *= scale;
current.y *= scale;
current.z *= scale;
abc
A: 

This method will find out what the problem is. The worse your slowdown is, the quicker it will find it. Guesses are things that you suspect but don't know, such as thinking the math is the problem. Guesses are usually wrong, at least to begin with. If you are right, the samples will show you. If you are wrong, they will show you what is right. It never misses.

Mike Dunlavey
A: 

My guess is that since you're using autoreleased memory (for the NSValue) every 0.03 seconds, you're probably not giving the pool much time to release itself. I could be wrong - profiling is the only way to tell.

Try manually allocating and releasing the NSValue and see if it makes a difference.
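A sketch of what that could look like in the timer method (this assumes manual reference counting, as elsewhere in the thread; -initWithBytes:objCType: is the non-autoreleased counterpart of the class method used in the question):

// Create the NSValue without putting it in the autorelease pool...
NSValue *arrayVal = [[NSValue alloc] initWithBytes:&current
                                          objCType:@encode(ThreeAxisAcceleration)];

if ([_dataHistoryBuffer count] > self.bounds.size.width) {
    [_dataHistoryBuffer removeObjectAtIndex:0];
}

[_dataHistoryBuffer addObject:arrayVal]; // the array retains it
[arrayVal release];                      // ...and give up our ownership right away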

rein