I'm graphing accelerometer data here, and I'm trying to correct for gravity. To do this, I convert the acceleration vector to spherical coordinates, decrease the radius by 1g, and convert back to Cartesian. This method is called on a timer every 0.03 seconds:

//poll acceleration
ThreeAxisAcceleration current = self.accelerationData;

//math to correct for gravity:
float radius = sqrt(pow(current.x, 2) + pow(current.y, 2) + pow(current.z, 2));
float theta = atan2(current.y, current.x);
float phi = acos(current.z/radius);

//NSLog(@"SPHERICAL--- RADIUS: %.2f -- THETA: %.2f -- PHI: %.2f", radius, theta, phi);

radius = radius - 1.0;

float newX = radius * cos(theta) * sin(phi);
float newY = radius * sin(theta) * sin(phi);
float newZ = radius * cos(phi);

current = (ThreeAxisAcceleration){newX, newY, newZ};

//end math
NSValue *arrayVal = [NSValue value:&current withObjCType:@encode(ThreeAxisAcceleration)];

if ([_dataHistoryBuffer count] > self.bounds.size.width) {
 [_dataHistoryBuffer removeObjectAtIndex:0];
}

[_dataHistoryBuffer addObject:arrayVal];

[self setNeedsDisplay];

Somehow, the addition of the gravity correction is gradually slowing my code horrendously. I find it hard to believe that this amount of math can slow the program down, yet without it the code runs through my entire (quite lengthy) display method just fine. Are there any options I can consider here to avoid this? Am I missing something, or is the math just that slow? If I comment out everything between the //math and //end math tags, performance is fine.

Thanks for any help.

P.S. In case it may matter, to whom it may interest: I'm programming in Cocoa, and this method belongs to a subclass of CALayer, with -drawInContext: implemented.

A: 

Those math lines look fine. I don't know enough Objective-C to know what the current = ... line is doing, though. Could it be allocating memory on the heap which isn't being reclaimed? What happens if you just comment it out? Have you watched the process's execution with top to see if it starts slurping more CPU or memory?

Jay Kominek
“I don't know enough Objective C to know what the `current = ...` line is doing though. Could it be allocating memory on the heap which isn't being reclaimed?” No. It's a C literal expression, not anything specific to Objective-C and not a heap allocation.
Peter Hosey
+8  A: 

Are you on iPhone? Try using the float variants of these functions: powf, sqrtf, etc.

There's more info in point #4 of Kendall Helmstetter Gelner's answer to this SO question.

nall
Actually, the iPhone *does* have hardware double-precision. However, on the 3gs, single-precision arithmetic is substantially faster because it is executed on the NEON unit instead of VFP. Additionally, the single-precision entry points into the math library are generally fairly well optimized, and faster than the corresponding double-precision functions.
Stephen Canon
Thanks, I removed that incorrect information from the answer.
nall
The iPhone CPU may have support for those, but by default everything is compiled for Thumb and using soft float, you need to explicitly switch to ARM or everything is emulated.
Louis Gerbarg
It uses the soft-float ABI and calls shims for the floating-point operations, but those shims merely switch to ARM mode and use the floating-point hardware. That said, you're absolutely correct that floating-point in thumb mode is a performance hazard on the ARMv6, and that floating-point heavy code should be compiled with thumb turned off on that platform.
Stephen Canon
ARM floating point hardware does not directly execute trigonometric functions (the math library computes them using basic FPU operations). A trade-off between performance and accuracy can be made using lookup tables (a lot faster, but less accurate).
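For what it's worth, a table-based sinf replacement might look roughly like the sketch below; the table size, range reduction, and linear interpolation are all arbitrary choices for illustration, not anything measured on the device.

#include <math.h>

#define SIN_LUT_SIZE 256

static float sinTable[SIN_LUT_SIZE + 1];

// Fill the table once at startup, covering one full period [0, 2*pi].
static void initSinTable(void) {
    for (int i = 0; i <= SIN_LUT_SIZE; i++) {
        sinTable[i] = sinf((float)i * (2.0f * M_PI / SIN_LUT_SIZE));
    }
}

// Approximate sinf(x) with a table lookup plus linear interpolation.
static float fastSinf(float x) {
    float t = x * (SIN_LUT_SIZE / (2.0f * M_PI));  // map x into table-index space
    t -= SIN_LUT_SIZE * floorf(t / SIN_LUT_SIZE);  // wrap into [0, SIN_LUT_SIZE)
    int i = (int)t;
    float frac = t - (float)i;
    return sinTable[i] + frac * (sinTable[i + 1] - sinTable[i]);
}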
Adriaan
+5  A: 

The normal way to shorten a vector would be along the lines of:

float originalMagnitude = sqrtf(current.x * current.x + current.y * current.y + current.z * current.z);
float desiredMagnitude = originalMagnitude - 1.0f;
float scaleFactor = (originalMagnitude != 0) ? desiredMagnitude / originalMagnitude : 0.0f; // avoid divide-by-zero

current.x *= scaleFactor;
current.y *= scaleFactor;
current.z *= scaleFactor;

That said, no, calling a few trig functions 33 times a second shouldn’t be slowing you down much. On the other hand, -[NSMutableArray removeObjectAtIndex:] could potentially be slow for a big array. A ring buffer (either using NSMutableArray or a C array of structs) would be more efficient.
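A minimal sketch of such a ring buffer of structs (the capacity and the names here are invented for the example; the real capacity just needs to cover the width you plot):

#define kHistoryCapacity 512 // assumed capacity; anything >= the plotted width works

typedef struct {
    ThreeAxisAcceleration samples[kHistoryCapacity];
    NSUInteger head;  // index of the oldest sample
    NSUInteger count; // number of valid samples
} AccelerationHistory;

// Appends a sample, overwriting the oldest one once the buffer is full.
// No per-sample allocations and no O(n) removal from the front of an array.
static void HistoryAppend(AccelerationHistory *history, ThreeAxisAcceleration sample) {
    NSUInteger tail = (history->head + history->count) % kHistoryCapacity;
    history->samples[tail] = sample;
    if (history->count < kHistoryCapacity) {
        history->count++;
    } else {
        history->head = (history->head + 1) % kHistoryCapacity;
    }
}

Drawing then walks count entries starting at head, wrapping modulo the capacity.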

Ahruman
OP should provide timings, the array operation is well spotted.
kaizer.se
I'm not sure exactly what you mean. _dataHistoryBuffer IS an NSMutableArray. The width of my area, and thus its maximum before I start replacing, is about 475. Once it gets to that, I take one out at index 0 and then add.
From the CFArray header (also applies to NSArray): "Insertion or deletion operations will typically be linear *in the number of values in the array*"; i.e. even though you're only deleting a single value, the array implementation may need to copy all the data in the array as a result of that operation.
Stephen Canon
Removing an object at index 0 is potentially an O(n) operation. (NSMutableArray may be faster than that, but it isn’t guaranteed.) The faster approach is to use replaceObjectAtIndex: at a moving index. See the linked article on ring buffers for more information. All that said, a 475-item array shouldn’t slow you noticeably. Like James Jones and kaizer.se said, to find your real problem, profile. (In fact, try it without any maths at all – I suspect it’s the drawing that’s costing you.)
Ahruman
A: 

Other than the other commenters' point about using the float operators (as opposed to the double ones), all that _dataHistoryBuffer work is what's killing your app. It churns up memory like there's no tomorrow, and since you are using NSValue, all those objects get added to the autorelease pool, making memory consumption spike. You're much better off avoiding keeping a list of values unless you really, really need it - and if so, figuring out a more appropriate (read: fixed-size, non-object) mechanism to store them in. Even a circular buffer of structs (e.g. an array of 10 structs, and a counter which does i++ % 10 to index into it) would be better.

AlBlue
A: 

Profile it to see exactly where the problem is. If necessary, comment out one subset of the "math" part at a time. Performance is something people usually guess wrong about, even smart, thoughtful, experienced people.

Emilio M Bumachar
+6  A: 

Besides the fact that it's theoretically impossible to simply factor out Earth's gravity, the first step I would take would be to benchmark each of the operations that you're performing (multiplication, division, sin, atan2, etc) and then engineer a way around the operations that take significantly longer to compute (or avoid computing the problematic operations). Make sure to use the same data types in your benchmarking as you will in your finished product.
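Something along these lines would do as a rough timing harness (the iteration count and the use of clock() are arbitrary choices for the sketch; on the device you'd want to run it in a release build):

#include <math.h>
#include <stdio.h>
#include <time.h>

// Times one single-argument float operation; the volatile accumulator keeps
// the compiler from optimizing the loop away.
static void timeOp(float (*op)(float), const char *name) {
    const int iterations = 1000000;
    volatile float sink = 0.0f;
    clock_t start = clock();
    for (int i = 0; i < iterations; i++) {
        float arg = (float)(i % 1000) * 0.001f; // keep the argument in [0, 1)
        sink += op(arg);
    }
    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    printf("%s: %f s for %d calls\n", name, seconds, iterations);
}

// Example: compare the operations used in the question.
// timeOp(sinf,  "sinf");
// timeOp(cosf,  "cosf");
// timeOp(acosf, "acosf");
// timeOp(sqrtf, "sqrtf");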

This is a classic example of the time/accuracy trade-off. There are usually multiple algorithms for performing the same computation and you also have LUTs/interpolation at your disposal.

I ran into the same issues when I made my own Wii-style remote controller. If you identify the expensive operation and are having trouble engineering around it then start another question. :)

James Jones
Bravo for observing that what the questioner wants to do isn't even possible.
Stephen Canon
I learned it the hard way. :P This was validated by my professors.
James Jones
The only way to reliably determine the axis of gravity would be to enforce it by an external constraint... For example, if you were to keep the axis of gravity parallel to the y-axis, then you could reliably determine that gravity is affecting the y-axis. You cannot simply deduct 1g from the net magnitude of gravitation.
James Jones
After changing my code to the better solution shown in answer 4 by Ahruman, the lagging still hasn't improved. I'll try LUTs later and report back. As for it not being possible: I actually AM graphing the accelerometer output from a Wii remote, and I want to correct for the phantom 1g background force from gravity, although I do understand your point about relativity.
What Einstein's theory states is that you will not be able to "correct for the phantom 1g background force" unless you were to keep the Wii remote perpendicular to the field of gravity and hard code the correction into one of the axes. If you plan on using the Wii remote like most people use it, this will not work for you. And make sure you pin down the bottleneck before you go writing LUT methods. You don't want to decrease your precision or waste your time if you don't have to.
James Jones
+4  A: 
  • Profile, don't speculate. Don't change a damn thing until you know what to change.

Assuming that you get a profile that shows that all the math really is slowing you down:

  • Don't ever write pow(someFloat, 2). The compiler should be able to optimize this away for you, but oftentimes, on newer hardware, those optimizations may not yet be in place. This should always be written someFloat*someFloat. The pow( ) function is generally the most expensive function in the math library. Simple multiplication will always be at least as fast as calling pow( ), and will always be at least as accurate (assuming IEEE-754 compliant arithmetic). Plus, it's easier for the compiler to optimize.
  • When working with floats in C, use the suffixed forms of the math library functions: sinf is faster than sin, sqrtf is faster than sqrt. Beyond the functions themselves being faster, you avoid unnecessary conversions to and from double. (See the sketch after this list.)
  • If you're seeing the slowdown on an ARMv6 processor (not the 3GS or the new iPod Touch), make sure you are not compiling to thumb code when you are doing a lot of floating-point computation. The thumb instruction set (prior to thumb2) cannot access the VFP registers, and thus needs a shim for every floating point operation. This can be quite expensive.
  • If you just want to decrease the length of the acceleration vector by 1.0 (hint: this doesn't do what you want), there are more efficient algorithms to do so.
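Applied to the math in the question, the pow and suffixed-function points above would look something like this (same computation, just single-precision calls and plain multiplication):

float radius = sqrtf(current.x * current.x
                     + current.y * current.y
                     + current.z * current.z); // x*x instead of pow(x, 2)
float theta = atan2f(current.y, current.x);    // atan2f instead of atan2
float phi = acosf(current.z / radius);         // acosf instead of acos

radius = radius - 1.0f;

float newX = radius * cosf(theta) * sinf(phi);
float newY = radius * sinf(theta) * sinf(phi);
float newZ = radius * cosf(phi);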
Stephen Canon
A: 

Just out of interest - do you know how the math library's sqrt function is implemented? If it is using an inefficient approximation algorithm, then it might be your culprit. Your best option is to create some sort of test harness that can get an average performance figure for each of the instructions that you are using.

Another question - does increasing or reducing the precision of the operations (i.e. by using doubles rather than single-precision floats) change the performance in any way?

Andrew Matthews
It uses the VFP hardware sqrt instruction. However, even an iterative method would compute millions of square roots per second on an iPhone's CPU.
Stephen Canon
A: 

As others have said, you should profile to be sure. Having said that, yes, it is quite likely that adding the extra calculations did slow it down.

By default, all code for iPhone is compiled for the Thumb-1 instruction set. Thumb-1 does not have native floating point support, so it ends up calling out to a SOFTWARE floating point implementation. There are 2 ways to handle this:

  1. Compile the code for ARM. The processor in the iPhone can freely intermix Thumb and ARM code, so you can just compile the necessary pieces as ARM. You should note that GCC (and by proxy Xcode) cannot compile an individual function as ARM; you will need to isolate all the relevant code into its own compilation unit. It is probably easiest just to set the entire project to compile for ARM to see if it fixes things (uncheck "Build Options" > "Compile for Thumb"; see the build-setting sketch below). You should note that while ARM will speed up floating point, it reduces instruction density, thereby hurting cache efficiency and degrading all of your other code, so try to avoid it where you can.
  2. Compile for Thumb-2. Thumb-2 is an enhanced version of Thumb that adds support for some floating point operations. It is only available on the iPhone 3GS and the new iPod Touch, so this may not be an option for you. You can do that by switching your architecture to "Optimized," which will build a fat binary with the current slow version for older devices and the faster version for the ones that support it.

You can also combine both of these options, if that seems like the best choice.
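If it helps, the "Compile for Thumb" checkbox corresponds (as far as I know) to the GCC_THUMB_SUPPORT build setting, so option 1 at the project level might look like this hypothetical .xcconfig fragment:

// Hypothetical .xcconfig fragment: build as ARM rather than Thumb.
// Equivalent to unchecking "Compile for Thumb" in the target's build settings.
GCC_THUMB_SUPPORT = NO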

Louis Gerbarg
A: 

Unless I misunderstand your code, you basically scale your point by some factor. I think the following code should be equivalent to what you do.

double radius = sqrt(current.x * current.x 
                     + current.y * current.y 
                     + current.z * current.z);
double newRadius = radius - 1.0;
double scale = (radius != 0.0) ? newRadius/radius : 0.0; // guard against a zero-length vector
current.x *= scale;
current.y *= scale;
current.z *= scale;
abc
A: 

This method will find out what the problem is. The worse your slowdown is, the quicker it will find it. Guesses are things that you suspect but don't know, such as thinking the math is the problem. Guesses are usually wrong, at least to begin with. If you are right, the samples will show you. If you are wrong, they will show you what is right. It never misses.

Mike Dunlavey
A: 

My guess is that since you're using autoreleased memory (for the NSValue) every 0.03 seconds, you're probably not giving the pool much time to release itself. I could be wrong - profiling is the only way to tell.

Try manually allocating and releasing the NSValue and see if it makes a difference.
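A sketch of what that could look like in the timer method (this assumes manual reference counting, as elsewhere in the thread; -initWithBytes:objCType: is the non-autoreleased counterpart of the class method used in the question):

// Create the NSValue without putting it in the autorelease pool...
NSValue *arrayVal = [[NSValue alloc] initWithBytes:&current
                                          objCType:@encode(ThreeAxisAcceleration)];

if ([_dataHistoryBuffer count] > self.bounds.size.width) {
    [_dataHistoryBuffer removeObjectAtIndex:0];
}

[_dataHistoryBuffer addObject:arrayVal]; // the array retains it
[arrayVal release];                      // ...and give up our ownership right away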

rein