tags:

views:

97

answers:

1

Assume this much:
I'm using a 16.16 fixed point system.
System is 32 bit.
CPU has no floating point processor.
Overflow is pretty imminent for multiplication for anything larger than 1.0 * 0.4999

To make one last assumption... lets say the values I'm working will not be so high as to cause overflow in this operation...

//assume that in practical application
//this assignment wouldn't be here as 2 fixed values would already exist...
fixed1 = (int)(1.2341 * 65536);
fixed2 = (int)(0.7854 * 65536);

mask1 = fixed1 & 0xFF; //mask off lower 8 bits

fixed1 >>= 8; //keep upper 24 bits... assume value here isn't too large...

answer = (((fixed2 * fixed1) >> 8) + ((fixed2 * mask1) >> 16));

So the question is... is this a stroke of genius (not to say it hasn't already been thought of or anything) or a complete waste of time?

A: 
Michael Dorgan
This compiles and gives the expected answer, so no... the answer does not come out wrong. Yes, if the values are too large this would absolutely overflow. You might have missed fixed1 >>= 8... but this does produce a correct answer.
Maximus
Ah, you are correct, I missed that initial shift. That changes things completely and yes - this would increase your precision. But if you are going to spend time with an extra var, might as well take it further and use the whole var instead of just 8 bits.
Michael Dorgan
Thanks Michael. You're absolutely right that this could easily overflow... I'm pretty sketchy about using something like it. The overall question, to be honest... is assuming I DON'T overflow... would this be faster than float * float?
Maximus
I guess what I don't know then... how to store the initial multiplication of 2 32 bit values on a 32 bit OS without overflow?
Maximus
On a system without an FPU, way faster. I've not coded with an FPU for 10+years now and as such have become very used to fixed point math. How to do the initial mul with overflow? I'll edit my above post for that so I have room.
Michael Dorgan
Ah. I've read here and there that there would be faster ways to do it in ARM Assembly... I'm programming on Android (Native Java JNI) so I can't assume what the processor will be. On top of that... as much as I know about bitwise operations... I know nothing of Assembler...What I'm realizing for the bottom line, is that a 16.16 32 bit fixed point integer times a 16.16 32 bit integer could result in a 48 bit answer (after shifting back the 64 bit answer) and there must be some kind of controlled upper bound (no more than 24 bits each) for the initial values. Would that be correct to say?
Maximus
A 16.16 multiply results in a 32.32 answer. You can shift away the lowest 16 bits leaving the upper 16-bits if you wish. Most lower precision fixed point routines I've seen either partially pre-shift the terms before multiply and finish shifting afterwards to balance over/underflow. But it comes down to fitting the domain of your numbers to your terms. If you need all 16.16 bits of precision, then you need to do the full on fixed point multiply I describe above - perhaps minus the lowest 16-bit accumulation.
Michael Dorgan
BTW, what I mean by fitting your number to the domain is moving your fixed point position around to match your data. We usually use 24.8 number here for our work. That gives an upper bound in the millions while still allowing 2.75 decimal points of precision. If we needed to do some really high precision trig / unit vector stuff, we could switch it around to 8.24 or even 1.31. Fixed point is very flexible in this way - controlling where you choose to lose precision as opposed to floating point which is out of your control.
Michael Dorgan
I guess I never mentioned but now is as good a time as any... I'm working with OpenGL ES in this case trying to stick to fixed point and doing some manual editing of vertex coordinates. I've wondered, could I just scale down perhaps even though it's expecting 16.16 format?
Maximus
I have no idea on that. I'm looking at OpenGL ES for my next project, but haven't used it much at all up to this point. That said, if the numbers at their lowest levels represent pixel positions, losing most of your sub-pixel accuracy won't hurt. The real answer is to try shifting down to preserve your top and see how bad it looks - if bad at all.
Michael Dorgan