Hi!

I'm currently trying to write a program for the VFP unit of the iPhone in ARM assembly. The VFP can do floating-point calculations but, AFAIK, no integer arithmetic. However, it can convert a float to a signed integer (4 bytes). Also, according to this quick reference: http://www.voti.nl/hvu/arm/ARMquickref.pdf it does not seem to support any shift operations.

What I would like to do is convert 4 floats, each of which I'm sure is larger than -127 and smaller than 127, into 4 signed bytes.

If I had shift operations available, I could convert the float to signed integer, then shift the value by 12 bytes to the left (8 and 4 bytes for the next two values respectively) and bitwise OR all four together.

However, since shifting is not available, I need to find another way to do it. Also, I cannot use integer arithmetic, so I can't multiply the already converted integer by 2^n in order to shift; I have to work on the float instead.

Does anyone know how I could achieve that?

By the way, for those familiar with the ARM architecture: I don't want to switch to Thumb instructions, because this is done in a loop operating on many elements and I don't want to switch between Thumb and ARM instructions inside this loop (since that's expensive).

Thanks!

edit:

Additional question: how can I normalize a vector with three elements?

+2  A: 

You want the VFP ftosis instruction, which converts a single-precision FP value to a 4 byte integer. If you have four floats in s0-s3, then after doing:

ftosis s0, s0
ftosis s1, s1
ftosis s2, s2
ftosis s3, s3

you have four 4 byte integers in s0-s3, which can be stored contiguously to memory with a fstm.

On an ARM processor that supports NEON, you can use vcvt.s32.f32 q0, q0 to do four conversions with one instruction.


Edit to answer your follow-up question: here's a simple example function which takes as input a pointer to four floats in memory and returns the converted values packed into a single int32_t:

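_floatToPackedInt:
    fldmias   r0,  {s4-s7}              @ load four floats from the pointer in r0
    ftosizs   s0,   s4                  @ convert each to int32, rounding toward zero
    ftosizs   s1,   s5
    ftosizs   s2,   s6
    ftosizs   s3,   s7
    fmrrs     r0, r1,  {s0,s1}          @ move the converted values to ARM registers
    fmrrs     r2, r3,  {s2,s3}
    uxtb      r0,   r0                  @ keep only the low byte of each value
    uxtb      r1,   r1
    uxtb      r2,   r2                  @ r3 needs no uxtb: lsl #24 discards its high bits
    orr       r0,   r0, r1, lsl #8      @ pack the four bytes into r0
    orr       r0,   r0, r2, lsl #16
    orr       r0,   r0, r3, lsl #24
    bx        lr                        @ return; the packed result is in r0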

I didn't really put any effort into tuning this, because you wouldn't want to do conversions this way if they were performance-critical. You'd rather operate on large arrays of values and pipeline this code so that several conversions are in flight simultaneously, or interleave it with other operations that are doing useful work as well.
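
For reference, the byte packing that the routine above performs can be modelled in C. This is only a sketch: the name `float_to_packed_int` is made up for illustration, it assumes the inputs fit in an int32_t (as they do in the question), and the cast truncates toward zero in the same way as `ftosizs`.

```c
#include <stdint.h>

/* Hypothetical C model of _floatToPackedInt; the name and signature are
   illustrative and not part of the assembly routine. */
static uint32_t float_to_packed_int(const float v[4])
{
    uint32_t packed = 0;
    for (int i = 0; i < 4; ++i) {
        int32_t n = (int32_t)v[i];            /* truncate toward zero, like ftosizs */
        uint32_t byte = (uint32_t)n & 0xffu;  /* keep the low 8 bits only, like uxtb */
        packed |= byte << (8 * i);            /* like orr with lsl #8, #16, #24 */
    }
    return packed;
}
```

For the inputs {1.0f, -1.0f, 127.0f, -127.0f} this gives 0x817fff01: each negative value keeps its two's-complement byte, and the mask stops its sign bits from leaking into the neighbouring bytes.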

You may also like to insert ssats before the uxtbs to make any out-of-range values saturate instead of wrapping.
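
To illustrate what saturating buys you, here is a rough C equivalent of `ssat rd, #8, rn` followed by `uxtb`, next to `uxtb` on its own. The helper names are hypothetical; the clamp range [-128, 127] is what an 8-bit signed saturate produces.

```c
#include <stdint.h>

/* Hypothetical helper: clamp to the signed 8-bit range, like ssat rd, #8, rn. */
static int32_t saturate_s8(int32_t n)
{
    if (n > 127)  return 127;
    if (n < -128) return -128;
    return n;
}

/* Low byte after saturating (ssat, then uxtb). */
static uint32_t saturated_byte(int32_t n)
{
    return (uint32_t)saturate_s8(n) & 0xffu;
}

/* Low byte without saturating (uxtb alone): out-of-range values wrap. */
static uint32_t wrapped_byte(int32_t n)
{
    return (uint32_t)n & 0xffu;
}
```

With an out-of-range input such as 300, the saturated byte is 0x7f (127), while the wrapped byte is 0x2c (44), which no longer resembles the original value.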

Also, be aware that this code will have poor performance on ARMv7 cores; you'll definitely want to use the NEON vector operations on that platform.

Stephen Canon
Yes, I know this, but the problem is the conversion from signed int to signed byte! In the end, I want to have not four 4-byte integers but four 1-byte signed values inside a single register.
genesys
Sorry; it wasn't at all clear that that's what you wanted from your question ("then shift the value by 12 bytes to the left", etc). You can do this directly on NEON, but on ARM cores that have VFP only, you'll need to move the converted values back to the general purpose registers and pack them down to bytes there.
Stephen Canon
Are you sure? There is a bitwise OR, so I thought maybe I could build something using floating-point arithmetic. If my float is in the range -127<=f<=127, couldn't I multiply it by #4096 (=2^12) and then convert it to int using ftosis? Wouldn't this give the same result as shifting the converted int left by 12 bits? If not, maybe you could answer my second question? (See the edit in the question above.)
genesys
If there were a bitwise `or` on VFP that could work, but there isn't.
Stephen Canon
Well then, can I move the converted values back to general-purpose registers and do the conversion there without leaving my VFP loop (i.e. without switching back to Thumb and without setting the vector size back to zero)?
genesys
Yes. There's no prohibition against intermixing VFP and general-purpose ARM code.
Stephen Canon
Could you perhaps give an example in your answer of what this would look like? (I'm just beginning to learn VFP programming.) Will the VFP then continue its execution while the ARM converts the integer to a byte and stores it at the right position, so that VFP and ARM do their jobs in parallel?
genesys
Thanks a lot for your example! I'm looping over a long array of floats, interpreted as 4-component vectors. First I multiply them by a transformation matrix in VFP, then I normalize them, multiply them by 127.0 and convert them to int. Then I convert to signed byte. I hope that while this conversion is being done, the VFP will already be processing the next four floats. The only thing I don't understand in your code is the "bx lr": is this simply switching back to Thumb mode?
genesys
Also, wouldn't it be sxtb instead of uxtb in order to keep the sign of the value? (The integer has a value between -127 and 127.)
genesys
`bx lr` is just a return. You can't use `sxtb` because it sign-extends the byte that is extracted from the input register, which would fill the high 24 bits with ones if the value is negative. You need to use `uxtb` to keep the bits other than those that you're interested in clear. Alternatively, you could just do `and ri, ri, #0xff`.
Stephen Canon
Got it =) It's working fine! Thanks a lot.
genesys