Hi, I have the following bit of inline ARM assembly. It works in a debug build but crashes in a release build with iPhone SDK 3.1. The problem is the add instructions, where I increment the addresses held in the C variables output and x by 4 bytes, i.e. by the size of a float. I think that at some stage the increment makes me overwrite something. Can anyone say what the best way to handle this is?

Thanks

The C code that the asm is replacing (sum, output and x are all floats):

for(int i = 0; i< count; i++)
 sum+= output[i]* (*x++);

 asm volatile(

    ".align 4 \n\t"
    "mov r4,%3    \n\t"  
    "flds s0,[%0]           \n\t"
    "0:                   \n\t"
    "flds s1,[%2]           \n\t"
    //"add %3,%3,#4         \n\t"
    "flds s2,[%1]           \n\t"
    //"add %2,%2,#4         \n\t"
    "subs r4,r4, #1         \n\t"
    "fmacs s0, s1, s2        \n\t"
    "bne 0b                 \n\t"
    "fsts s0,[%0]               \n\t"
    :
    : "r" (&sum), "r" (output), "r" (x),"r" (count)
    : "r0","r4","cc", "memory", 
        "s0","s1","s2"                         
    );
A: 
dwelch
Hi, many thanks. Can you see any way of speeding up this loop, or any techniques I should be using?
tech74
Unfortunately you cannot add a ,#4 on the flds. Even if you precomputed the final value for %2 or %1 so that you could get rid of the subs r4,r4,#1, you would have to replace it with a cmp rx,%2 and would still need the conditional branch. Is there an equivalent to the ldm for float? Reading multiple floating-point registers at once might make the memory system faster; it depends on the bus and on whether there is such an instruction.
dwelch
"unfortunately you cannot add a ,#4 on the flds": are you saying I can't do this: "add %2,%2,#4"? Would this cause the crashes, then? Please can you say why I can't do this "add %2,%2,#4"?
tech74
No. When loading normal non-floating-point registers you can do ldr r0,[r1],#4, which saves an instruction over ldr r0,[r1]; add r1,r1,#4. But the assembler complained when I tried flds s2,[%1],#4: even though %1 is a normal register, flds doesn't offer the post-increment.
dwelch
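One possible cause of the release-only crash, offered as a sketch rather than a confirmed diagnosis: in the question's asm, output and x are passed as input-only operands ("r"), and GCC-style inline asm does not allow input-only operands to be modified. An optimizing build may assume those registers still hold their original values afterwards. Note also that, in the operand list above, %3 is count and %1 is output, so the commented-out "add %3,%3,#4" would change the loop count rather than the output pointer. The version below declares the two pointers and the counter as read-write operands ("+r") and uses named operands to avoid mixing up the numbers. The function name dot_accumulate, the operand names and the preprocessor guard are assumptions made for illustration, not part of the original code; on targets without ARM VFP the plain C loop is compiled instead.

```c
/* Hypothetical helper: returns sum + output[0]*x[0] + ... + output[count-1]*x[count-1]. */
static float dot_accumulate(float sum, const float *output, const float *x, int count)
{
    if (count <= 0)
        return sum;               /* the asm loop counts down to zero, so it needs count >= 1 */

#if defined(__arm__) && defined(__VFP_FP__) && !defined(__SOFTFP__) && !defined(__thumb__)
    __asm__ volatile(
        "flds  s0, [%[sum]]          \n\t"  /* s0 = running sum                     */
        "0:                          \n\t"
        "flds  s1, [%[x]]            \n\t"  /* s1 = *x                              */
        "add   %[x], %[x], #4        \n\t"  /* x++ (one float)                      */
        "flds  s2, [%[out]]          \n\t"  /* s2 = *output                         */
        "add   %[out], %[out], #4    \n\t"  /* output++ (one float)                 */
        "subs  %[n], %[n], #1        \n\t"  /* count--, sets the flags for bne      */
        "fmacs s0, s1, s2            \n\t"  /* s0 += s1 * s2 (does not touch flags) */
        "bne   0b                    \n\t"
        "fsts  s0, [%[sum]]          \n\t"  /* store the result back to sum         */
        : [out] "+r" (output), [x] "+r" (x), [n] "+r" (count)  /* modified by the asm */
        : [sum] "r" (&sum)                                      /* read-only address   */
        : "cc", "memory", "s0", "s1", "s2");
#else
    /* Portable fallback with the same result, used when not building for ARM VFP. */
    for (int i = 0; i < count; i++)
        sum += output[i] * x[i];
#endif
    return sum;
}
```

A call such as sum = dot_accumulate(sum, output, x, count); would stand in for the original C loop. Because the pointers are parameters, the increments only change the function's local copies, so the caller's output and x are left untouched.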