views:

323

answers:

3

I'm just starting to learn assembly in my computer science class, and I have an assignment to round a floating-point value using a specified rounding mode. I've tried to implement this using fstcw, fldcw, and frndint. I modify the rounding control bits, round the number, and then restore the previous control bits (a requirement of the assignment).

The current outstanding problem is that the instruction fld %1 seems to load the wrong value into the st(0) floating-point register (for example, if I call the function with a value of 2.6207, the number -1.9427(...)e-29 gets loaded into the register). This may be due to a misuse of gcc's inline asm(), or something else, but I'm not sure why it happens.

Here's what I have:

double roundD (double n, RoundingMode roundingMode)
{
    // control word storage (2 bytes for previous, 2 for current)
    char *cw = malloc(4*sizeof(char));
    char *cw2 = cw + 2;

    asm("fstcw %3;" // store control word in cw
        "mov %3,%4;" // copy control word into cw2
        "and $0xF3FF,%4;" // zero out rounding control bits
        "or %2,%4;" // put new mode into rounding control bits
        "fldcw %5;" // load the modified control word
        "fld %1;" // load n into st(0)
        "frndint;" // round n
        "fstp %0;" // load st(0) back into n
        "fldcw %3;" // load the old control word from cw
        : "=m" (n)
        : "m" (n), "m" (roundingMode),
          "m" (cw), "r" (cw2), "m" (cw2) // mov requires one argument in a register
        );

    free(cw);

    return n;
}

I'd appreciate any pointers to what's wrong with that code, specifically relating to the fld %1 line and the asm inputs/outputs. (Of course, if you can find other problems, feel free to let me know about them as well.) I don't want anyone to do my homework for me, just point me in the right direction. Thanks!

+1  A: 

Here's what I've got. It's not tested, but hopefully would be less gnarly for you to work with. :-)

double
roundd(double n, short mode)
{
    short cw, newcw;

    __asm__("fstcw %w0" : "=m" (cw));
    newcw = cw & 0xf3ff | mode;
    __asm__("fldcw %w0" : : "m" (newcw));
    __asm__("frndint" : "+t" (n));
    __asm__("fldcw %w0" : : "m" (cw));
    return n;
}

Although, if you're not required to use assembly to achieve your rounding mode, think about using the functions in <fenv.h> instead. :-)

Chris Jester-Young
I am required to use assembly :)
jtbandes
@jtbandes: Cool. In that case, feel free to test out my version and let me know what needs fixing. :-)
Chris Jester-Young
How does the `+t` constraint work? I can't find information about it where I found the others.
jtbandes
@jtbandes: `+` means in/out (used when the instruction reads and writes to the same register), and `t` means `st(0)`. Read the "Machine Constraints" section of the GCC manual: http://gcc.gnu.org/onlinedocs/gccint/Machine-Constraints.html :-)
Chris Jester-Young
A: 

As the sign changes, it means that the sign bit (which is the most significant, the first one) is not correct. That suppose to me that the pointer %1 is wrongly aligned. If you have one byte, it can begin on 0,1,2... but if you access two bytes, the address must be 0,2,4.... and in case of double the address must be even dividable by 8: 0,8,16

So check if the address which you use to load the value is dividable by 8. Assembly has the align keyword to guarantee that your data is correctly aligned.

Thorsten S.
That alignment is NOT needed by x86 but only advisable for performance.
Ritsaert Hornstra
+2  A: 

At least one issue with your current code is it is using the single precision floating point versions of fld and fstp. If you replace them with fldl and fstpl it will probably work.

tyranid
Thorsten S.
+1 Yes, it seems the *l versions will work. The snippet I posted, when disassembled, have the *l suffix too. (Obviously, in my case I just chose to let gcc do all the hard work, by using constraints rather than manually coding in the load/store instructions.)
Chris Jester-Young
I tried to maintain the spirit of the question but your answer was far better :)
tyranid