Does anyone know how I can get rid of the following assembler warning?

Code is x86, 32 bit:

int test (int x)
{
  int y;
  // do a bit-rotate by 8 on the lower word. leave upper word intact.
  asm ("rorw $8, %0\n\t": "=q"(y) :"0"(x));
  return y;
}

If I compile it I get the following (very valid) warning:

Warning: using `%ax' instead of `%eax' due to `w' suffix

What I'm looking for is a way to tell the compiler/assembler that I want to access the lower 16 bit sub-register of %0. Accessing the byte sub-registers (in this case AL and AH) would be nice to know as well.

I've already chosen the "q" constraint, so the compiler is forced to use EAX, EBX, ECX or EDX. That way I've made sure the compiler has to pick a register that has sub-registers.

I know that I can force the asm code to use a specific register (and its sub-registers), but I want to leave the register-allocation job up to the compiler.

+12  A: 

You can use %w0 if I remember right. I just tested it, too. :-)

int
test(int x)
{
    int y;
    asm ("rorw $8, %w0" : "=q" (y) : "0" (x));
    return y;
}

Edit: In response to the OP, yes, you can do the following too:

int
test(int x)
{
    int y;
    asm ("xchg %b0, %h0" : "=Q" (y) : "0" (x));
    return y;
}

At present, the only place I know of that documents these operand modifiers is gcc/config/i386/i386.md; they are not in any of the standard documentation.
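
For reference, here is a minimal sketch that puts the three modifiers side by side: %w0 is the 16-bit sub-register, %b0 the low byte and %h0 the second-lowest byte of operand 0. It assumes GCC or Clang targeting x86. The function names, the "+q"/"+Q" read-write constraints (instead of the "=q"/"0" pair used above) and the portable fallback branch are my own additions for illustration, not something taken from the GCC sources.

```c
#include <stdint.h>

/* Swap the two low bytes of x with a 16-bit rotate; upper 16 bits untouched. */
static inline uint32_t swap_low_bytes_ror(uint32_t x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* %w0 prints the 16-bit name of operand 0 (e.g. %ax), so the
       assembler no longer has to guess it from the 'w' suffix. */
    __asm__ ("rorw $8, %w0" : "+q" (x) : : "cc");
#else
    /* Fallback for other targets (assumption: plain C is acceptable there). */
    x = (x & 0xffff0000u) | ((x & 0xffu) << 8) | ((x >> 8) & 0xffu);
#endif
    return x;
}

/* Same result using the byte sub-registers. */
static inline uint32_t swap_low_bytes_xchg(uint32_t x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* %b0 = low byte (e.g. %al), %h0 = second byte (e.g. %ah).
       "Q" restricts the choice to registers that have a high-byte form. */
    __asm__ ("xchgb %b0, %h0" : "+Q" (x));
#else
    x = (x & 0xffff0000u) | ((x & 0xffu) << 8) | ((x >> 8) & 0xffu);
#endif
    return x;
}
```

Both functions should turn 0x11223344 into 0x11224433.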

Chris Jester-Young
I tested it as well. Do you know the modifiers for the low and high bytes, too?
Nils Pipenbrinck
Thanks, I'm glad it helped!
Chris Jester-Young
A: 

So apparently there are tricks to do this... but it may not be so efficient. 32-bit x86 processors are generally slow at manipulating 16-bit data in general purpose registers. You ought to benchmark it if performance is important.

Unless this is (a) performance critical and (b) proves to be much faster, I would save myself some maintenance hassle and just do it in C:

uint32_t y, hi=(x&~0xffff), lo=(x&0xffff);
y = hi + (((lo >> 8) + (lo << 8))&0xffff);

With GCC 4.2 and -O2 this gets optimized down to six instructions...
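
Wrapped into a function so it can be dropped in and checked, the two lines above might look like the sketch below. The function name is made up for illustration; the arithmetic is the same as in the snippet.

```c
#include <stdint.h>

/* Swap the two low bytes of x in plain C; the upper 16 bits pass through. */
static inline uint32_t swap_low_bytes_c(uint32_t x)
{
    uint32_t hi = x & ~UINT32_C(0xffff);  /* upper half, kept as-is */
    uint32_t lo = x & UINT32_C(0xffff);   /* lower half, to be rotated by 8 */
    return hi + (((lo >> 8) + (lo << 8)) & UINT32_C(0xffff));
}
```

For example, swap_low_bytes_c(0x11223344) gives 0x11224433.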

Dan
How is 6 instructions supposed to be faster than 1 instruction?! My timing tests (for a billion runs, 5 trials) were: my version = (4.38, 4.48, 5.03, 4.10, 4.18), your version = (5.33, 6.21, 5.62, 5.32, 5.29).
Chris Jester-Young
So, we're looking at a 20% speed improvement. Isn't that "much faster"?
Chris Jester-Young
Chris, absolutely right... your version *is* faster, it seems. But not nearly as much as 6-instructions-vs.-1-instruction would lead you to expect, and that's what I was warning about. I didn't actually do the comparison myself, so props to you for testing it!!
Dan
+1  A: 

@Dan,

I need that lower byte swapping primitive for a larger tweak.

I know that 16 bit operations in 32 bit code have been slow and frowned upon, but the code will be surrounded with other 32 bit operations. I hope that the slowness of the 16 bit code will just get lost in the out of order scheduling.

What I want to achieve in the end is a mechanism to do all 24 possible byte permutations of a dword in-place. For this you need only three instructions at most: low-byte swap (e.g. xchg al, ah), bswap and 32 bit rotates.
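
The "three instructions at most" claim can be checked with a small breadth-first search. This is only a sketch in portable C that models the three primitives (low-byte swap, bswap, rotate by 8, 16 or 24 bits) rather than emitting the actual instructions; all function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Portable models of the three primitives. */
static uint32_t swap_low(uint32_t x)   /* like xchg al, ah */
{
    return (x & 0xffff0000u) | ((x & 0xffu) << 8) | ((x >> 8) & 0xffu);
}

static uint32_t bswap32(uint32_t x)    /* like bswap */
{
    return (x >> 24) | ((x >> 8) & 0xff00u) | ((x << 8) & 0xff0000u) | (x << 24);
}

static uint32_t rotl32(uint32_t x, unsigned n)  /* like rol; n is 8, 16 or 24 */
{
    return (x << n) | (x >> (32 - n));
}

static uint32_t apply_primitive(int op, uint32_t x)
{
    switch (op) {
    case 0:  return swap_low(x);
    case 1:  return bswap32(x);
    case 2:  return rotl32(x, 8);
    case 3:  return rotl32(x, 16);
    default: return rotl32(x, 24);
    }
}

/* Breadth-first search from a word with four distinct bytes.
   Returns how many byte arrangements are reachable and stores the
   largest number of primitives needed for any of them in *max_steps. */
int count_byte_permutations(int *max_steps)
{
    uint32_t seen[24];
    int depth[24];
    int count = 1, head = 0;

    seen[0] = 0x03020100u;
    depth[0] = 0;
    *max_steps = 0;

    while (head < count) {
        for (int op = 0; op < 5; op++) {
            uint32_t next = apply_primitive(op, seen[head]);
            int known = 0;
            for (int i = 0; i < count; i++) {
                if (seen[i] == next) { known = 1; break; }
            }
            if (!known) {
                assert(count < 24);  /* 4 bytes have only 4! = 24 orders */
                seen[count] = next;
                depth[count] = depth[head] + 1;
                if (depth[count] > *max_steps)
                    *max_steps = depth[count];
                count++;
            }
        }
        head++;
    }
    return count;
}
```

Because the queue is processed in order, each arrangement is recorded with the smallest number of primitives that produces it.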

The in-place way does not need any constants (faster code fetch / decode time) and only uses a single register. For x86/32 that may save me up to 6 costly memory accesses (push/pop) on top of the ca. 10 instructions I save for byte shuffling.

First tests have shown that such code can run up to three times faster on my Core 2, but I have to make more measurements on other machines before I can use it.

My secret plan is to integrate this tweak into GCC one day, but that may not ever happen because GCC is such a huge codebase.

Nils Pipenbrinck
A: 

@Nils,

Gotcha. Well if it's a primitive routine that you're going to be reusing over and over, I have no argument with it... the register naming trick that Chris pointed out is a nice one that I'm going to have to remember.

It would be nice if it made it into the standard GCC docs too!

Dan
@Dan, I checked the GCC documentation twice and then filed a bug report because this info is missing. Who knows - maybe it makes it into the next release.
Nils Pipenbrinck
I found the bug at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37621, and it looks like there may be resistance to documenting this feature since it's only meant for internal use. Hrm...
Dan
+1  A: 

While I'm thinking about it ... you should replace the "q" constraint with a capital "Q" constraint in Chris's second solution:

int
test(int x)
{
    int y;
    asm ("xchg %b0, %h0" : "=Q" (y) : "0" (x));
    return y;
}

"q" and "Q" are slightly different in 64-bit mode, where you can get the lowest byte for all of the integer registers (ax, bx, cx, dx, si, di, sp, bp, r8-r15). But you can only get the second-lowest byte (e.g. ah) for the four original 386 registers (ax, bx, cx, dx).

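To make the difference concrete, here is a small sketch, assuming GCC or Clang on x86: the low byte (%b0) is fine with "q", while the second-lowest byte (%h0) needs "Q" so that a 64-bit build cannot hand you something like esi, which has no high-byte form. The function names and the non-x86 fallback are my own, for illustration only.

```c
#include <stdint.h>

/* Clear bits 0..7. The low byte exists for every register "q" can pick. */
static inline uint32_t clear_byte0(uint32_t x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("movb $0, %b0" : "+q" (x));
#else
    x &= 0xffffff00u;  /* fallback for other targets (assumption) */
#endif
    return x;
}

/* Clear bits 8..15. Only a, b, c and d have a high-byte name, hence "Q". */
static inline uint32_t clear_byte1(uint32_t x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("movb $0, %h0" : "+Q" (x));
#else
    x &= 0xffff00ffu;  /* fallback for other targets (assumption) */
#endif
    return x;
}
```
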
Dan
Yes, good point, thank you! I'll edit my post now. :-)
Chris Jester-Young