ansaurus

Question

Add the upper and lower 64-bits of a 128-bit xmm register

Answer 1


First off, why are you using quadwords to represent values that would fit in a 16-bit format? Leaving that aside, here are a couple of solutions:

pshufd xmm1, xmm0, 0EEh   ; copy the high qword of xmm0 into the low qword of xmm1
paddq  xmm0, xmm1
movd   temp, xmm0

or

movdqa xmm1, xmm0
psrldq xmm1, 8            ; shift right by 8 bytes: the high qword moves to the low qword
paddq  xmm0, xmm1
movd   temp, xmm0

or

movhlps xmm1, xmm0        ; copy the high qword of xmm0 into the low qword of xmm1
paddq   xmm0, xmm1
movd    temp, xmm0

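Note that you don't actually need to use paddq; you can get away with one of the narrower adds (such as paddd) if you prefer, as long as the sum cannot carry out of the narrower lane. Also note that movd stores only the low 32 bits of the register, which is enough for your values; use movq if you need the full 64-bit sum.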
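For anyone doing this from C rather than assembly, here is a minimal sketch of the three sequences using SSE2 intrinsics from `<emmintrin.h>`, assuming an x86 target with SSE2 and a compiler that ships that header (GCC, Clang and MSVC do). The function names are hypothetical. Each function returns only the low 32 bits of the sum, mirroring the movd above; the third one uses the narrower paddd (`_mm_add_epi32`), which is only correct while the sum fits in 32 bits.

```c
#include <emmintrin.h> /* SSE2 intrinsics (also includes the SSE ones) */
#include <stdint.h>

/* pshufd xmm1, xmm0, 0EEh / paddq xmm0, xmm1 / movd temp, xmm0 */
static inline int32_t hsum_pshufd(__m128i v)
{
    /* 0xEE selects dwords 2,3,2,3: the high qword lands in the low qword */
    __m128i hi = _mm_shuffle_epi32(v, 0xEE);
    __m128i sum = _mm_add_epi64(v, hi);   /* paddq */
    return _mm_cvtsi128_si32(sum);        /* movd: low 32 bits only */
}

/* movdqa xmm1, xmm0 / psrldq xmm1, 8 / paddq xmm0, xmm1 / movd temp, xmm0 */
static inline int32_t hsum_psrldq(__m128i v)
{
    __m128i hi = _mm_srli_si128(v, 8);    /* byte-wise shift right by 8 */
    __m128i sum = _mm_add_epi64(v, hi);   /* paddq */
    return _mm_cvtsi128_si32(sum);        /* movd */
}

/* movhlps xmm1, xmm0 / paddd xmm0, xmm1 / movd temp, xmm0
 * Uses the narrower paddd: only valid while the sum fits in 32 bits. */
static inline int32_t hsum_movhlps(__m128i v)
{
    __m128 vf = _mm_castsi128_ps(v);      /* reinterpret bits, no conversion */
    /* _mm_movehl_ps(a, b) = {b[2], b[3], a[2], a[3]} */
    __m128i hi = _mm_castps_si128(_mm_movehl_ps(vf, vf));
    __m128i sum = _mm_add_epi32(v, hi);   /* paddd */
    return _mm_cvtsi128_si32(sum);        /* movd */
}
```

For example, `hsum_pshufd(_mm_set_epi64x(40, 2))` adds the high lane (40) to the low lane (2) and returns 42; the other two functions give the same result for that input.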

Edit: for summing four double quadwords, what you have is pretty much fine. Given that you know all the data in them fits into the low doubleword of each 64-bit slot, you could try something like:

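shufps  xmm0, xmm2, 88h   ; xmm0 = low dwords of the qwords in xmm0 and xmm2
shufps  xmm4, xmm6, 88h   ; xmm4 = low dwords of the qwords in xmm4 and xmm6
paddd   xmm0, xmm4        ; four partial sums
movdqa  xmm1, xmm0
psrlq   xmm1, 32          ; SSE2 psrlq takes two operands
paddd   xmm0, xmm1        ; dwords 0 and 2 now hold pairwise sums
movhlps xmm1, xmm0
paddd   xmm0, xmm1        ; combine the two halves
movd    temp, xmm0        ; low dword holds the total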

which may or may not prove to be faster.
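The same reduction can be sketched in C with SSE2 intrinsics, assuming (as the answer does) that each 64-bit slot holds its value in the low doubleword with the high doubleword zero, and that the total fits in 32 bits. `sum4_low_dwords` is a hypothetical name. `_mm_shuffle_ps` with 0x88 plays the role of `shufps ..., 88h`, and `_mm_srli_epi64` returns a shifted copy, so no separate register copy has to be written out.

```c
#include <emmintrin.h> /* SSE2 intrinsics (also includes the SSE ones) */
#include <stdint.h>

/* Hypothetical helper: sums the low dwords of the 64-bit lanes of a, b, c, d. */
static inline int32_t sum4_low_dwords(__m128i a, __m128i b, __m128i c, __m128i d)
{
    /* shufps with 0x88 picks dwords 0 and 2 of each operand:
     * ab = {a.d0, a.d2, b.d0, b.d2}, cd = {c.d0, c.d2, d.d0, d.d2} */
    __m128i ab = _mm_castps_si128(_mm_shuffle_ps(_mm_castsi128_ps(a),
                                                 _mm_castsi128_ps(b), 0x88));
    __m128i cd = _mm_castps_si128(_mm_shuffle_ps(_mm_castsi128_ps(c),
                                                 _mm_castsi128_ps(d), 0x88));

    __m128i s = _mm_add_epi32(ab, cd);            /* four partial sums */

    /* shift each qword right by 32 and add: dwords 0 and 2 hold pair sums */
    s = _mm_add_epi32(s, _mm_srli_epi64(s, 32));

    /* movhlps: bring the high qword down, then add the two halves */
    __m128i hi = _mm_castps_si128(_mm_movehl_ps(_mm_castsi128_ps(s),
                                                _mm_castsi128_ps(s)));
    s = _mm_add_epi32(s, hi);

    return _mm_cvtsi128_si32(s);                  /* movd: low dword = total */
}
```

With lanes {1, 2}, {3, 4}, {5, 6} and {7, 8} (low lane first, built with `_mm_set_epi64x(2, 1)` and so on), the function returns 36.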

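As for EMMS, it's just another instruction. Any code that touches the MMX registers must execute EMMS before any subsequent code uses x87 floating-point instructions. The MMX registers are aliased onto the x87 register stack, and EMMS marks that stack as empty again; without it, later x87 loads overflow the stack and produce NaNs.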
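A minimal sketch of that rule in C, assuming GCC or Clang on x86 with the MMX intrinsics from `<mmintrin.h>`, where `_mm_empty()` is the intrinsic for EMMS. The helper names `mmx_add` and `average` are hypothetical, and `long double` is used only because GCC and Clang on x86 typically evaluate it with x87 instructions. Some compilers lower MMX intrinsics to SSE code on x86-64; the `_mm_empty()` call is then harmless, but it is still the call that must precede x87 code whenever real MMX instructions are emitted.

```c
#include <mmintrin.h> /* MMX intrinsics, including _mm_empty (EMMS) */
#include <stdint.h>

/* Hypothetical helper: adds two ints in an MMX register, then runs EMMS. */
static inline int32_t mmx_add(int32_t a, int32_t b)
{
    __m64 va = _mm_cvtsi32_si64(a);          /* movd mm, r32 */
    __m64 vb = _mm_cvtsi32_si64(b);
    __m64 sum = _mm_add_pi32(va, vb);        /* paddd mm, mm */
    int32_t result = _mm_cvtsi64_si32(sum);  /* movd r32, mm */
    _mm_empty();  /* EMMS: mark the x87 register stack empty again */
    return result;
}

/* Hypothetical x87 consumer: safe because mmx_add ended with EMMS. */
static inline long double average(int32_t a, int32_t b)
{
    return (long double)mmx_add(a, b) / 2.0L;
}
```

For example, `average(40, 2)` returns 21.0.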

Stephen Canon 2009-12-11 21:22:38

@Stephen: The previous operations need the double quadwords to work on 128 bytes of information simultaneously. After that, a sequence of summations produces the final result, with the aforementioned upper bound.

Jacob 2009-12-11 22:12:36

*shrug*, fair enough. Anyway, any of the sequences I posted should work for you and avoid the legacy MMX usage.

Stephen Canon 2009-12-11 22:16:21

Thanks! The MMX code had actually messed up the rest of my program: all the floats came out as -1.#IND!

Jacob 2009-12-11 22:18:55

Yeah, if you use the MMX registers, you need to make sure to do an `EMMS` before any code that uses x87 instructions.

Stephen Canon 2009-12-11 22:25:35
