This is very simple, but I haven't been able to figure it out yet.

This question is about MMX assembly, but it's pure logic.

Imagine the following scenario:

MM0: 04 03 02 01 04 03 02 01  <-- input  
MM1: 02 02 02 02 02 02 02 02  
MM2: 04 03 02 01 04 03 02 01  <-- copy of input

after pcmpgtw MM0, MM1

MM0: FF FF 00 00 FF FF 00 00  <-- words where MM0 is greater than MM1 (comparing words)  
MM1: 02 02 02 02 02 02 02 02  
MM2: 04 03 02 01 04 03 02 01

after pand MM0, MM2  

MM0: 04 03 00 00 04 03 00 00  <-- almost there...
MM1: 02 02 02 02 02 02 02 02  
MM2: 04 03 02 01 04 03 02 01  

What I want now is to fill the zeros of MM0 with 02. I suppose I would have to invert the MM0 register from step 2 (the result of pcmpgtw), changing the FFs to 00s and the 00s to FFs, then AND that with MM1, and finally OR the two results together to merge them.

If I was able to get:

MM3: 00 00 FF FF 00 00 FF FF

then pand MM1, MM3 would give:

MM0: 04 03 00 00 04 03 00 00  
MM1: 00 00 02 02 00 00 02 02

finally por MM0, MM1 would give me the desired outcome:

MM0: 04 03 02 02 04 03 02 02  <-- Aha!

Summing up, how can I get that MM3 register as 00 00 FF FF 00 00 FF FF? How can I invert the bits, provided I only have AND, OR, XOR and AND-NOT (PANDN) instructions available for MMX registers?

Any answer is greatly appreciated. Thanks.

+1  A: 

So you have a mask = 0xFFFF0000FFFF0000; then:

all_ones = 0xFFFFFFFFFFFFFFFF;

inverted_mask = mask XOR all_ones;

Merging M0 and M1 is then:

M0 = M0 AND mask;
M1 = M1 AND inverted_mask;
M0 = M0 OR M1;

This modifies M0 and M1 in place, so their original values are destroyed. If you want to preserve M1, store the intermediate result in a temporary variable, register, or memory location:

M0 = M0 AND mask;
TEMP = M1 AND inverted_mask;
M0 = M0 OR TEMP;
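
For anyone who wants to check the logic outside of assembly, here is a minimal sketch of the same merge in portable C, treating each 64-bit MMX register as a `uint64_t`. The function name `select_bits` is made up for this illustration and is not part of any library.

```c
#include <stdint.h>

/* Per-bit select: take the bits of a where mask is 1 and the bits of b
 * where mask is 0. select_bits is a hypothetical name for this sketch. */
uint64_t select_bits(uint64_t a, uint64_t b, uint64_t mask)
{
    const uint64_t all_ones = UINT64_C(0xFFFFFFFFFFFFFFFF);
    uint64_t inverted_mask = mask ^ all_ones;  /* XOR with all ones flips every bit */
    uint64_t temp = b & inverted_mask;         /* b itself is left untouched */
    return (a & mask) | temp;                  /* merge the two halves */
}
```

With the values from the question, `select_bits(0x0403020104030201, 0x0202020202020202, 0xFFFF0000FFFF0000)` should yield `0x0403020204030202`.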
slebetman
+1  A: 

You can also generate the inverted mask with pcmpgtw by swapping the order of the arguments. That way you can save a register:

MM0: 04 03 02 01 04 03 02 01  <-- input  
MM1: 02 02 02 02 02 02 02 02  
MM2: 04 03 02 01 04 03 02 01  <-- copy of input


pcmpgtw MM0, MM1    ; MM0 = FF FF 00 00 FF FF 00 00 
pcmpgtw MM1, MM2    ; MM1 = 00 00 FF FF 00 00 FF FF

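You may have to make a copy of MM1 first, because its contents are destroyed during mask generation, but this is often faster than loading or generating a 64-bit constant. Note that pcmpgtw is a strict greater-than, so a word that is equal in both operands sets neither mask.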
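
To see what the two compares produce, here is a rough C model of pcmpgtw (four signed 16-bit lanes, all ones where destination > source). The name `pcmpgtw_model` is invented for this sketch; it only imitates the instruction's documented behaviour and is not an intrinsic.

```c
#include <stdint.h>

/* Hypothetical model of PCMPGTW dst, src: for each of the four 16-bit
 * lanes, the result lane is 0xFFFF if dst > src (signed), else 0x0000. */
uint64_t pcmpgtw_model(uint64_t dst, uint64_t src)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        /* Flipping the sign bit turns a signed comparison into an
         * unsigned one, which avoids implementation-defined casts. */
        unsigned d = (unsigned)((dst >> (16 * lane)) & 0xFFFF) ^ 0x8000u;
        unsigned s = (unsigned)((src >> (16 * lane)) & 0xFFFF) ^ 0x8000u;
        if (d > s)
            result |= UINT64_C(0xFFFF) << (16 * lane);
    }
    return result;
}
```

Calling it with the input as destination and the 02 constant as source should give `0xFFFF0000FFFF0000`; swapping the two arguments should give `0x0000FFFF0000FFFF`. Lanes that are equal come out as zero in both cases.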

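An alternative is to use PANDN, which inverts its destination operand before ANDing it with the source:

pcmpgtw MM0, MM1    ; MM0 = FF FF 00 00 FF FF 00 00 (the mask)

pand    MM2, MM0    ; MM2 = input AND mask: keep the words where the mask is FF
pandn   MM0, MM1    ; MM0 = (NOT mask) AND MM1: take the 02s where the mask is 00
por     MM0, MM2    ; combine the results in MM0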
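
As a sanity check, the sequence can be modelled in plain C. This sketch assumes the MMX instruction meant here is PANDN, which computes dst = (NOT dst) AND src, so the mask has to be the destination operand; the register names in the comments are my assumed mapping. `pandn_model` and `merge_with_pandn` are hypothetical helper names.

```c
#include <stdint.h>

/* Hypothetical model of PANDN dst, src: dst = (NOT dst) AND src. */
uint64_t pandn_model(uint64_t dst, uint64_t src)
{
    return ~dst & src;
}

/* Keep the bits of input where mask is set, take bits of fill elsewhere. */
uint64_t merge_with_pandn(uint64_t mask, uint64_t fill, uint64_t input)
{
    uint64_t kept = input & mask;               /* pand  MM2, MM0 */
    uint64_t filled = pandn_model(mask, fill);  /* pandn MM0, MM1 */
    return filled | kept;                       /* por   MM0, MM2 */
}
```

With mask `0xFFFF0000FFFF0000`, fill `0x0202020202020202` and input `0x0403020104030201`, the result should be `0x0403020204030202`, and no separate all-ones constant is needed for the inversion.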
Nils Pipenbrinck