I'm sure there are several ways to do this. For instance, the following should work:
1) Make (or load) a mask of 5's and one of 2's in two MMX registers (mm0-mm7).
2) Load data into another MMX register, e.g. using MOVQ.
3) Compare the register holding the data to be tested with the mask of 2's, e.g. using PCMPEQB. This results in a mask of FFh and 00h bytes, according to whether each element in the register was 2 or not.
4) Use MASKMOVQ with the register of 5's and the mask generated by the compare to selectively write 5's to those positions that previously held 2's. MASKMOVQ stores data only for the byte positions whose mask value is FFh; the destination address is taken implicitly from EDI/RDI.
5) Repeat this until finished.
6) At the end, issue EMMS to exit MMX state. Also issue an SFENCE or MFENCE instruction at the end of the routine (because MASKMOVQ generates a non-temporal hint).
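The steps above can be sketched with compiler intrinsics instead of hand-written assembly. This is only a rough sketch, assuming an x86 compiler that still supports MMX intrinsics (for example GCC or Clang) and a CPU with SSE for MASKMOVQ. The function name replace_bytes_mmx is made up for illustration, memcpy stands in for an explicit MOVQ load, and the scalar loop for the leftover bytes is my addition, not part of the steps above.

```c
#include <stddef.h>
#include <string.h>
#include <xmmintrin.h>  /* MMX intrinsics plus _mm_maskmove_si64 and _mm_sfence */

/* Hypothetical helper: replace every byte equal to `from` with `to`. */
static void replace_bytes_mmx(unsigned char *buf, size_t len,
                              unsigned char from, unsigned char to)
{
    /* Step 1: build the two masks (eight copies of each byte value). */
    const __m64 from_mask = _mm_set1_pi8((char)from);
    const __m64 to_mask   = _mm_set1_pi8((char)to);
    size_t i = 0;

    /* Step 5: repeat for every complete 8-byte block. */
    for (; i + 8 <= len; i += 8) {
        __m64 data;
        /* Step 2: load 8 bytes; no alignment is required. */
        memcpy(&data, buf + i, sizeof data);
        /* Step 3: PCMPEQB gives FFh where the byte equals `from`, else 00h. */
        __m64 hits = _mm_cmpeq_pi8(data, from_mask);
        /* Step 4: MASKMOVQ stores `to` only at the positions flagged FFh. */
        _mm_maskmove_si64(to_mask, hits, (char *)(buf + i));
    }

    /* Step 6: fence the non-temporal stores, then leave MMX state. */
    _mm_sfence();
    _mm_empty();

    /* Assumption: leftover bytes (len not a multiple of 8) are handled scalar. */
    for (; i < len; ++i) {
        if (buf[i] == from)
            buf[i] = to;
    }
}
```

Calling replace_bytes_mmx(buf, n, 2, 5) on a byte buffer should turn every 2 into a 5 and leave all other bytes untouched.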
If you use MMX rather than XMM, you won't have to worry about alignment.
Edit: If you are having trouble with the details of the instructions, the Intel® 64 and IA-32 Architectures Software Developer's Manual, Instruction Set Reference (Volumes 2A and 2B), should contain everything you'll ever want to know. You can find them here.