In SSE, the prefixes 066h (operand-size override), 0F2h (REPNE), and 0F3h (REPE) are part of the opcode.

In non-SSE code, 066h switches between 32-bit (or 64-bit) and 16-bit operation, and 0F2h and 0F3h are used for string operations. They can be combined, so 066h and 0F2h (or 0F3h) can appear in the same instruction, because the combination is meaningful. What is the behavior in an SSE instruction? For instance, we have (ignoring the ModR/M byte for now):

0f 58 --> addps

66 0f 58 --> addpd

f2 0f 58 --> addsd

f3 0f 58 --> addss

But what is this?

66 f2 0f 58

And how about?

f2 66 0f 58

Not to mention the following, which has two conflicting REP prefixes:

f2 f3 0f 58

What is the spec for these?
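To make the question concrete, here is a minimal Python sketch that splits the leading 66/F2/F3 bytes from the opcode and looks the result up in the four encodings listed above. The names `split_prefixes`, `classify`, and `KNOWN_0F58` are made up for illustration; this is not a decoder and says nothing about what a CPU actually does with the unlisted combinations.

```python
# Sketch only: separates leading 66/F2/F3 bytes from the opcode bytes.
# It does not model CPU behaviour for combinations outside the table.
PREFIX_NAMES = {0x66: "operand-size override", 0xF2: "REPNE", 0xF3: "REPE"}

# The four encodings given in the question, keyed by the prefix bytes seen.
KNOWN_0F58 = {(): "addps", (0x66,): "addpd", (0xF2,): "addsd", (0xF3,): "addss"}

def split_prefixes(code: bytes):
    """Return (tuple of leading prefix bytes, remaining opcode bytes)."""
    i = 0
    while i < len(code) and code[i] in PREFIX_NAMES:
        i += 1
    return tuple(code[:i]), code[i:]

def classify(hex_bytes: str) -> str:
    """Name the instruction if the prefix set is one of the four listed."""
    prefixes, opcode = split_prefixes(bytes.fromhex(hex_bytes))
    if opcode[:2] != b"\x0f\x58":
        return "not an 0F 58 encoding"
    return KNOWN_0F58.get(prefixes, "not covered by the list above")

for seq in ("0f 58", "66 0f 58", "f2 0f 58", "f3 0f 58",
            "66 f2 0f 58", "f2 66 0f 58", "f2 f3 0f 58"):
    print(f"{seq:<12} -> {classify(seq)}")
```

The last three sequences fall through to the default branch, which is exactly the gap being asked about.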

+2  A: 

I don't recall seeing any specification of what to expect when prefixes are combined arbitrarily, so I guess the CPU behaviour may be "undefined" and possibly CPU-specific. (Some things are clearly specified in, e.g., Intel's docs, but many cases aren't covered.) Some combinations may also be reserved for future use.

My naive assumption would have been that additional prefixes are no-ops, but there's no guarantee. The assumption seems reasonable given that some optimisation manuals recommend building multi-byte NOPs by prefixing the canonical NOP (90h) with 66h, e.g.:

db 66h, 90h           ; 2-byte NOP
db 66h, 66h, 90h      ; 3-byte NOP
db 66h, 66h, 66h, 90h ; 4-byte NOP

However, I also know that the CS and DS segment override prefixes have acquired new functions as SSE2 branch hint prefixes (predict branch taken = 3Eh = DS override; predict branch not taken = 2Eh = CS override) when applied to conditional jump instructions.

Anyway, I tried your examples above, each time first setting XMM1 to all zeros and XMM7 to all 0FFh bytes with

pxor xmm1, xmm1    ; xmm1 <- 0s
pcmpeqw xmm7, xmm7 ; xmm7 <- FFs 

and then running the code in question with xmm1, xmm7 as arguments. What I observed (32-bit code on a Win64 system with an Intel T7300 Core 2 Duo) was:

1) no change observed for addsd after adding a 66h prefix

db 66h 
addsd xmm1, xmm7 ;total sequence = 66 F2 0F 58 CF     

2) no change observed for addss after adding a 0F2h prefix

db 0f2h     
addss xmm1,xmm7 ;total sequence = F2 F3 0F 58 CF

3) However, I observed a change when prefixing addpd with 0F2h:

db 0f2h    
addpd xmm1, xmm7 ;total sequence = F2 66 0F 58 CF

In this case, the result in XMM1 was 0000000000000000FFFFFFFFFFFFFFFFh instead of the FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFh that unprefixed addpd gives.

So my conclusion is that one shouldn't make any assumptions and should expect "undefined" behaviour. I wouldn't be surprised, however, if you could find some clues in Agner Fog's manuals.
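For anyone wanting to see why the two bit patterns differ, here is a rough Python model of the lane behaviour only: a packed add writes both 64-bit lanes, while a scalar add writes the low lane and leaves the high lane of the destination alone. The function names are made up for the sketch, registers are modelled as 128-bit integers, and Python floats stand in for the hardware adder. An all-FFs lane is a NaN; the hardware result kept that exact pattern, but Python does not promise to preserve the NaN's bits, so the sketch only checks for NaN.

```python
import math
import struct

MASK64 = (1 << 64) - 1

def lane_to_float(bits: int) -> float:
    """Reinterpret a 64-bit pattern as an IEEE double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def float_to_lane(x: float) -> int:
    """Reinterpret an IEEE double as its 64-bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def add_lane(a: int, b: int) -> int:
    return float_to_lane(lane_to_float(a) + lane_to_float(b))

def model_addpd(dst: int, src: int) -> int:
    """Packed model: both 64-bit lanes of dst are replaced by the sums."""
    lo = add_lane(dst & MASK64, src & MASK64)
    hi = add_lane(dst >> 64, src >> 64)
    return (hi << 64) | lo

def model_addsd(dst: int, src: int) -> int:
    """Scalar model: only the low lane is written; the high lane is kept."""
    lo = add_lane(dst & MASK64, src & MASK64)
    return (dst & ~MASK64) | lo

xmm1 = 0                 # after pxor xmm1, xmm1
xmm7 = (1 << 128) - 1    # after pcmpeqw xmm7, xmm7

for name, model in (("addpd", model_addpd), ("addsd", model_addsd)):
    r = model(xmm1, xmm7)
    print(name,
          "high lane NaN:", math.isnan(lane_to_float(r >> 64)),
          "low lane NaN:", math.isnan(lane_to_float(r & MASK64)))
```

In this model the F2 66 0F 58 result above has the scalar shape (high lane still zero, low lane NaN), not the packed one. That is only a reading of one observation on one CPU, not a statement of what the encoding is specified to do.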

PhiS
In the last case, apparently, the `0F2h` took precedence over the `066h` and converted the instruction into `addsd`, which is why only the low 64 bits were written.
Nathan Fellman
That'd be one hypothesis, yes. However, I think it'd be a bad idea for anybody to rely on such behaviour.
PhiS