Given a vector of bytes whose length is a multiple of 8, how can I use MMX instructions to convert, for example, all 2's to 5's?
.data
v1 BYTE 1, 2, 3, 4, 1, 2, 3, 4
Thanks.
edit: 2's and 5's are just an example. They are actually parameters of a procedure.
...
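One common way to approach this is compare-and-blend: build a per-byte mask with pcmpeqb, then combine the old and new values with pandn/pand/por. Below is a minimal sketch in C using MMX intrinsics, assuming a GCC- or Clang-style compiler that defines `__MMX__`; the scalar branch models the same mask logic for other targets. The function name `replace_bytes` and its parameters are hypothetical, chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Replace every byte equal to `from` with `to` in buf[0..len).
   Assumption: len is a multiple of 8, as stated in the question. */
void replace_bytes(uint8_t *buf, size_t len, uint8_t from, uint8_t to)
{
#if defined(__MMX__)
    __m64 vfrom = _mm_set1_pi8((char)from);   /* broadcast `from` to 8 bytes */
    __m64 vto   = _mm_set1_pi8((char)to);     /* broadcast `to` to 8 bytes */
    for (size_t i = 0; i + 8 <= len; i += 8) {
        __m64 v;
        memcpy(&v, buf + i, 8);                  /* load 8 bytes */
        __m64 mask = _mm_cmpeq_pi8(v, vfrom);    /* pcmpeqb: 0xFF where byte == from */
        __m64 keep = _mm_andnot_si64(mask, v);   /* pandn: bytes that do not match */
        __m64 repl = _mm_and_si64(mask, vto);    /* pand: `to` where bytes match */
        v = _mm_or_si64(keep, repl);             /* por: merge both halves */
        memcpy(buf + i, &v, 8);                  /* store 8 bytes */
    }
    _mm_empty();                                 /* emms: release the x87 state */
#else
    /* Scalar model of the same mask-and-blend logic. */
    for (size_t i = 0; i < len; i++) {
        uint8_t mask = (buf[i] == from) ? 0xFF : 0x00;
        buf[i] = (uint8_t)((buf[i] & ~mask) | (to & mask));
    }
#endif
}
```

Calling `replace_bytes(v1, 8, 2, 5)` on the v1 array from the question turns it into 1, 5, 3, 4, 1, 5, 3, 4. Because `from` and `to` are broadcast at run time, they can be procedure parameters rather than constants.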
This is very simple, but I haven't been able to figure it out yet.
This question concerns MMX assembly, but it is purely a matter of logic.
Imagine the following scenario:
MM0: 04 03 02 01 04 03 02 01 <-- input
MM1: 02 02 02 02 02 02 02 02
MM2: 04 03 02 01 04 03 02 01 <-- copy of input
after pcmpgtw MM0, MM1
MM0: FF FF 00 00 FF FF 00 0...
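The register trace above can be reproduced with a small C sketch. pcmpgtw treats each register as four signed 16-bit words and writes FFFF into every word of the destination that is greater than the corresponding source word, and 0000 otherwise. The helper name `cmpgt_words` is hypothetical; the sketch assumes a GCC- or Clang-style compiler for the intrinsic path and falls back to a scalar model elsewhere.

```c
#include <stdint.h>
#include <string.h>

#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Model of `pcmpgtw a, b`: per 16-bit lane, signed a > b ? 0xFFFF : 0x0000. */
uint64_t cmpgt_words(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
#if defined(__MMX__)
    __m64 va, vb, vr;
    memcpy(&va, &a, 8);
    memcpy(&vb, &b, 8);
    vr = _mm_cmpgt_pi16(va, vb);   /* pcmpgtw */
    memcpy(&r, &vr, 8);
    _mm_empty();                   /* emms */
#else
    for (int lane = 0; lane < 4; lane++) {
        /* Extract each word and reinterpret it as signed. */
        int16_t x = (int16_t)(uint16_t)(a >> (16 * lane));
        int16_t y = (int16_t)(uint16_t)(b >> (16 * lane));
        if (x > y)
            r |= (uint64_t)0xFFFF << (16 * lane);
    }
#endif
    return r;
}
```

With the input 0x0403020104030201 and the comparand 0x0202020202020202, the words 0403 compare greater than 0202 and the words 0201 do not, so the result is 0xFFFF0000FFFF0000.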
Hi!
Where can I find information about common SIMD tricks? I have an instruction set and know how to write non-tricky SIMD code, but I know that SIMD can now do much more: it can express complex conditional code without branches.
For example (ARMv6), the following sequence of instructions sets each byte of Rd equal to the unsigned minimum of t...
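The excerpt is cut off before the instructions appear, so the following is only a sketch under an assumption: that the sequence in question is the USUB8 followed by SEL pair, which is the usual ARMv6 idiom for a per-byte unsigned minimum. The C below is a scalar model of those two instructions, not ARM code; the names `usub8_model`, `sel_model` and `umin8` are hypothetical.

```c
#include <stdint.h>

/* Model of USUB8: per-byte a - b. Bit i of *ge is set when byte i of a
   is >= byte i of b (no borrow), mirroring the APSR.GE flags. */
static uint32_t usub8_model(uint32_t a, uint32_t b, unsigned *ge)
{
    uint32_t diff = 0;
    *ge = 0;
    for (int i = 0; i < 4; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        diff |= (uint32_t)(uint8_t)(x - y) << (8 * i);
        if (x >= y)
            *ge |= 1u << i;
    }
    return diff;
}

/* Model of SEL: take byte i from n when GE bit i is set, else from m. */
static uint32_t sel_model(uint32_t n, uint32_t m, unsigned ge)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t src = ((ge >> i) & 1u) ? n : m;
        r |= src & (0xFFu << (8 * i));
    }
    return r;
}

/* USUB8 Rd, Ra, Rb ; SEL Rd, Rb, Ra  ->  per-byte unsigned min(Ra, Rb). */
uint32_t umin8(uint32_t a, uint32_t b)
{
    unsigned ge;
    (void)usub8_model(a, b, &ge);   /* only the GE flags are needed */
    return sel_model(b, a, ge);     /* where a >= b pick b, otherwise a */
}
```

The point of the trick is that the comparison result lives in per-lane flags and the select consumes them, so no branch is needed for any of the four bytes.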
I'm writing a highly parallel, multithreaded application. I've already written an SSE-accelerated thread class. If I were to write an MMX-accelerated thread class and then run both at the same time (one SSE thread and one MMX thread per core), would the performance improve noticeably?
I would think that this setup would help hide...
Hi, I am curious: do new compilers use the extra features built into new CPUs, such as MMX, SSE, 3DNow!, and so on?
I mean, the original 8086 did not even have an FPU, so a compiler that old cannot use one, but new compilers can, since an FPU is part of every new CPU. So, do new compilers use the new features of the CPU?
Or, it should be more right...
I have an inline assembler loop that cumulatively adds elements from an int32 data array with MMX instructions. In particular, it uses the fact that the eight MMX registers together can accommodate 16 int32s to calculate 16 different cumulative sums in parallel.
I would now like to convert this piece of code to MMX intrinsics but I am afraid that I wi...
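A minimal sketch of what such a loop could look like with intrinsics, assuming the data is interleaved so that element i contributes to sum number i mod 16, and that the element count is a multiple of 16. The original inline assembly is not shown, so this layout is a guess. The function name `sum16` is hypothetical. With intrinsics the compiler decides which of the eight accumulators stay in registers, so register allocation is not guaranteed to match hand-written assembly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* sums[j] = data[j] + data[16 + j] + data[32 + j] + ...
   Additions wrap modulo 2^32, as paddd does. Assumption: n % 16 == 0. */
void sum16(const int32_t *data, size_t n, int32_t sums[16])
{
#if defined(__MMX__)
    __m64 acc[8];                            /* 8 registers x 2 int32 lanes */
    for (int k = 0; k < 8; k++)
        acc[k] = _mm_setzero_si64();
    for (size_t i = 0; i + 16 <= n; i += 16) {
        for (int k = 0; k < 8; k++) {
            __m64 v;
            memcpy(&v, data + i + 2 * k, 8);      /* load two int32s */
            acc[k] = _mm_add_pi32(acc[k], v);     /* paddd */
        }
    }
    memcpy(sums, acc, sizeof acc);           /* 8 x 8 bytes = 16 int32s */
    _mm_empty();                             /* emms */
#else
    /* Scalar model; unsigned arithmetic gives the same wraparound. */
    uint32_t acc[16] = {0};
    for (size_t i = 0; i + 16 <= n; i += 16)
        for (int j = 0; j < 16; j++)
            acc[j] += (uint32_t)data[i + j];
    memcpy(sums, acc, sizeof acc);
#endif
}
```

For an input of 0, 1, ..., 31 this yields sums[j] = 16 + 2*j, since each sum receives j and 16 + j.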
I am trying to optimize some arithmetic by using the MMX and SSE instruction sets with inline assembly. However, I have been unable to find good references for the timings and usages of these enhanced instruction sets. Could you please help me find references that contain information about the throughput, latency, operands, and perhaps s...
Does anyone know of an SSE-enhanced version of libtiff? Even just an SSE-enhanced version of a CCITT Group4 encoder would do; I could do the work of sliding that one into libtiff myself. I only need to work with bitonal images.
Thank you
...