Given a register of 4 bytes (or 16 for SIMD), there has to be an efficient way to sort the bytes in-register with a few instructions.
Thanks in advance.
Given a register of 4 bytes (or 16 for SIMD), there has to be an efficient way to sort the bytes in-register with a few instructions.
Thanks in advance.
All sorting algorithms require "swapping" values from one place to another. Since you're talking about a literal CPU register, that means any sort would need another register to use as a temporary place to hold the bytes being swapped.
I've never seen a chip with a built-in method for sorting bytes within a register. Not saying it hasn't been done, but I can't think of many uses for such an instruction.
Look up an efficient sorting network for N = the number of bytes you care about (4 or 16). Convert that to a sequence of compare and exchange instructions. (For N=16 that'll be more than 'a few', though.)
Found it! It's in the 2007 paper "Using SIMD Registers and Instructions to Enable Instruction-Level Parallelism in Sorting Algorithms" by Furtak, Amaral, and Niewiadomski. Section 4.
It uses 4 SSE registers, has 12 steps, and runs in 19 instructions including load and store.
The same paper has some excellent work on dynamically making sorting networks with SIMD.