ansaurus

Question

Fast rgb565 to YUV (or even rgb565 to Y)

Answer 1

+1 A:

What you really want to look at, I think, is using MMX or the integer SSE instructions for this. That will let you work with a few pixels at a time. I imagine your compiler will be able to generate such code if you specify the correct switches, especially if your code is written nicely enough.

Regarding your existing code, I wouldn't bother interleaving instructions from different iterations to gain performance. The out-of-order engine of all x86 processors (excluding Atom) and the caches should handle that pretty well.

Edit: If you need to do horizontal adds, you might want to use the PHADDD and PHADDW instructions. In fact, if you have the Intel Software Developer's Manual, you should look for the PH* instructions. They might have what you need.

Nathan Fellman 2010-01-13 21:11:37

I have looked at MMX and SSE. I mentioned that in the summary. I couldn't see anything particularly useful in this instance because MMX is hampered when performing horizontal operations. The operation I need to perform is varying levels of multiplication (or shift) on different parts of the one input source. PMADDWD is more or less the operation I need, but it requires getting the data into two words to generate a doubleword result, which would then need to be extracted. I seriously doubt an out-of-order CPU could significantly speed up the short version of that loop.

Lerc 2010-01-14 06:34:01

The PHADDW and similar instructions are SSSE3, I believe. That cuts out too many systems, including my laptop. All the good instructions are always just out of reach.

Lerc 2010-01-14 07:24:02
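
For readers following the exchange above, here is a minimal sketch of how the luma conversion might be done eight pixels at a time using only SSE2 integer instructions, so nothing from SSSE3 (PHADDW and friends) is needed. Several things here are assumptions and not taken from the thread: the weights 77/150/29 (an 8-bit approximation of the BT.601 luma coefficients), the expansion of each channel to 8 bits by bit replication, and the function names `rgb565_to_y` and `rgb565_row_to_y`. It sidesteps the horizontal-add problem by splitting the channels into three separate vectors and using PMULLW (`_mm_mullo_epi16`) on each, which works because the weighted sum never exceeds 255 * 256 and so fits in an unsigned 16-bit lane.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics only; nothing from SSSE3 */
#endif

/* Scalar reference. The weights 77/150/29 (sum 256) are an assumed 8-bit
   approximation of the BT.601 luma coefficients, not the asker's formula. */
static uint8_t rgb565_to_y(uint16_t p)
{
    unsigned r5 = p >> 11, g6 = (p >> 5) & 0x3F, b5 = p & 0x1F;
    unsigned r8 = (r5 << 3) | (r5 >> 2);   /* replicate top bits: 5 -> 8 */
    unsigned g8 = (g6 << 2) | (g6 >> 4);   /* replicate top bits: 6 -> 8 */
    unsigned b8 = (b5 << 3) | (b5 >> 2);
    return (uint8_t)((77 * r8 + 150 * g8 + 29 * b8) >> 8);
}

/* Converts n RGB565 pixels to 8-bit luma. Hypothetical helper name. */
void rgb565_row_to_y(const uint16_t *src, uint8_t *dst, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i mask5 = _mm_set1_epi16(0x1F);
    const __m128i mask6 = _mm_set1_epi16(0x3F);
    const __m128i wr = _mm_set1_epi16(77);
    const __m128i wg = _mm_set1_epi16(150);
    const __m128i wb = _mm_set1_epi16(29);
    for (; i + 8 <= n; i += 8) {
        __m128i p = _mm_loadu_si128((const __m128i *)(src + i));
        /* Split the packed pixel into one vector per channel. */
        __m128i r = _mm_srli_epi16(p, 11);
        __m128i g = _mm_and_si128(_mm_srli_epi16(p, 5), mask6);
        __m128i b = _mm_and_si128(p, mask5);
        /* Expand each channel to 8 bits, as in the scalar version. */
        r = _mm_or_si128(_mm_slli_epi16(r, 3), _mm_srli_epi16(r, 2));
        g = _mm_or_si128(_mm_slli_epi16(g, 2), _mm_srli_epi16(g, 4));
        b = _mm_or_si128(_mm_slli_epi16(b, 3), _mm_srli_epi16(b, 2));
        /* Weighted sum stays below 65536, so 16-bit lanes do not overflow. */
        __m128i y = _mm_add_epi16(_mm_mullo_epi16(r, wr),
                    _mm_add_epi16(_mm_mullo_epi16(g, wg),
                                  _mm_mullo_epi16(b, wb)));
        y = _mm_srli_epi16(y, 8);
        /* Narrow eight 16-bit results to bytes and store the low 8 bytes. */
        _mm_storel_epi64((__m128i *)(dst + i), _mm_packus_epi16(y, y));
    }
#endif
    /* Scalar tail, and the whole row when SSE2 is not available. */
    for (; i < n; ++i)
        dst[i] = rgb565_to_y(src[i]);
}
```

Whether this beats a tuned scalar loop on a given machine is something only measurement can answer; the sketch is meant to show that the per-channel multiplies can be done vertically, not to claim a particular speedup.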

Answer 2

+1 A:

A decent compiler, given the appropriate switches to tune for the CPU variants of most interest, almost certainly knows a lot more about good x86 instruction selection and scheduling than any mere mortal!

Take a look at the Intel(R) 64 and IA-32 Architectures Optimization Reference Manual...

If you want to get into hand-optimising code, a good strategy might be to get the compiler to generate assembly source for you as a starting point, and then tweak that; profile before and after every change to ensure that you're actually making things better.

Matthew Slattery 2010-01-14 01:52:40

That's quite a bit of faith you have there in the compiler. Care to put it to the test? Y = (rgb565 shr al,1;add ah,ah;add al,ah; And apart from partial register stalls, it is fairly compact. I'm honestly curious as to whether a compiler takes the same or a better approach. (I shall come back and figure out formatting after dinner.)

Lerc 2010-01-14 06:47:39

Too slow. Edit button's gone. Let's try formatting. For readability, the code above was: `0x7ff >> 4) + (rgb565 shr al,1; add ah,ah; add al,ah`. Well, that's not a whole lot better now either :-/

Lerc 2010-01-14 07:28:20
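
One way to put the compiler to the test, along the lines Matthew Slattery suggests, is to write the conversion as a plain C loop, ask the compiler for its assembly output (for example `gcc -O2 -S`), and time candidate versions against each other. The sketch below is only an illustration: the luma weights 77/150/29 are an assumed approximation of the BT.601 coefficients and not the formula from the comments above, and `rgb565_to_y_plain`, `convert_fn` and `time_conversion` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Plain loop for the compiler to optimise. Weights 77/150/29 (sum 256)
   are an assumed approximation of BT.601 luma, not the asker's formula. */
void rgb565_to_y_plain(const uint16_t *src, uint8_t *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        unsigned p  = src[i];
        unsigned r5 = p >> 11, g6 = (p >> 5) & 0x3F, b5 = p & 0x1F;
        unsigned r8 = (r5 << 3) | (r5 >> 2);
        unsigned g8 = (g6 << 2) | (g6 >> 4);
        unsigned b8 = (b5 << 3) | (b5 >> 2);
        dst[i] = (uint8_t)((77 * r8 + 150 * g8 + 29 * b8) >> 8);
    }
}

/* Any candidate implementation with this signature can be timed. */
typedef void (*convert_fn)(const uint16_t *, uint8_t *, size_t);

/* Returns CPU seconds spent running fn `repeats` times over the buffer.
   clock() is coarse, so use a large buffer or many repeats. */
double time_conversion(convert_fn fn, const uint16_t *src, uint8_t *dst,
                       size_t n, int repeats)
{
    clock_t start = clock();
    for (int k = 0; k < repeats; ++k)
        fn(src, dst, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `time_conversion` once with the plain loop and once with a hand-written variant, on the same buffer, gives the before-and-after numbers the answer asks for. Reading from `dst` afterwards helps keep the compiler from discarding the work.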
