ansaurus

Answer 1

If the pixels were smaller, you could use eight Uint64 registers (they are wide, and there are plenty of them) to accumulate the rows of the rotated matrix.

Example for sizeof(pixel) == 1 on a little-endian machine:

for (int y = 0; y < 8; ++y) {
  // For every line, load the 8 pixels of row y into src0.
  // They belong in the last column of the result, so each pixel is shifted
  // in at the low byte and after 8 iterations sits exactly in the 8th byte.
  Uint64 src0 = *(Uint64*)(src + dim * y);
  dst0 = (dst0 << 8) | ( src0       & 0xff); // this was pixel src[y][0]
  dst1 = (dst1 << 8) | ((src0 >> 8) & 0xff); // and pixel src[y][1]
  // etc. for dst2..dst7
}
// Now the 8 registers dst0..dst7 contain rows 0..7 of the result.
// Store them:
*(Uint64*)(dst) = dst0;
*(Uint64*)(dst + dim) = dst1;
// etc. for dst2..dst7

The good part is that it's easier to unroll and reorder, and there are fewer memory accesses.
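
For anyone who wants the pattern written out in full, here is a minimal sketch of the same idea, not the answerer's exact code. It assumes 1-byte pixels, a little-endian machine, and a row stride of dim bytes. The function name rotate_tile8 is hypothetical, the array d[0..7] stands in for dst0..dst7, and memcpy replaces the pointer casts so the code does not depend on alignment or aliasing assumptions (compilers typically turn a fixed 8-byte memcpy into a single load or store).

```c
#include <stdint.h>
#include <string.h>

/* Rotate one 8x8 tile of 1-byte pixels so that source row 0 ends up in the
 * last column of the result (90 degrees clockwise).
 * src and dst point at the top-left pixel of the tile; dim is the row stride
 * in bytes. Assumes a little-endian machine. Hypothetical helper name. */
static void rotate_tile8(const uint8_t *src, uint8_t *dst, size_t dim)
{
    uint64_t d[8] = {0};                 /* d[k] accumulates row k of the result */

    for (int y = 0; y < 8; ++y) {
        uint64_t row;
        memcpy(&row, src + dim * y, sizeof row);   /* one 64-bit read per source row */
        for (int k = 0; k < 8; ++k) {
            /* byte k of row is pixel src[y][k]; push it into result row k */
            d[k] = (d[k] << 8) | ((row >> (8 * k)) & 0xff);
        }
    }

    for (int k = 0; k < 8; ++k)
        memcpy(dst + dim * k, &d[k], sizeof d[k]); /* one 64-bit write per result row */
}
```

With this layout, pixel src[y][k] lands at dst[k][7 - y]. An optimizing compiler will usually unroll both inner loops and keep d[0..7] in registers, which is the point of the trick.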

ruslik 2010-10-12 03:35:53

So you mean I can't use this with the current size of the "pixel"?

johnshaddad 2010-10-12 03:38:03

Sure you can, but the benefit is greater with small pixels. Either way, it helps a lot if you get the compiler to access memory only in 64-bit chunks at aligned addresses. Letting it work on unaligned 6-byte structures would be quite inefficient.
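
One way to read the advice above, sketched under assumptions: the thread only says the pixel is a 6-byte structure, so the three 16-bit channels below are a guess at its layout, and padded_pixel and copy_pixel are hypothetical names. Padding the structure to 8 bytes and aligning it to 8 lets each pixel move as one aligned 64-bit word. This needs C11 for _Alignas and _Static_assert.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout: three 16-bit channels (6 bytes) plus 2 bytes of padding,
 * aligned so that every pixel occupies exactly one aligned 64-bit word. */
typedef struct {
    _Alignas(8) uint16_t red;
    uint16_t green;
    uint16_t blue;
    uint16_t pad;      /* unused; rounds the size up to 8 bytes */
} padded_pixel;

_Static_assert(sizeof(padded_pixel) == 8, "pixel must fill one 64-bit word");
_Static_assert(_Alignof(padded_pixel) == 8, "pixel must be 8-byte aligned");

/* Move one pixel as a single 64-bit value instead of three 16-bit ones. */
static void copy_pixel(padded_pixel *dst, const padded_pixel *src)
{
    uint64_t word;
    memcpy(&word, src, sizeof word);   /* one aligned 64-bit load  */
    memcpy(dst, &word, sizeof word);   /* one aligned 64-bit store */
}
```

The cost is 2 wasted bytes per pixel, so whether this pays off depends on how much the extra memory traffic matters compared with the cheaper accesses.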

ruslik 2010-10-12 03:44:32

Well, I kind of get the concept, but could you please elaborate on how this could be applied in my case? I am kind of lost after the second line. Could you continue the code to the end so that I can test it and understand the full picture?

johnshaddad 2010-10-12 03:50:43

I understand that this reduces memory reads by flushing once, but I want to make sure I understand the concept correctly. Could you apply it to the innermost loop, as it should be for the size of "pixel" that I have?

johnshaddad 2010-10-12 03:51:36

When you say: dst0 = (dst0 << 8) | ( src0 & 0xff); why are you shifting a newly created Uint64 variable 8 bits to the left when it is still empty?

johnshaddad 2010-10-12 04:09:46

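We need the shift in the next 7 iterations, and by doing it in the first one as well we can skip initializing the dst registers: after eight shifts of 8 bits each, whatever they held at the start has been shifted out completely.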
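
A small sketch of why the starting value does not matter, assuming the same shift-and-OR step as in the answer (shift_in_8 is a hypothetical helper). Strictly speaking, standard C does not define the result of reading a never-initialized local, so the starting value is passed in explicitly here rather than left uninitialized.

```c
#include <stdint.h>

/* Shift eight bytes into acc, one per step, as the loop in the answer does.
 * Eight shifts of 8 bits push all 64 original bits of acc out the top. */
static uint64_t shift_in_8(uint64_t acc, const uint8_t bytes[8])
{
    for (int i = 0; i < 8; ++i)
        acc = (acc << 8) | bytes[i];   /* newest byte enters at the low end */
    return acc;
}
```

Calling it with acc = 0 and with any other starting value gives the same result for the same eight bytes, and the first byte shifted in ends up in the most significant position.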

ruslik 2010-10-12 04:13:42

I see. OK, could you modify the post to show all 8 lines (instead of the "etc...")? I just want to make sure I understand it correctly.

johnshaddad 2010-10-12 04:19:14

Understood, thanks! I will try to implement it and let you know if I have any further questions.

johnshaddad 2010-10-12 05:48:35

OK, so you said this could also be used for a PIXEL size of, say, 6 or 8 bytes. Could you explain how? The largest variable available, as far as I can see, is uint64, and that fits only 1 PIXEL structure, as opposed to the 8 pixels I have per row (assuming a TILE size of 8). How would I apply this bit-shifting trick in my case, then? There is no way I can store more than one pixel in a uint64 variable, so I will end up doing 8 memory reads to get the 8 pixels. Please explain, thanks.

johnshaddad 2010-10-12 07:19:49


How do Bitshifting and UInt64 work?
