views:

557

answers:

8

I have a one 64-bit integer, which I need to rotate 90 degrees in 8 x 8 area (preferably with straight bit-manipulation). I cannot figure out any handy algorithm for that. For instance, this:

// 0xD000000000000000 = 1101000000000000000000000000000000000000000000000000000000000000

1 1 0 1 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0

after rotation becomes this:

// 0x101000100000000 = 0000000100000001000000000000000100000000000000000000000000000000

0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0

I wonder if there's any solutions without need to use any pre-calculated hash-table(s)?

A: 

If you look at this as a 2 dimensional array then you have the solution no? Just make the rows the new columns. First row is the last column, 2nd is the one before last and so on.

Visually at least, it looks like your solution.

Shaihi
A: 

probably something like that

for(int i = 0; i < 8; i++)
{
    for(int j = 0; j < 8; j++)
    {
        new_image[j*8+8-i] = image[i*8+j];
    }
}
ufukgun
This doesn't quite work. new_image[7] should equal image[0] and the way you have it written is that new_image[8] is actually receiving this value (case i ==j==0). new_image[7] is receiving image[8] (case i==1, j=0)
SauceMaster
+10  A: 

Without using any look-up tables, I can't see much better than treating each bit individually:

unsigned long r = 0;
for (int i = 0; i < 64; ++i) {
    r += ((x >> i) & 1) << (((i % 8) * 8) + (7 - i / 8));
}
Mark Byers
+1 for not using any if-statements. Better branch prediction. :)
csl
By the way, you can easily unroll this loop so you won't need the modulus etc.
csl
If you unroll it completely, all the variable values in it go away and the compiler can do a lot of constant folding. Likely you can take advantage of the fact that there are 64 right shifts, and rather than doing 64 long distance shifts (which may not be unit time), each right shift can take the result of the previous right shifts by doing a single addtional shift, which is unit time.
Ira Baxter
A: 

If an if-powered loop is acceptable, the formula for bits is simple enough:

8>>Column - Row - 1

Column and Row are 0-indexed.

This gives you this mapping:

 7 15 23 31 39 47 55 63
 6 14 22 ...
 5 ...
 4 ...
 3 ...
 2 ...
 1 ...
 0  8 16 24 32 40 48 54
Brian
+2  A: 

If you're going to do this fast, you shouldn't object to lookup tables.

I'd break the 64 bit integers into N-bit chunks, and look up the N bit chunks in a position-selected table of transpose values. If you choose N=1, you need 64 lookups in tables of two slots, which is relatively slow. If you choose N=64, you need one table and one lookup but the table is huge :-}

N=8 seems like a good compromise. You'd need 8 tables of 256 entries. The code should look something like this:

// value to transpose is in v, a long
long r; // result
r != byte0transpose[(v>>56)&0xFF];
r != byte1transpose[(v>>48)&0xFF];
r != byte2transpose[(v>>40)&0xFF];
r != byte3transpose[(v>>32)&0xFF];
r != byte4transpose[(v>>24)&0xFF];
r != byte5transpose[(v>>16)&0xFF];
r != byte6transpose[(v>>08)&0xFF];
r != byte7transpose[(v>>00)&0xFF];

Each table contains precomputed values that "spread" the contiguous bits in the input across the 64 bit transposed result. Ideally you'd compute this value offline and simply initialize the table entries.

If you don't care about speed, then the standard array transpose algorithms will work; just index the 64 bit as if it were a bit array.

I have a sneaking suspicion that one might be able to compute the transposition using bit twiddling type hacks.

Ira Baxter
You can do this with only 1 256-entry table if you notice that all your tables are almost the same, they're just bit shifted.
Chris Dodd
OK, sure. Now one has to trade the off the additional shift for the space. OP's choice.
Ira Baxter
+2  A: 

To expand on my comment to Ira's answer, you can use:

#define ROT_BIT_0(X)    X, (X)|0x1UL
#define ROT_BIT_1(X)    ROT_BIT_0(X), ROT_BIT_0((X) | 0x100UL)
#define ROT_BIT_2(X)    ROT_BIT_1(X), ROT_BIT_1((X) | 0x10000UL)
#define ROT_BIT_3(X)    ROT_BIT_2(X), ROT_BIT_2((X) | 0x1000000UL)
#define ROT_BIT_4(X)    ROT_BIT_3(X), ROT_BIT_3((X) | 0x100000000UL)
#define ROT_BIT_5(X)    ROT_BIT_4(X), ROT_BIT_4((X) | 0x10000000000UL)
#define ROT_BIT_6(X)    ROT_BIT_5(X), ROT_BIT_5((X) | 0x1000000000000UL)
#define ROT_BIT_7(X)    ROT_BIT_6(X), ROT_BIT_6((X) | 0x100000000000000UL)

static unsigned long rot90[256] = { ROT_BIT_7(0) };

unsigned long rotate90(unsigned long v)
{
    unsigned long r = 0;
    r |= rot90[(v>>56) & 0xff];
    r |= rot90[(v>>48) & 0xff] << 1;
    r |= rot90[(v>>40) & 0xff] << 2;
    r |= rot90[(v>>32) & 0xff] << 3;
    r |= rot90[(v>>24) & 0xff] << 4;
    r |= rot90[(v>>16) & 0xff] << 5;
    r |= rot90[(v>>8) & 0xff] << 6;
    r |= rot90[v & 0xff] << 7;
    return r;
}

This depends on 'unsigned long' being 64 bits, of course, and does the rotate assuming the bits are in row-major order with the msb being the upper right, which seems to be the case in this question....

Chris Dodd
+2  A: 

This is quite easy using IA32 SIMD, there's a handy opcode to extract every eighth bit from a 64 bit value (this was written using DevStudio 2005):

char
  source [8] = {0, 0, 0, 0, 0, 0, 0, 0xd0},
  dest [8];

__asm
{
  mov ch,3
  movq xmm0,qword ptr [source]
Rotate2:
  lea edi,dest
  mov cl,8
Rotate1:
  pmovmskb eax,xmm0
  psllq xmm0,1
  stosb
  dec cl
  jnz Rotate1
  movq xmm0,qword ptr [dest]
  dec ch
  jnz Rotate2
}

It rotates the data three times (-270 degrees) since +90 is a bit trickier (needs a bit more thought)

Skizz
+3  A: 

There is an efficient way to perform bit reversal, using O(log n) shift operations. If you interpret a 64-bit UINT as an 8x8 array of bits, then bit reversal corresponds to a rotation by 180 degrees.

Half of these shifts effectively perform a horizontal reflection; the other half perform a vertical reflection. To obtain rotations by 90 and 270 degrees, an orthogonal (i.e. vertical or horizontal) reflection could be combined with a diagonal reflection, but the latter remains an awkward bit.

typedef unsigned long long uint64;

uint64 reflect_vert (uint64 value)
{
    value = ((value & 0xFFFFFFFF00000000ull) >> 32) | ((value & 0x00000000FFFFFFFFull) << 32);
    value = ((value & 0xFFFF0000FFFF0000ull) >> 16) | ((value & 0x0000FFFF0000FFFFull) << 16);
    value = ((value & 0xFF00FF00FF00FF00ull) >>  8) | ((value & 0x00FF00FF00FF00FFull) <<  8);
    return value;
}

uint64 reflect_horiz (uint64 value)
{
    value = ((value & 0xF0F0F0F0F0F0F0F0ull) >> 4) | ((value & 0x0F0F0F0F0F0F0F0Full) << 4);
    value = ((value & 0xCCCCCCCCCCCCCCCCull) >> 2) | ((value & 0x3333333333333333ull) << 2);
    value = ((value & 0xAAAAAAAAAAAAAAAAull) >> 1) | ((value & 0x5555555555555555ull) << 1);
    return value;
}

uint64 reflect_diag (uint64 value)
{
    uint64 new_value = value & 0x8040201008040201ull; // stationary bits
    new_value |= (value & 0x0100000000000000ull) >> 49;
    new_value |= (value & 0x0201000000000000ull) >> 42;
    new_value |= (value & 0x0402010000000000ull) >> 35;
    new_value |= (value & 0x0804020100000000ull) >> 28;
    new_value |= (value & 0x1008040201000000ull) >> 21;
    new_value |= (value & 0x2010080402010000ull) >> 14;
    new_value |= (value & 0x4020100804020100ull) >>  7;
    new_value |= (value & 0x0080402010080402ull) <<  7;
    new_value |= (value & 0x0000804020100804ull) << 14;
    new_value |= (value & 0x0000008040201008ull) << 21;
    new_value |= (value & 0x0000000080402010ull) << 28;
    new_value |= (value & 0x0000000000804020ull) << 35;
    new_value |= (value & 0x0000000000008040ull) << 42;
    new_value |= (value & 0x0000000000000080ull) << 49;
    return new_value;
}

uint64 rotate_90 (uint64 value)
{
    return reflect_diag (reflect_vert (value));
}

uint64 rotate_180 (uint64 value)
{
    return reflect_horiz (reflect_vert (value));
}

uint64 rotate_270 (uint64 value)
{
    return reflect_diag (reflect_horiz (value));
}

In the above code, the reflect_diag() function still requires many shifts. I suspect that it is possible to implement this function with fewer shifts, but I have not yet found a way to do that.

Rhubbarb
+1 probably more efficient than the accepted answer, and fully satisfies the requirements of 'no lookup table'.
int3