answers:

8

Given an N-dimensional vector of small integers, is there a simple way to map it one-to-one to a large integer?

Say we have an N=3 vector space. Can we represent a vector X=[(int16)x1,(int16)x2,(int16)x3] using an integer (int48)y? The obvious answer is "Yes, we can". But the question is: "What is the fastest way to do this and its inverse operation?"

Will this new 1-dimensional space possess some very special useful properties?

+7  A: 

For the above example you have 3 * 32 = 96 bits of information, so without any a priori knowledge you need 96 bits for the equivalent long integer.

However, if you know that your x1, x2, x3 values will always fit within, say, 16 bits each, then you can pack them all into a 48-bit integer.

In either case the technique is very simple: you just use shift, mask and bitwise OR operations to pack/unpack the values.
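As a rough sketch of the 16-bit case, assuming the values are signed as in the question and that a uint64_t holds the 48-bit result (the function names pack3x16 and unpack3x16 are made up for this illustration):

```c
#include <stdint.h>

/* Pack three signed 16-bit values into the low 48 bits of a uint64_t.
   Each value goes through uint16_t first so that a negative input
   cannot sign-extend into the bits of its neighbours. */
static uint64_t pack3x16(int16_t x1, int16_t x2, int16_t x3)
{
    return ((uint64_t)(uint16_t)x1 << 32)
         | ((uint64_t)(uint16_t)x2 << 16)
         |  (uint64_t)(uint16_t)x3;
}

/* Inverse: shift each field down and mask it to 16 bits.
   Converting a uint16_t above 32767 back to int16_t is
   implementation-defined before C23, but yields the expected
   two's complement value on common compilers. */
static void unpack3x16(uint64_t y, int16_t out[3])
{
    out[0] = (int16_t)(uint16_t)((y >> 32) & 0xFFFF);
    out[1] = (int16_t)(uint16_t)((y >> 16) & 0xFFFF);
    out[2] = (int16_t)(uint16_t)(y & 0xFFFF);
}
```

The casts to uint16_t are the only real subtlety: without them, OR-ing in a negative x3 would set all the higher bits as well.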

Paul R
+1 I was about to write something very similar...
crazyscot
Looks like our gut feelings collide. `;]`
Xavier Ho
+2  A: 

Just to make this concrete, if you have a 3-dimensional vector of 8-bit numbers, like this:

uint8_t vector[3] = { 1, 2, 3 };

then you can join them into a single 24-bit number like so:

uint32_t all = ((uint32_t) vector[0] << 16) | ((uint32_t) vector[1] << 8) | vector[2];

This number would, if printed using this statement:

printf("the vector was packed into %06x", (unsigned int) all);

produce the output

the vector was packed into 010203

The reverse operation would look like this:

uint8_t v2[3];

v2[0] = (all >> 16) & 0xff;
v2[1] = (all >> 8) & 0xff;
v2[2] = all & 0xff;

Of course this all depends on the size of the individual numbers in the vector and the length of the vector together not exceeding the size of an available integer type, otherwise you can't represent the "packed" vector as a single number.

unwind
+1  A: 

I'm writing this without having time to check details, but I suspect the best way is to represent your long integer via modular arithmetic, using k different integers which are mutually prime. The original integer can then be reconstructed using the Chinese remainder theorem. Sorry this is a bit sketchy, but hope it helps.
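To make the idea a little less sketchy, here is a small illustration under my own assumptions: the moduli 3, 5 and 7 are arbitrary pairwise coprime values, each component r[i] must already lie in 0..MOD[i]-1, and the names crt_pack and crt_unpack are hypothetical. The reconstruction is a simple search-based form of the Chinese remainder theorem, not an optimized one.

```c
#include <stdint.h>

/* Hypothetical moduli, pairwise coprime; their product is 105. */
static const uint64_t MOD[3] = { 3, 5, 7 };

/* "Pack": find the unique x in [0, 105) with x % MOD[i] == r[i].
   Requires r[i] < MOD[i], otherwise the inner loop never ends. */
static uint64_t crt_pack(const uint64_t r[3])
{
    uint64_t x = 0, step = 1;
    for (int i = 0; i < 3; i++) {
        /* Stepping by the product of the earlier moduli keeps the
           congruences that have already been satisfied intact. */
        while (x % MOD[i] != r[i])
            x += step;
        step *= MOD[i];
    }
    return x;
}

/* "Unpack": each component is just the remainder for its modulus. */
static void crt_unpack(uint64_t x, uint64_t r[3])
{
    for (int i = 0; i < 3; i++)
        r[i] = x % MOD[i];
}
```

For example, the residues (2, 3, 2) pack to 23. Unpacking is cheap here, but packing costs more than the shift-and-mask approach.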

Nathan
A: 

There are some totally non-portable ways to make this really fast using packed unions and direct memory accesses. That you really need this kind of speed is suspicious. Methods using shifts and masks should be fast enough for most purposes. If not, consider using specialized processors such as GPUs, for which vector support is optimized (parallel).

This naive storage does not possess any useful property that I can foresee, except that you can perform some computations (add, sub, bitwise logical operators) on the three coordinates at once, as long as you use positive integers only and you don't overflow for add and sub.

You'd better be quite sure you won't overflow (or won't go negative for sub) or the vector will become garbage.
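A small sketch of what "all three coordinates at once" means, assuming three unsigned 8-bit fields in a uint32_t (pack3x8 is a name invented for this example):

```c
#include <stdint.h>

/* Three unsigned 8-bit coordinates in the low 24 bits of a uint32_t. */
static uint32_t pack3x8(uint8_t a, uint8_t b, uint8_t c)
{
    return ((uint32_t)a << 16) | ((uint32_t)b << 8) | (uint32_t)c;
}

/* One ordinary addition adds all three fields, provided no
   individual field sum exceeds 255. */
static uint32_t add_packed(uint32_t p, uint32_t q)
{
    return p + q;
}
```

With (1, 2, 3) and (10, 20, 30) the single addition gives the packed form of (11, 22, 33). With (0, 0, 200) and (0, 0, 100) the low field overflows and carries into its neighbour, so the result decodes as (0, 1, 44) instead.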

kriss
A: 

I think what you want can be solved using multi-dimensional space filling curves. The link gives a lot of references on this, which in turn give different methods and insights. Here's a specific example of an invertible mapping. It works for any dimension N.

As for useful properties, these mappings are related to Gray codes.

Hard to say whether this was what you were looking for, or whether the "pack 3 16-bit ints into a 48-bit int" does the trick for you.
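For a concrete feel, probably the simplest space-filling curve to code is the Z-order (Morton) curve, which just interleaves the bits of the coordinates. This is my own pick for illustration and not necessarily the mapping behind the links above; the names morton3_encode and morton3_decode are hypothetical, and three unsigned 16-bit coordinates are assumed.

```c
#include <stdint.h>

/* Interleave the bits of x, y, z: bit i of x lands at bit 3*i,
   bit i of y at 3*i + 1, bit i of z at 3*i + 2. */
static uint64_t morton3_encode(uint16_t x, uint16_t y, uint16_t z)
{
    uint64_t code = 0;
    for (int bit = 0; bit < 16; bit++) {
        code |= (uint64_t)((x >> bit) & 1u) << (3 * bit);
        code |= (uint64_t)((y >> bit) & 1u) << (3 * bit + 1);
        code |= (uint64_t)((z >> bit) & 1u) << (3 * bit + 2);
    }
    return code;
}

/* Inverse: pull every third bit back out into its coordinate. */
static void morton3_decode(uint64_t code, uint16_t out[3])
{
    out[0] = out[1] = out[2] = 0;
    for (int bit = 0; bit < 16; bit++) {
        out[0] |= (uint16_t)(((code >> (3 * bit))     & 1u) << bit);
        out[1] |= (uint16_t)(((code >> (3 * bit + 1)) & 1u) << bit);
        out[2] |= (uint16_t)(((code >> (3 * bit + 2)) & 1u) << bit);
    }
}
```

Unlike plain concatenation, points that are close in 3-D tend to get nearby codes, although the Z-order curve still has jumps that a Hilbert-style curve avoids.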

brainjam
+2  A: 

If you have sets Si, i=1..n of size Ci = |Si|, then the cartesian product set S = S1 x S2 x ... x Sn has size C = C1 * C2 * ... * Cn.

This motivates an obvious way to do the packing one-to-one. If you have elements e1,...,en from each set, each in the range 0 to Ci-1, then you give the element e=(e1,...,en) the value e1 + C1*(e2 + C2*(e3 + C3*(... + C(n-1)*en ...))).

You can do any permutation of this packing if you feel like it, but unless the values are perfectly correlated, the size of the full set must be the product of the sizes of the component sets.

In the particular case of three 32 bit integers, if they can take on any value, you should treat them as one 96 bit integer.

If you particularly want to, you can map small values to small values through any number of means (e.g. filling out spheres with the L1 norm), but you have to specify what properties you want to have.

(For example, one can map (n,m) to (max(n,m)-1)^2 + k where k=n if n<=m and k=2n-m if n>m; you can draw this as a picture of filling in a square like so:

1 2 5   | draw along the edge of the square this way
4 3 6   v
  8 7

if you start counting from 1 and only worry about positive values; for integers, you can spiral around the origin.)
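A sketch of that square-filling map, following the order drawn in the picture with n as the row and m as the column, both counted from 1. Walking the bottom edge back toward the first column corresponds to k = 2n - m when n > m. The names shell_pair and shell_unpair are made up, and the linear search for the shell is only there to keep the example short.

```c
#include <stdint.h>

/* Map (n, m), both >= 1, to a single positive integer by filling
   square shells: down column max(n, m), then back along row max(n, m). */
static uint64_t shell_pair(uint32_t n, uint32_t m)
{
    uint64_t big  = (n > m) ? n : m;
    uint64_t base = (big - 1) * (big - 1);      /* cells in smaller squares */
    uint64_t k    = (n <= m) ? n : 2 * (uint64_t)n - m;
    return base + k;
}

/* Inverse for v >= 1: find the shell, then the position along it. */
static void shell_unpair(uint64_t v, uint32_t *n, uint32_t *m)
{
    uint64_t big = 1;
    while (big * big < v)                       /* smallest big with big^2 >= v */
        big++;
    uint64_t k = v - (big - 1) * (big - 1);
    if (k <= big) {                             /* on the column part */
        *n = (uint32_t)k;
        *m = (uint32_t)big;
    } else {                                    /* on the row part */
        *n = (uint32_t)big;
        *m = (uint32_t)(2 * big - k);
    }
}
```

Small coordinates always get small values here, which the shift-and-mask packing does not give you.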

Rex Kerr
Note that the "obvious packing" method is a generalisation of the shifting-and-masking method shown in some other answers - those answers are assuming Cx = C1 = C2 = C3 ..., with Cx being a power of 2 (thus the multiplication can be done with a shift, and the addition with a bitwise or). The general form is likely to be more generally useful.
caf
@caf - Thanks for highlighting that point. I should probably have made that fact more clear in my answer.
Rex Kerr
A: 
#include <stdint.h> /* for uint8_t */
long x;
uint8_t *p = (uint8_t *) &x; /* cast needed: &x has type long * */

or

union X {
   long L;
   uint8_t A[sizeof(long)/sizeof(uint8_t)];
};

works if you don't care about endianness. In my experience compilers generate better code with the union because it doesn't set off their "you took the address of this, so I must keep it in RAM" rules as quickly. These rules will still get set off if you index the array with something the compiler can't optimize away.

If you do care about endianness then you need to mask and shift.
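As a sketch of the union variant applied to the question's three 16-bit values (the type packed3x16 and the function names are invented here, and fixed-width types are used instead of long so the sizes are known):

```c
#include <stdint.h>

union packed3x16 {
    uint64_t whole;
    uint16_t part[4];   /* part[3] is unused and kept at zero */
};

/* Store the three components and read them back as one integer.
   Which bits of 'whole' each component lands in depends on the
   machine's byte order. */
static uint64_t union_pack(uint16_t a, uint16_t b, uint16_t c)
{
    union packed3x16 u;
    u.whole = 0;
    u.part[0] = a;
    u.part[1] = b;
    u.part[2] = c;
    return u.whole;
}

static void union_unpack(uint64_t y, uint16_t out[3])
{
    union packed3x16 u;
    u.whole = y;
    out[0] = u.part[0];
    out[1] = u.part[1];
    out[2] = u.part[2];
}
```

The round trip works on any one machine, but the numeric value of the packed integer differs between little-endian and big-endian targets. On a big-endian target the unused part ends up in the low bits, so the value no longer fits in 48 bits.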

nategoose
+1  A: 

To expand on Rex Kerr's generalised form, in C you can pack the numbers like so:

X = e[n];

X *= MAX_E[n-1] + 1;
X += e[n-1];

/* ... */

X *= MAX_E[0] + 1;
X += e[0];

And unpack them with:

e[0] = X % (MAX_E[0] + 1);
X /= (MAX_E[0] + 1);

e[1] = X % (MAX_E[1] + 1);
X /= (MAX_E[1] + 1);

/* ... */

e[n] = X;

(Where MAX_E[n] is the greatest value that e[n] can have). Note that these maximum values are likely to be constants, and may be the same for every e, which will simplify things a little.

The shifting / masking implementations given in the other answers are a special case of this, for cases where the MAX_E + 1 values are powers of 2 (and thus the multiplication and division can be done with a shift, the addition with a bitwise-or and the modulus with a bitwise-and).
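The unrolled steps above can also be written as loops. This is only a sketch: the names mixed_pack and mixed_unpack are hypothetical, the arrays here hold n elements indexed 0..n-1, and the caller is assumed to make sure the product of all (max_e[i] + 1) fits in 64 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack e[0..n-1], where 0 <= e[i] <= max_e[i], into one integer.
   Starts from the last element, as in the unrolled version. */
static uint64_t mixed_pack(const uint32_t *e, const uint32_t *max_e, size_t n)
{
    uint64_t x = 0;
    for (size_t i = n; i-- > 0; )
        x = x * ((uint64_t)max_e[i] + 1) + e[i];
    return x;
}

/* Unpack by peeling off one remainder per element, lowest first. */
static void mixed_unpack(uint64_t x, uint32_t *e, const uint32_t *max_e, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t radix = (uint64_t)max_e[i] + 1;
        e[i] = (uint32_t)(x % radix);
        x /= radix;
    }
}
```

For instance, with maxima {9, 59, 23} the vector {3, 59, 23} packs to (23 * 60 + 59) * 10 + 3 = 14393, and unpacking 14393 gives the same three values back.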

caf