tags:

views:

48

answers:

5

Hi,

My PHP program is working with an array of values ranging from 0 to 7. I'm trying to find the most effective way to store those values in PHP. By most effective I mean using the less number of bits.

It's clear that each value only need 3 bits of storage space (b000=0 to b111=7). But what is the most efficient way to store those 3bits values in a binary string ?

I don't know in advance how many 3 bits values I'll need to store or restore, but it might be a lot, so 64bits is clearly not enough.

I was looking into pack() and unpack(): I could store two values in each byte and use a pack('C', $twoValues), but I'm still loosing 2 bits.

Will it work ? Is there a more effective way of storing those values ?

Thanks

A: 

I would convert each integer to binary, concatenate all of them, and then split the resulting string into bytes. Each byte will be 0-255 so it can be stored as an individual character.

Tim
Don't you mean "each *byte* will be 0-255"? A bit can only store 2 values. ^^
gablin
Nice catch, somehow I always get them mixed up, even after double-checking...
Tim
+1  A: 

The best way is to store them as integers and not get involved with packing things bit by bit. Unless you have an actual engineering reason you need these to be stored as 3-bit values (for example, interfacing with hardware), you're just asking for headaches. Keep in mind, esp for odd bit sizes, they become pretty difficult to have direct access to if you do this. And if you are sticking these values in a database, you wouldnt be able to search or index on values packed like this. Store them as integers, or if in a db, perhaps a short integer or byte.

GrandmasterB
+1  A: 

That kind of technique is only necessary if you will have at least half a billion of these. Think about it, the CPU will have to have data in one register, the mask in another and AND them just to get your value out. Now imagine iterating over a list of these that is long enough to justify that kind of space saving technique. A 50% reduction in space and an order of magnitude slower.

Novikov
A: 

Looking at http://php.net/manual/en/language.types.php, you should store them as integers. However, the question is whether to let one integer value represent many 3-bit values or not. The former is more complex but requires less memory, whereas the first is the opposite. If you don't have an extreme need to reduce the amount of memory you use, then I would suggest the latter (use one integer for one 3-bit value).

The main problem with storing many 3-bit values in one integer is figuring out how many 3-bit values there are. You could use an array of integers, and then have an extra integer which states the total number of 3-bit values. However, as also stated in the manual, the number of bits used for an integer value is platform-dependent. So you would have to know whether an integer is 32 bits or 64 bits, or else you may try to store too many values and lose data, or you risk using more memory than needed (which would be a bad thing as you're aiming to use as little memory in the first place).

gablin
+1  A: 

You didn't ask if it was a good idea - as many suggested, your benefit of that kind of space compression, is easily lost in the extra processing - but that's another topic :)

You're also not mentioning where you're storing the data after. Whatever that storage location/engine is maybe have further conditions and specialized types (eg a database has a binary column format, might have a byte column format, may even support bit storage etc).

But sticking with the topic, I guess best 3 bit storage is as a nibble (waisting one bit) and I suppose I'd combine two nibbles into a byte (loosing two bits overall). Yes you're loosing two bits (if that's key), but it's simple to combine the two values so you're processing overhead is relatively small:

$byte=$val1*7+$val2;
$val2=$byte%7;$val1=($byte-$val2)/7;

If a byte isn't available, you can combine these up to make 16 (4 stored), 32 (8), 64 (16) bit integers. You can also form an array of these values for larger storage.

I'd consider the above more human readable, but you could also use bit-logic to combine and separate the values:

$combinedbyte=$val1<<3|$val2;
$val2=$combinedbyte&7;$val1=($combinedbyte&56)>>3);

(This is effectively what the PACK/UNPACK commands do)

Alternatively you could encode into characters, since in ASCII the first few are protected, you might as well start at A (A-Z+6 punc+a-z gives you 58 when you only need 49 to store your two values).

$char=chr(($val1*7+$val2)+65); //ord('A')=65
$val2=(ord($char)-65)%7;$val1=(ord($char)-65-$val2)/7;

A series of these encoded characters could be stored as an array or in a null terminated string.

NOTE: In the case of -say- 64 bit integers above, we're storing 3 bits in 4 so get 64/4=16 storage locations. This means we're waisting 16 further bits (1 per location) so you might be tempted to add another 5 values, for a total of 21 (21*3=63 bits, only 1 wasted). That's certainly possible (with integer math - although most PHP instances don't work @ 64 bits, or bit-logic solutions) but it complicates things in the long run - probably more trouble than it's worth.

Rudu
That's exactly what I did, but in a uglier way :) Thanks
analogue