tags:

views:

64

answers:

3

I have 7 8-bit integer values per record that I want to store in Postgres. Pg doesn't offer a single byte integer type, SMALLINT, or 2 bytes, being the smallest integer datatype. Is there anyway I can store my 7 8-bit numbers and save on space?

Would an array type with a 7 element array be more compact? Or, should I make a binary representation of my 7 numbers (for example, using pack in Perl) and store that in a single bytea field?

Any other suggestions?

A: 

You will want to look into the bytea data type referenced here: http://www.postgresql.org/docs/8.4/interactive/datatype-binary.html

There is also bit string data types: http://www.postgresql.org/docs/8.4/interactive/datatype-bit.html

Jordan S. Jones
+1  A: 

Will you ever lookup records using these values?

If yes - use normal datatypes like int4 (or even int8 if you're on 64bit architecture).

If not - first ask yourself - what is the point of storing this values in Pg? You can use bytea (complicated i/o), or bitstrings (even more complicated i/o) but what is the point? How many billion records you're going to have? Did you actually check that smaller datatype uses less space (hint: it doesn't, check it - there are data alignment issues involved)? Are you working under impression that smaller datatype is faster (it isn't. It's actually more complex to compare two int2 values than two int4 values on 32bit architecture).

depesz
you have valid questions some of which I could have preempted by providing more information. I have 6 8-bit values, and expected 120 billion rows. Hence, the desire to conserve space as much as possible.
punkish
Actually it does save space storing int2 vs int4. I tested on 8.4 and storing 7 x int2 = 38 bytes per tuple; 7 x char(1) = 41 bytes; 7 x int4 = 52 bytes.
Scott Bailey
+1  A: 

Given that the overhead for any row in PostgreSQL is 23 bytes (HeapTupleHeaderData), if you really care about small amounts of space this much you've probably picked the wrong way to store your data.

Regardless, since all the more complicated types have their own overhead (bytea adds four bytes of overhead for example, bit strings 5 to 8), the only way to accomplish what you're looking for is to use a bigint (8 bytes), numerically shifting each value and OR-ing together the result. You can do this using the bit string operations to make the code easier--compute as bit string, then cast to bigint before storing--or just manually multiply/add if you want speed to be better. For example, here's how you store two bytes together into a two byte structure and then get them back again:

int2 = 256 * byte1 + byte2
byte1 = int2 / 256
byte2 = int2 % 256

You can extend the same idea into storing 7 of them that way. Retrieval overhead is still going to be terrible, but you will have actually saved some space in the process. But not very much relative to just the row header.

Greg Smith
each row has 6 x 1 byte integers (?), 3 x 2 byte integers (SMALLINT), and 2 x 4 byte integers (INT). That is a total of 20 bytes per row, plus Pg's overhead. Since Pg provides only SMALLINTs as the smallest value, my 6 1 byte values will take 12 bytes. I am looking for a possible alternative for these. A savings of 6 bytes times 120 billion rows is about 670 GB (if my calculation is correct).That said, I do need to take out these values individually, so I might pay the price in retrieval speed. I need to balance the two. 0.7 TB is not a huge amount of space in the bigger scheme of things.
punkish