I have a large hexadecimal (16-byte, 32 hex digits) data item that always has the format:

00d980113901429fa6de7fb7e2da705a

This is coming in as an ASCII string from my source (i.e., the zero above is the character '0', 0x30, not the byte 0x00), and I would like to know people's opinions on the best way (with respect to storage and speed) to store this in PostgreSQL.

The obvious thing to do is to just store it as a varchar, but storing it in binary form would definitely save space. Would I see performance gains on select and insert by storing it in binary form? Would bytea or bit be better? Is there a difference between the two in terms of internal representation?

Another idea would be to split it across multiple columns and store it as two bigint/int8 or four integer/int4.

Space and time are an issue as I have MANY of these (upwards of a trillion).
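
For concreteness, the layouts I'm considering would look something like this (table/column names are just for illustration):

-- Store the ASCII hex as-is:
create table ids_text  (id varchar(32) not null);
-- Store the 16 raw bytes:
create table ids_bytea (id bytea not null);
-- Split into two 64-bit halves:
create table ids_split (hi int8 not null, lo int8 not null);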

+1  A: 

You have to determine the most common use of the data in order to choose the appropriate data type. If a query has to convert the column away from that type, an index referencing the column is useless.

OMG Ponies
I can mold how we use/query the data to accommodate the representation used. An increase in space/time efficiency is worth it.
orangeoctopus
If you create a functional index on the conversion, then it's not useless (see the sketch after this thread).
rfusca
@rfusca: True, but DBAs loathe function-based indexes IME.
OMG Ponies
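
For reference, the functional index rfusca mentions could look something like this in PostgreSQL (decode() is immutable, so it is allowed in an index; the table and column names here are hypothetical):

-- Index the binary form of a hex varchar column:
create index ids_bin_idx on ids (decode(hex_id, 'hex'));
-- A lookup that matches the index expression:
select * from ids
  where decode(hex_id, 'hex') = decode('00d980113901429fa6de7fb7e2da705a', 'hex');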
+1  A: 

I suspect BYTEA will be about 2x smaller and 2x faster for comparisons (>, <, =) than a VARCHAR representation.

In other database engines you can even avoid the length-header overhead. For example:

MS-SQL:   BINARY(16)
Oracle:   RAW(16)
MySQL:    BINARY(16)

Or if you like length-headers:

MS-SQL:   VARBINARY(16)
Oracle:   BLOB
MySQL:    VARBINARY(16)

PostgreSQL only supports BYTEA, so you always pay for the length-header, but I still go with BYTEA in this situation.
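
The conversion at the boundary is a one-liner in either direction; a minimal sketch (table name is made up):

create table items (id bytea primary key);
-- ASCII hex in -> 16 raw bytes stored:
insert into items values (decode('00d980113901429fa6de7fb7e2da705a', 'hex'));
-- ...and back out as hex text:
select encode(id, 'hex') from items;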

Julius Davies
+3  A: 

Compare these two tables of 10M records:

create table test (a int8 not null, b int8 not null, primary key (a, b));
insert into test
  select generate_series(1,10000000), generate_series(1,10000000);
select pg_size_pretty(pg_total_relation_size('test'));
-- 723 MB

create table test_bytea (a bytea not null);
insert into test_bytea
  -- pack the two int8 halves into 32 hex digits, then into 16 raw bytes
  select decode(lpad(to_hex(a),16,'0')||lpad(to_hex(b),16,'0'),'hex') from test;
alter table test_bytea add primary key (a);
select pg_size_pretty(pg_total_relation_size('test_bytea'));
-- 804 MB

A bytea with an index is 11% bigger than 2*int8. This isn't much, but it means that 11% fewer rows will fit in cache, sequential scans will be 11% slower, and so on.

If your data does not change, maybe you should consider flat-file storage of sorted values instead of a database - this would be only 152 MB per 10M records (10M * 16 bytes) and searching would be O(log(n)).

Tometzky
Even with a trillion rows, I wouldn't do this. You're avoiding BYTEA's length-header in exchange for two fixed-width INT8s, but all your queries and foreign keys become more complicated. Personally, I would pay the 11% tax to keep things simpler. Or you could switch to MySQL, where BINARY(16) has the same performance as two INT8 fields.
Julius Davies
Bytea isn't much easier to work with, as it needs escaping or prepared queries. Comparisons of bytea will probably also be slower than 2*int8, as it likely compares byte by byte instead of in 8-byte chunks, and it also needs to check the length.
Tometzky
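
An easy way to check the comparison-speed question would be to time equivalent lookups against the two test tables above (5000000 is 0x4c4b40, zero-padded to 16 hex digits per half):

explain analyze select * from test where a = 5000000 and b = 5000000;
explain analyze select * from test_bytea
  where a = decode('00000000004c4b4000000000004c4b40', 'hex');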