ansaurus

Question

Slow insert speed in Postgresql memory tablespace

Answer 1

+1 A:

Are you doing your inserts as a series of single-row statements:

INSERT INTO tablename (...) VALUES (...);
INSERT INTO tablename (...) VALUES (...);
...

or as one multiple-row insert:

INSERT INTO tablename (...) VALUES (...),(...),(...);

The second form will be significantly faster on 100k rows.

source: http://kaiv.wordpress.com/2007/07/19/faster-insert-for-multiple-rows/
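If the rows are produced by client code, the multiple-row form can be generated rather than typed by hand. A minimal sketch in Python, assuming a DB-API driver that uses `%s` placeholders (psycopg2 does; other drivers may differ). The function name `build_multirow_insert` and the column names are hypothetical, and the table and column names must come from trusted code, since they are pasted into the statement as-is:

```python
def build_multirow_insert(table, columns, rows):
    """Return (sql, params) for one INSERT carrying every row in `rows`."""
    if not rows:
        raise ValueError("rows must not be empty")
    # One "(%s, %s, ...)" group per row; values travel separately as parameters.
    group = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = "INSERT INTO {} ({}) VALUES {};".format(
        table, ", ".join(columns), ", ".join([group] * len(rows))
    )
    # Flatten the rows into one parameter list, in the same order as the groups.
    params = [value for row in rows for value in row]
    return sql, params


sql, params = build_multirow_insert(
    "tablename", ["a", "b"], [(1, "x"), (2, "y"), (3, "z")]
)
print(sql)
```

The pair would then be handed to the driver as `cursor.execute(sql, params)`. For 100k rows it is probably worth sending a few thousand rows per statement rather than all of them at once; the best chunk size is something to measure.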

zed_0xff 2010-05-28 06:41:07

I am using the first way:

BEGIN;
INSERT INTO tablename (...) VALUES (...);
INSERT INTO tablename (...) VALUES (...);
...
COMMIT;

I'll now try the second approach. Thanks.

Prashant 2010-05-28 06:45:56

That post also suggests COPY would be even faster.

araqnid 2010-05-28 10:56:19

Answer 2

+2 A:

Did you also place xlog (the WAL segments) on your RAM drive? If not, you're still writing to disk. And what about the settings for wal_buffers, checkpoint_segments, etc.? You have to try to fit all 100,000 records (your single transaction) in wal_buffers. Note that increasing this parameter might cause PostgreSQL to request more System V shared memory than your operating system's default configuration allows.

Frank Heikens 2010-05-28 07:16:36

Yes, xlog is mounted on a RAM drive. The size of one row is around 240 bytes, so for a batch of 100,000 records I have set wal_buffers to 250MB. With these settings I am getting around 6000-7000 inserts per second. Is there any way to profile PostgreSQL to see which operation is taking the time? Since no data is being written to disk, memory transfers should be very fast. 6000 inserts per second ~= 1.5 MB/s, which I think is very slow.
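The arithmetic behind that estimate, as a rough sketch. It counts only the raw 240-byte payload per row and ignores per-row storage and WAL overhead, so the real volume moved is somewhat higher; the variable names are illustrative only:

```python
ROW_BYTES = 240          # approximate row size quoted above
BATCH_ROWS = 100_000     # rows per transaction
ROWS_PER_SECOND = 6_000  # lower end of the observed 6000-7000 inserts/s

batch_mb = ROW_BYTES * BATCH_ROWS / 1e6
throughput_mb_s = ROW_BYTES * ROWS_PER_SECOND / 1e6
batch_seconds = BATCH_ROWS / ROWS_PER_SECOND

print(f"batch payload: {batch_mb:.1f} MB")        # 24.0 MB
print(f"throughput:    {throughput_mb_s:.2f} MB/s")  # 1.44 MB/s
print(f"batch time:    {batch_seconds:.1f} s")    # 16.7 s
```

A 24 MB batch taking roughly 17 seconds suggests the time is going into per-statement work rather than into moving bytes.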

Prashant 2010-05-28 10:23:08

Answer 3

A:

I suggest using COPY instead of INSERT.

You should also fine-tune your postgresql.conf file.

Read more at http://wiki.postgresql.org/wiki/Performance_Optimization

pcent 2010-05-28 19:55:57

Answer 4

+1 A:

Fast Data Loading

1. Translate your data to CSV.
2. Create a temporary table (as you noted, without indexes).
3. Execute a copy command from psql: \copy schema.temp_table FROM /tmp/data.csv WITH CSV
4. Insert the data into the non-temporary table.
5. Create indexes.
6. Set appropriate statistics.
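The "translate your data to CSV" step can be sketched with Python's standard csv module. This is an illustration, not part of the original recipe: the helper name `rows_to_csv` and the sample rows are hypothetical. It relies on the fact that COPY in CSV mode reads an unquoted empty field as NULL by default, so None is written as an empty field:

```python
import csv
import io


def rows_to_csv(rows):
    """Serialise rows to CSV text; None becomes an empty unquoted field."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow(["" if value is None else value for value in row])
    return buffer.getvalue()


sample = [
    (1, "alpha", None),
    (2, "beta, gamma", 3.5),
]
print(rows_to_csv(sample), end="")
```

Fields containing commas or quotes are quoted automatically. One caveat: a genuine empty string is also written as an unquoted empty field here, so it would be loaded as NULL unless the writer's quoting or the COPY options are adjusted.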

Further Recommendations

For large volumes of data:

Split the data into child tables.
Insert it in the order of the column that most of the SELECT statements will filter or sort on. In other words, try to align the physical model with the logical model.
Adjust your configuration settings.
Create an index to cluster the table on (most important column on the left). For example:

    CREATE UNIQUE INDEX measurement_001_stc_index
      ON climate.measurement_001
      USING btree
      (station_id, taken, category_id);
    ALTER TABLE climate.measurement_001 CLUSTER ON measurement_001_stc_index;
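Inserting in index order can be arranged on the client before loading. A small sketch, assuming the rows are already in memory as tuples; the tuple layout and the sample values are hypothetical, and the dates are ISO strings so that text order matches date order:

```python
from operator import itemgetter

# Hypothetical rows as (station_id, taken, category_id, amount) tuples.
rows = [
    (7, "2010-05-02", 1, 3.1),
    (3, "2010-05-01", 1, 0.0),
    (7, "2010-05-01", 1, 1.2),
    (3, "2010-04-30", 1, 5.4),
]

# Sort on the same columns, in the same order, as the index definition.
ordered = sorted(rows, key=itemgetter(0, 1, 2))

for row in ordered:
    print(row)
```

For data too large to sort in memory, the same ordering could be produced by an external sort of the CSV file before it is loaded.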

Configuration Settings

On a machine with 4GB of RAM, I did the following...

Kernel Configuration

Tell the kernel that it's okay for programs to use gobs of shared memory:

sysctl -w kernel.shmmax=536870912
sysctl -p /etc/sysctl.conf

PostgreSQL Configuration

Edit /etc/postgresql/8.4/main/postgresql.conf and set:

shared_buffers = 1GB
temp_buffers = 32MB
work_mem = 32MB
maintenance_work_mem = 64MB
seq_page_cost = 1.0
random_page_cost = 2.0
cpu_index_tuple_cost = 0.001
effective_cache_size = 512MB
checkpoint_segments = 10

Tweak the values as necessary and suitable to your environment. You will probably have to change them for suitable read/write optimization later.
Restart PostgreSQL.
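PostgreSQL 8.4 requests its shared memory as a System V segment, so kernel.shmmax has to be larger than shared_buffers plus some overhead, otherwise the server refuses to start. A quick sanity check of the two numbers quoted above, as a sketch only: the helper `to_bytes` is hypothetical and the extra overhead beyond shared_buffers is not modelled:

```python
UNITS = {"kB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_bytes(setting):
    """Convert a postgresql.conf memory value such as '32MB' to bytes."""
    for suffix, factor in UNITS.items():
        if setting.endswith(suffix):
            return int(setting[: -len(suffix)]) * factor
    raise ValueError("unrecognised memory unit: " + setting)


shmmax = 536870912                 # value passed to sysctl above
shared_buffers = to_bytes("1GB")   # value from postgresql.conf above

print("shmmax         =", shmmax // 1024 ** 2, "MB")
print("shared_buffers =", shared_buffers // 1024 ** 2, "MB")
print("fits within shmmax:", shared_buffers < shmmax)
```

With the values as listed, the check reports that shared_buffers (1024 MB) does not fit within shmmax (512 MB), so one of the two probably needs adjusting before the restart: either raise kernel.shmmax or lower shared_buffers.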

Child Tables

For example, let's say you have data based on weather, divided into different categories. Rather than having a single monstrous table, divide it into several tables (one per category).

Master Table

CREATE TABLE climate.measurement
(
  id bigserial NOT NULL,
  taken date NOT NULL,
  station_id integer NOT NULL,
  amount numeric(8,2) NOT NULL,
  flag character varying(1) NOT NULL,
  category_id smallint NOT NULL,
  CONSTRAINT measurement_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);

Child Table

CREATE TABLE climate.measurement_001
(
-- Inherited from table climate.measurement:  id bigint NOT NULL DEFAULT nextval('climate.measurement_id_seq'::regclass),
-- Inherited from table climate.measurement:  taken date NOT NULL,
-- Inherited from table climate.measurement:  station_id integer NOT NULL,
-- Inherited from table climate.measurement:  amount numeric(8,2) NOT NULL,
-- Inherited from table climate.measurement:  flag character varying(1) NOT NULL,
-- Inherited from table climate.measurement:  category_id smallint NOT NULL,
  CONSTRAINT measurement_001_pkey PRIMARY KEY (id),
  CONSTRAINT measurement_001_category_id_ck CHECK (category_id = 1)
)
INHERITS (climate.measurement)
WITH (
  OIDS=FALSE
);
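When loading, each row has to end up in the child table whose CHECK constraint matches its category. A minimal client-side sketch of that routing follows. The naming pattern (a zero-padded three-digit category suffix) is an assumption extrapolated from the single measurement_001 example, and both function names are hypothetical:

```python
def child_table_for(category_id, schema="climate", base="measurement"):
    """Return the child table name for a category, e.g. 1 -> measurement_001."""
    if not 0 < category_id < 1000:
        raise ValueError("category_id outside the three-digit suffix range")
    return "{}.{}_{:03d}".format(schema, base, category_id)


def group_by_child(rows, category_index):
    """Bucket rows by target child table so each bucket can be loaded on its own."""
    buckets = {}
    for row in rows:
        buckets.setdefault(child_table_for(row[category_index]), []).append(row)
    return buckets


print(child_table_for(1))
```

Each bucket can then be written to its own CSV file and loaded into the matching child table; the CHECK constraint rejects any row that was routed to the wrong table.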

Table Statistics

Bump up the table stats for the important columns:

ALTER TABLE climate.measurement_001 ALTER COLUMN taken SET STATISTICS 1000;
ALTER TABLE climate.measurement_001 ALTER COLUMN station_id SET STATISTICS 1000;

Don't forget to VACUUM and ANALYSE afterwards.

Dave Jarvis 2010-05-28 20:12:14