Hi,

I have a requirement where I need to store records at a rate of 10,000 records/sec into a database (with indexing on a few fields). The number of columns in one record is 25. I am doing a batch insert of 100,000 records in one transaction block. To improve the insertion rate, I changed the tablespace from disk to RAM. With that I am able to achieve only 5,000 inserts per second.

I have also done the following tuning in the postgres config:

  • Indexes : no
  • fsync : false
  • logging : disabled

Other information:

  • Tablespace : RAM
  • Number of columns in one row : 25 (mostly integers)
  • CPU : 4 core, 2.5 GHz
  • RAM : 48 GB

I am wondering why a single insert query takes around 0.2 msec on average when the database is not writing anything to disk (as I am using a RAM-based tablespace). Is there something I am doing wrong?

Help appreciated.

Prashant

+1  A: 

Are you doing your insert as a series of

INSERT INTO tablename (...) VALUES (...);
INSERT INTO tablename (...) VALUES (...);
...

or as one multiple-row insert:

INSERT INTO tablename (...) VALUES (...),(...),(...);

The second one will be significantly faster on 100k rows.

source: http://kaiv.wordpress.com/2007/07/19/faster-insert-for-multiple-rows/

zed_0xff
I am using the first way: BEGIN; INSERT INTO tablename (...) VALUES (...); INSERT INTO tablename (...) VALUES (...); ...; COMMIT; I'll now try the second approach. Thanks.
Prashant
That post also suggests COPY would be even faster.
araqnid
+2  A: 

Did you also place the xlog (the WAL segments) on your RAM drive? If not, you're still writing to disk. And what about the settings for wal_buffers, checkpoint_segments, etc.? You have to try to get all your 100,000 records (your single transaction) into your wal_buffers. Increasing this parameter might cause PostgreSQL to request more System V shared memory than your operating system's default configuration allows.
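For reference, a sketch of what those WAL-related lines might look like in postgresql.conf; the values are illustrative assumptions only and need to be tuned to your batch size and PostgreSQL version:

# postgresql.conf -- illustrative values, not recommendations
wal_buffers = 16MB            # large enough to buffer the WAL generated by one batch
checkpoint_segments = 32      # spread checkpoints out during the bulk load

Raising shared-memory-related settings may also require increasing kernel.shmmax, as described in the configuration notes further down.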

Frank Heikens
Yes, xlog is mounted on a RAM drive. The size of one row is around 240 bytes, so for a batch of 100,000 records I have set the wal_buffers size to 250MB. With these settings I am getting around 6,000-7,000 inserts per second. Is there any way to profile Postgres to see which operation is taking the time? Since no data is being written to disk, the memory transfer should be very fast. 6,000 inserts per second is roughly 1.5 MB/s, which I think is very slow.
Prashant
A: 

I suggest you use COPY instead of INSERT.

You should also fine-tune your postgresql.conf file.

Read about it at http://wiki.postgresql.org/wiki/Performance_Optimization
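A minimal sketch of the COPY approach, assuming a hypothetical table named records and a CSV file at /tmp/records.csv (both names are placeholders):

-- Server-side COPY; the file must be readable by the PostgreSQL server process.
COPY records FROM '/tmp/records.csv' WITH CSV;

-- Or, from psql, a client-side copy that streams the file over the connection:
-- \copy records FROM '/tmp/records.csv' WITH CSV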

pcent
+1  A: 

Fast Data Loading

  1. Translate your data to CSV.
  2. Create a temporary table (as you noted, without indexes).
  3. Execute a COPY command: \COPY schema.temp_table FROM /tmp/data.csv WITH CSV
  4. Insert the data into the non-temporary table.
  5. Create indexes.
  6. Set appropriate statistics.
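A sketch of steps 2-5 above, using hypothetical names (temp_measurement, /tmp/data.csv) and the climate.measurement table defined further down; it assumes the CSV columns match the temporary table definition:

-- Step 2: temporary staging table, no indexes.
CREATE TEMPORARY TABLE temp_measurement
(
  taken date,
  station_id integer,
  amount numeric(8,2),
  flag character varying(1),
  category_id smallint
);

-- Step 3: bulk load from CSV (psql meta-command).
\copy temp_measurement FROM '/tmp/data.csv' WITH CSV

-- Step 4: move the rows into the permanent table.
INSERT INTO climate.measurement (taken, station_id, amount, flag, category_id)
  SELECT taken, station_id, amount, flag, category_id FROM temp_measurement;

-- Step 5: build the indexes only after the data is in place.
CREATE INDEX measurement_taken_idx ON climate.measurement (taken);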

Further Recommendations

For large volumes of data:

  1. Split the data into child tables.
  2. Insert it in the order of the column that most of the SELECT statements will use. In other words, try to align the physical model with the logical model.
  3. Adjust your configuration settings.
  4. Create a CLUSTER index (most important column on the left). For example:
    CREATE UNIQUE INDEX measurement_001_stc_index
      ON climate.measurement_001
      USING btree
      (station_id, taken, category_id);
    ALTER TABLE climate.measurement_001 CLUSTER ON measurement_001_stc_index;
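Note that ALTER TABLE ... CLUSTER ON only records which index to cluster on; to physically reorder the rows you still have to run CLUSTER, and it is not maintained automatically, so re-run it after large loads:

CLUSTER climate.measurement_001;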

Configuration Settings

On a machine with 4GB of RAM, I did the following...

Kernel Configuration

Tell the kernel that it's okay for programs to use gobs of shared memory:

sysctl -w kernel.shmmax=536870912
sysctl -p /etc/sysctl.conf

PostgreSQL Configuration

  1. Edit /etc/postgresql/8.4/main/postgresql.conf and set:
    shared_buffers = 1GB
    temp_buffers = 32MB
    work_mem = 32MB
    maintenance_work_mem = 64MB
    seq_page_cost = 1.0
    random_page_cost = 2.0
    cpu_index_tuple_cost = 0.001
    effective_cache_size = 512MB
    checkpoint_segments = 10
    
  2. Tweak the values as necessary to suit your environment. You will probably have to change them later to optimize for your read/write mix.
  3. Restart PostgreSQL.

Child Tables

For example, let's say you have data based on weather, divided into different categories. Rather than having a single monstrous table, divide it into several tables (one per category).

Master Table

CREATE TABLE climate.measurement
(
  id bigserial NOT NULL,
  taken date NOT NULL,
  station_id integer NOT NULL,
  amount numeric(8,2) NOT NULL,
  flag character varying(1) NOT NULL,
  category_id smallint NOT NULL,
  CONSTRAINT measurement_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);

Child Table

CREATE TABLE climate.measurement_001
(
-- Inherited from table climate.measurement:  id bigint NOT NULL DEFAULT nextval('climate.measurement_id_seq'::regclass),
-- Inherited from table climate.measurement:  taken date NOT NULL,
-- Inherited from table climate.measurement:  station_id integer NOT NULL,
-- Inherited from table climate.measurement:  amount numeric(8,2) NOT NULL,
-- Inherited from table climate.measurement:  flag character varying(1) NOT NULL,
-- Inherited from table climate.measurement:  category_id smallint NOT NULL,
  CONSTRAINT measurement_001_pkey PRIMARY KEY (id),
  CONSTRAINT measurement_001_category_id_ck CHECK (category_id = 1)
)
INHERITS (climate.measurement)
WITH (
  OIDS=FALSE
);
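With inheritance-based partitioning, rows have to be routed to the correct child table, either by the loader or by a trigger on the master. A minimal sketch (the values are placeholders), inserting directly into the child table for category 1 and querying through the master:

-- Route category 1 rows straight to the matching child table.
INSERT INTO climate.measurement_001 (taken, station_id, amount, flag, category_id)
VALUES ('2010-06-01', 42, 17.50, 'T', 1);

-- Queries against the master table automatically include all child tables;
-- with constraint_exclusion enabled, the CHECK constraint lets the planner
-- skip children that cannot contain matching rows.
SELECT count(*) FROM climate.measurement WHERE category_id = 1;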

Table Statistics

Bump up the table stats for the important columns:

ALTER TABLE climate.measurement_001 ALTER COLUMN taken SET STATISTICS 1000;
ALTER TABLE climate.measurement_001 ALTER COLUMN station_id SET STATISTICS 1000;

Don't forget to VACUUM and ANALYSE afterwards.
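For example, the statistics targets above take effect the next time the table is analyzed:

VACUUM ANALYZE climate.measurement_001;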

Dave Jarvis