Fast Data Loading
- Translate your data to CSV.
- Create a temporary table (as you noted, without indexes).
- Execute a COPY command:
\COPY schema.temp_table FROM /tmp/data.csv WITH CSV
- Insert the data into the non-temporary table.
- Create indexes.
- Set appropriate statistics.
Further Recommendations
For large volumes of data:
- Split the data into child tables.
- Insert it in order of the column from which most of the
SELECT
statements will use. In other words, try to align the physical model with the logical model.
- Adjust your configuration settings.
- Create a
CLUSTER
index (most important column on the left). For example:
CREATE UNIQUE INDEX measurement_001_stc_index
ON climate.measurement_001
USING btree
(station_id, taken, category_id);
ALTER TABLE climate.measurement_001 CLUSTER ON measurement_001_stc_index;
Configuration Settings
On a machine with 4GB of RAM, I did the following...
Kernel Configuration
Tell the Kernel that it's okay for programs to use gobs of shared memory:
sysctl -w kernel.shmmax=536870912
sysctl -p /etc/sysctl.conf
PostgreSQL Configuration
- Edit
/etc/postgresql/8.4/main/postgresql.conf
and set:
shared_buffers = 1GB
temp_buffers = 32MB
work_mem = 32MB
maintenance_work_mem = 64MB
seq_page_cost = 1.0
random_page_cost = 2.0
cpu_index_tuple_cost = 0.001
effective_cache_size = 512MB
checkpoint_segments = 10
- Tweak the values as necessary and suitable to your environment. You will probably have to change them for suitable read/write optimization later.
- Restart PostgreSQL.
Child Tables
For example, let's say you have data based on weather, divided into different categories. Rather than having a single monstrous table, divide it into several tables (one per category).
Master Table
CREATE TABLE climate.measurement
(
id bigserial NOT NULL,
taken date NOT NULL,
station_id integer NOT NULL,
amount numeric(8,2) NOT NULL,
flag character varying(1) NOT NULL,
category_id smallint NOT NULL,
CONSTRAINT measurement_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
Child Table
CREATE TABLE climate.measurement_001
(
-- Inherited from table climate.measurement_001: id bigint NOT NULL DEFAULT nextval('climate.measurement_id_seq'::regclass),
-- Inherited from table climate.measurement_001: taken date NOT NULL,
-- Inherited from table climate.measurement_001: station_id integer NOT NULL,
-- Inherited from table climate.measurement_001: amount numeric(8,2) NOT NULL,
-- Inherited from table climate.measurement_001: flag character varying(1) NOT NULL,
-- Inherited from table climate.measurement_001: category_id smallint NOT NULL,
CONSTRAINT measurement_001_pkey PRIMARY KEY (id),
CONSTRAINT measurement_001_category_id_ck CHECK (category_id = 1)
)
INHERITS (climate.measurement)
WITH (
OIDS=FALSE
);
Table Statistics
Bump up the table stats for the important columns:
ALTER TABLE climate.measurement_001 ALTER COLUMN taken SET STATISTICS 1000;
ALTER TABLE climate.measurement_001 ALTER COLUMN station_id SET STATISTICS 1000;
Don't forget to VACUUM
and ANALYSE
afterwards.