I have a MySQL database that I'm porting to PostgreSQL (because of GIS features).
Many of the tables have hundreds of thousands of rows, so I need to keep performance in mind.
My problem is that PostgreSQL seems abysmally slow...
For example, a simple SELECT * FROM [table] on a table in the MySQL database with 113,000 rows takes around 2 seconds to return the data. In PostgreSQL, the exact same query on the same table takes almost 10 seconds.
Similarly, I have another table with fewer rows (88,000), and it's even worse: MySQL takes 1.3 seconds, PostgreSQL takes 30 seconds!
Is this what I can expect from PostgreSQL, or is there something I can do to make it better?
My OS is Windows XP, and I'm running a 2.7 GHz dual core with 3 GB of RAM. The MySQL database is version 5.1, running with its stock configuration. The PostgreSQL database is version 8.4, and I have edited the configuration as follows:
shared_buffers = 128MB
effective_cache_size = 512MB
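Side note: since shared_buffers only takes effect after a full restart of the PostgreSQL service, a quick sanity check is to read the live values back from any SQL session, e.g.:
SHOW shared_buffers;        -- expect 128MB if the new config is active
SHOW effective_cache_size;  -- expect 512MB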
Thanks!
Here is the structure of the second table that has around 88,000 rows:
CREATE TABLE nodelink
(
  nodelinkid serial NOT NULL,
  workid integer NOT NULL,
  modifiedbyid integer,
  tabulardatasetid integer,
  fromnodeid integer,
  tonodeid integer,
  materialid integer,
  componentsubtypeid integer,
  crosssectionid integer,
  "name" character varying(64) NOT NULL,
  description character varying(256) NOT NULL,
  modifiedbyname character varying(64) NOT NULL, -- Contains the values from the old engine's ModifiedBy field, since they don't link with any user
  linkdiameter double precision NOT NULL DEFAULT 0, -- The diameter of the Link
  height double precision NOT NULL,
  width double precision NOT NULL,
  length double precision NOT NULL,
  roughness double precision NOT NULL,
  upstreaminvert double precision NOT NULL,
  upstreamloss double precision NOT NULL,
  downstreaminvert double precision NOT NULL,
  downstreamloss double precision NOT NULL,
  averageloss double precision NOT NULL,
  pressuremain double precision NOT NULL,
  flowtogauge double precision NOT NULL,
  cctvgrade double precision NOT NULL,
  installdate timestamp without time zone NOT NULL,
  whencreated timestamp without time zone NOT NULL,
  whenmodified timestamp without time zone NOT NULL,
  ismodelled boolean NOT NULL,
  isopen boolean NOT NULL,
  shapenative geometry,
  shapewgs84 geometry,
  CONSTRAINT nodelink_pk PRIMARY KEY (nodelinkid),
  CONSTRAINT componentsubtype_nodelink_fk FOREIGN KEY (componentsubtypeid)
      REFERENCES componentsubtype (componentsubtypeid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT crosssection_nodelink_fk FOREIGN KEY (crosssectionid)
      REFERENCES crosssection (crosssectionid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT fromnode_nodelink_fk FOREIGN KEY (fromnodeid)
      REFERENCES node (nodeid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT material_nodelink_fk FOREIGN KEY (materialid)
      REFERENCES material (materialid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT tabulardataset_nodelink_fk FOREIGN KEY (tabulardatasetid)
      REFERENCES tabulardataset (tabulardatasetid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT tonode_nodelink_fk FOREIGN KEY (tonodeid)
      REFERENCES node (nodeid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT user_nodelink_fk FOREIGN KEY (modifiedbyid)
      REFERENCES awtuser (userid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT work_modellink_fk FOREIGN KEY (workid)
      REFERENCES "work" (workid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
  OIDS=FALSE
);
ALTER TABLE nodelink OWNER TO postgres;
COMMENT ON TABLE nodelink IS 'Contains all of the data that describes a line between any two nodes.';
COMMENT ON COLUMN nodelink.modifiedbyname IS 'Contains the values from the old engine''s ModifiedBy field, since they don''t link with any user';
COMMENT ON COLUMN nodelink.linkdiameter IS 'The diameter of the Link';
I've played a bit more with the select statement. If I just do "Select NodeLinkID from NodeLink", the query is much quicker - less than a second to get 88,000 rows. If I do "Select NodeLinkID, shapenative from NodeLink", the query takes a long time - around 8 seconds. Does this shed any light on what I'm doing wrong?
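One rough way to separate the query cost from the cost of shipping and rendering the geometry data is EXPLAIN ANALYZE, which times execution on the server without sending any rows to the client - something along these lines:
EXPLAIN ANALYZE SELECT nodelinkid FROM nodelink;
EXPLAIN ANALYZE SELECT nodelinkid, shapenative FROM nodelink;
If both report similar execution times but the second query takes several seconds longer in pgAdmin, the extra time is being spent transferring and displaying the geometry column rather than running the query.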
More findings:
CREATE INDEX nodelink_lengthIDX on nodelink(length);
analyze nodelink;
-- Executing query: SELECT * FROM nodelink WHERE Length BETWEEN 0 AND 3.983
--   Total query runtime: 3109 ms. 10000 rows retrieved.
-- Executing query: SELECT nodelinkID FROM nodelink WHERE Length BETWEEN 0 AND 3.983
--   Total query runtime: 125 ms. 10000 rows retrieved.
In MySQL, the first query is done in around 120ms, the second is done in around 0.02ms.
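To confirm that the new nodelink_lengthIDX index is actually being used for the range filter (and not a sequential scan over all 88,000 rows), the same EXPLAIN ANALYZE check applies, e.g.:
EXPLAIN ANALYZE SELECT * FROM nodelink WHERE length BETWEEN 0 AND 3.983;
Note that on 8.4 the table rows have to be visited whichever columns are selected (there are no index-only scans), so the plan for the two queries above should be essentially the same - which again suggests the 3109 ms vs 125 ms gap is the cost of returning the wide rows, not of evaluating the WHERE clause.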
Question resolution:
Well folks, it seems that it was all a storm in a teacup...
mjy had it right:
"How did you measure those timings - in your application or the respective command line interfaces?"
To test this theory, I put together a simple console app that ran the same query on the MySQL db and the PGSQL database. Here is the output:
Running MySQL query: [SELECT * FROM l_model_ldata]
MySQL duration = [2.296875]
Running PGSQL query: [SELECT * FROM nodelink]
PGSQL duration = [2.875]
So the results are comparable. It seems that the pgAdmin tool that comes with PostgreSQL is quite slow at displaying large result sets. Thanks to everyone for their suggestions and assistance!
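For anyone who runs into the same thing: a quick way to check raw query time without writing a console app is psql's \timing switch (the mysql command-line client prints the elapsed time on its own):
\timing
SELECT * FROM nodelink;
-- psql reports "Time: ... ms" - measured at the client, but with no GUI grid to render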
mjy, if you want to post an answer, I can tick it as the correct answer for future reference.