I have a MySQL database that I'm porting to PostgreSQL (because of GIS features).

Many of the tables have hundreds of thousands of rows, so I need to keep performance in mind.

My problem is that PostgreSQL seems abysmally slow...

For example, if I do a simple SELECT * FROM [table] on a particular table in the MySQL database, let's say one that has 113,000 rows, the query takes around 2 seconds to return the data. In PostgreSQL, the exact same query on the same table takes almost 10 seconds.

Similarly, I have another table with fewer rows (88,000), and it's worse! MySQL takes 1.3 seconds, PostgreSQL takes 30 seconds!

Is this what I can expect from PostgreSQL, or is there something I can do to make it better?

My OS is XP, and I'm running a 2.7 GHz dual core with 3 GB RAM. The MySQL database is version 5.1, running stock standard. The PostgreSQL database is version 8.4, and I have edited the configuration as follows:

shared_buffers = 128MB
effective_cache_size = 512MB
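For reference, the values actually in effect can be confirmed from a live session; a quick sketch using SHOW and the standard pg_settings catalog view:

-- Confirm which settings the server is actually running with:
SHOW shared_buffers;
SHOW effective_cache_size;

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'effective_cache_size');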

Thanks!

Here is the structure of the second table that has around 88,000 rows:

CREATE TABLE nodelink
(
  nodelinkid serial NOT NULL,
  workid integer NOT NULL,
  modifiedbyid integer,
  tabulardatasetid integer,
  fromnodeid integer,
  tonodeid integer,
  materialid integer,
  componentsubtypeid integer,
  crosssectionid integer,
  "name" character varying(64) NOT NULL,
  description character varying(256) NOT NULL,
  modifiedbyname character varying(64) NOT NULL, -- Contains the values from the old engine's ModifiedBy field, since they don't link with any user
  linkdiameter double precision NOT NULL DEFAULT 0, -- The diameter of the Link
  height double precision NOT NULL,
  width double precision NOT NULL,
  length double precision NOT NULL,
  roughness double precision NOT NULL,
  upstreaminvert double precision NOT NULL,
  upstreamloss double precision NOT NULL,
  downstreaminvert double precision NOT NULL,
  downstreamloss double precision NOT NULL,
  averageloss double precision NOT NULL,
  pressuremain double precision NOT NULL,
  flowtogauge double precision NOT NULL,
  cctvgrade double precision NOT NULL,
  installdate timestamp without time zone NOT NULL,
  whencreated timestamp without time zone NOT NULL,
  whenmodified timestamp without time zone NOT NULL,
  ismodelled boolean NOT NULL,
  isopen boolean NOT NULL,
  shapenative geometry,
  shapewgs84 geometry,
  CONSTRAINT nodelink_pk PRIMARY KEY (nodelinkid),
  CONSTRAINT componentsubtype_nodelink_fk FOREIGN KEY (componentsubtypeid)
      REFERENCES componentsubtype (componentsubtypeid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT crosssection_nodelink_fk FOREIGN KEY (crosssectionid)
      REFERENCES crosssection (crosssectionid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT fromnode_nodelink_fk FOREIGN KEY (fromnodeid)
      REFERENCES node (nodeid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT material_nodelink_fk FOREIGN KEY (materialid)
      REFERENCES material (materialid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT tabulardataset_nodelink_fk FOREIGN KEY (tabulardatasetid)
      REFERENCES tabulardataset (tabulardatasetid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT tonode_nodelink_fk FOREIGN KEY (tonodeid)
      REFERENCES node (nodeid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT user_nodelink_fk FOREIGN KEY (modifiedbyid)
      REFERENCES awtuser (userid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT work_modellink_fk FOREIGN KEY (workid)
      REFERENCES "work" (workid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
  OIDS=FALSE
);
ALTER TABLE nodelink OWNER TO postgres;
COMMENT ON TABLE nodelink IS 'Contains all of the data that describes a line between any two nodes.';
COMMENT ON COLUMN nodelink.modifiedbyname IS 'Contains the values from the old engine''s ModifiedBy field, since they don''t link with any user';
COMMENT ON COLUMN nodelink.linkdiameter IS 'The diameter of the Link';


I've played a bit more with the select statement. If I just do "Select NodeLinkID from NodeLink", the query is much quicker - less than a second to get 88,000 rows. If I do "Select NodeLinkID, shapenative from NodeLink", the query takes a long time - around 8 seconds. Does this shed any light on what I'm doing wrong?
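It may: if those geometry columns are large, much of the extra time is likely spent detoasting and transferring them rather than scanning the table. A rough way to gauge their per-row size, sketched with the standard pg_column_size function against the schema above:

-- Rough per-row size of the geometry columns vs. the whole row
-- (pg_column_size reports the stored, possibly compressed, datum size):
SELECT avg(pg_column_size(shapenative)) AS avg_native_bytes,
       avg(pg_column_size(shapewgs84))  AS avg_wgs84_bytes,
       avg(pg_column_size(nodelink.*))  AS avg_row_bytes
FROM nodelink;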


More findings:

CREATE INDEX nodelink_lengthIDX on nodelink(length);

ANALYZE nodelink;

-- Executing query: SELECT * FROM nodelink WHERE Length BETWEEN 0 AND 3.983
-- Total query runtime: 3109 ms. 10000 rows retrieved.

-- Executing query: SELECT nodelinkID FROM nodelink WHERE Length BETWEEN 0 AND 3.983
-- Total query runtime: 125 ms. 10000 rows retrieved.

In MySQL, the first query is done in around 120ms, the second is done in around 0.02ms.
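To separate server-side execution time from client fetch-and-render time, the queries can be run under EXPLAIN ANALYZE, which executes the plan but discards the result rows; a sketch:

-- Reports the chosen plan and the server-side runtime only:
EXPLAIN ANALYZE SELECT * FROM nodelink WHERE length BETWEEN 0 AND 3.983;

EXPLAIN ANALYZE SELECT nodelinkid FROM nodelink WHERE length BETWEEN 0 AND 3.983;

If the two runtimes are close, the gap seen in the client is spent fetching and rendering rows, not executing the query.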



Question resolution:

Well folks, it seems that it was all a storm in a teacup...

mjy had it right:

"How did you measure those timings - in your application or the respective command line interfaces?"

To test this theory, I put together a simple console app that ran the same query on the MySQL db, and the PGSQL database. Here is the output:

Running MySQL query: [SELECT * FROM l_model_ldata]
MySQL duration = [2.296875]
Running PGSQL query: [SELECT * FROM nodelink]
PGSQL duration = [2.875]

So the results are comparable. It seems that the pgAdmin tool that comes with PostgreSQL is quite slow. Thanks to everyone for their suggestions and assistance!
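For future readers: the same check can be made without writing a console app by timing the statement in psql, which avoids pgAdmin's result-grid rendering; a sketch of an interactive session:

-- psql session; \timing prints the wall-clock time of each statement
\timing
SELECT * FROM nodelink;
-- the reported time should be close to the console app's figure above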

mjy, if you want to post an answer, I can tick it as the correct answer for future reference.

+2  A: 

Here is a useful article about tuning Postgres - it has definitions and a few tips.

This performance tuning article offers a pretty decent overview with some specific methods of optimization.

Diakonia7
Hey - thanks, I initially read through both of those pages before actually posting here. I've tweaked a few things, but I still get the same poor performance.
Robert
Is there anything in particular that I should check?
Robert
@Robert - what did you tweak?
Diakonia7
shared_buffers = 128MB and effective_cache_size = 512MB. There didn't seem to be anything else that would cause the symptoms I'm seeing, but I could be overlooking something :)
Robert
@Robert - It looks like you have already played with some of the config options- But in case you haven't considered these, here are some more: http://www.postgresql.org/docs/8.3/static/runtime-config-resource.html
Diakonia7
@Robert - maybe also take a look at shared memory, particularly SHMMAX: http://www.postgresql.org/docs/8.3/static/kernel-resources.html#SYSVIPC
Diakonia7
Hey - thanks for that. It looks like Windows has its own way of dealing with those parameters ("On Windows, PostgreSQL provides its own replacement implementation of these facilities, and so most of this section can be disregarded.")
Robert
A: 

PostgreSQL uses an MVCC architecture, which means it uses a more complicated format for storing data on disk than MySQL. It is slower for single-user access and faster for multi-user access.

a) Check whether your tables have been vacuumed - see the VACUUM statement.

b) Use indexes - PostgreSQL has a bigger repertoire of index types than MySQL (there are GiST and GIN indexes, for example), so use them.
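To make those two suggestions concrete, a sketch against the nodelink table from the question (the index name is illustrative; GiST applies to the PostGIS geometry columns):

-- (a) Vacuum and refresh planner statistics in one pass:
VACUUM ANALYZE nodelink;

-- (b) GiST index on a geometry column, for spatial queries:
CREATE INDEX nodelink_shapenative_gist
  ON nodelink
  USING GIST (shapenative);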

Pavel Stehule
MySQL uses MVCC for the InnoDB, Falcon, PBXT, and solidDB storage engines.
Alex
I've tried running vacuum, and vacuum analyse, but it doesn't seem to make a difference. I'm just doing a select * - surely indices are irrelevant for such a query?
Robert
A: 

Sounds like you suffer from fragmentation. Do you have lots of updates without running vacuum? Do you update indexed columns so HOT-updates are not used?

What's the output of SELECT relpages, reltuples FROM pg_class WHERE relname = 'nodelink'? That'll show you how many disk pages your tuples are stored on.
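A sketch extending that check with the table's on-disk size, using standard admin functions (relpages are 8 kB each by default):

SELECT relpages,
       reltuples,
       pg_size_pretty(pg_relation_size('nodelink')) AS on_disk_size
FROM pg_class
WHERE relname = 'nodelink';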

@Pavel: PostgreSQL certainly is more flexible wrt. indexes, but an index will not help in this case, since he's selecting everything in the table.

Many of the tables have hundreds of thousands of rows, so I need to keep performance in mind.

These are not particularly large tables ...

Is this what I can expect from PostgreSQL, or is there something I can do to make it better?

... so there's probably something else you're doing wrong.

Alex Brasetvik
I've tried running vacuum, and vacuum analyse, but it doesn't seem to make a difference. The output of the query is: relpages: 4345, reltuples: 84936.
Robert
"Do you have lots of updates..." : well, no - not really. Perhaps it's worth mentioning that this database is being created from a migration tool I've written. It looks at the MySQL database, and migrates the data to my new schema. I'm running the slow queries immediately after the migration, and there is no other system using the database, so there are no updates occurring.
Robert
A: 

Did you have the GIS features in MySQL as well? IIRC, that means that you were using MyISAM and not a transaction-capable storage manager, which means you're really not comparing apples to apples.

Also, is your application actually ever going to do this? A completely unqualified SELECT of all the rows? If not, you are better off looking at the performance of the things you are actually going to be doing, which would probably involve at least WHERE clauses. (Though this, of course, also cannot be fairly compared to a non-crash-safe, non-transactional system.)

Magnus Hagander
We do have the GIS features in MySQL - but we are moving to pgSQL because of the faster spatial queries. I've tried running these queries before I've PostGIS-ified the database, and the performance is similar. No - the app will never get everything from this table, but take a look at the amendments I've made to the original post - doing a select that yields around 10,000 rows, even on an indexed column, is still a lot slower than MySQL. Therein lies the problem :)
Robert
Well, that means you are comparing a safe system with an unsafe one, since you are on MyISAM. It's obviously going to be slower when you need to deal with actual data safety. Not necessarily *that* much slower, of course :-)
Magnus Hagander
A: 

If you have a table which has hundreds (let alone hundreds of thousands) of records, what possible reason do you have for running the query SELECT * FROM? Perhaps you should think about what data you're actually querying for, and how you could get just the relevant rows from the database.
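A sketch of the kind of qualified query being suggested - explicit columns, a predicate, and a limit (the column choices and the workid value are illustrative):

SELECT nodelinkid, name, length
FROM nodelink
WHERE workid = 42
ORDER BY name
LIMIT 100;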

nickf
No reason. But the "select *" highlights a performance issue that I get. If I do a SELECT * FROM NodeLink LIMIT 10000, then the time is roughly 1/8th of the time to get everything. Similarly, if I limit to 1000, it's roughly 1/80th of the time. If I employ an index-based search, it's still slow to get the data. Certainly 30-50 times slower than MySQL. So accessing data in this table is slow - that's my concern.
Robert
A: 

This is much too long for a normal 100,000-row table, so I think there is a problem in PostGIS, not PostgreSQL. Try to get all rows without the shapenative and shapewgs84 columns - if that is much faster, then it looks like PostGIS is responsible for the slowdown.
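Spelled out against the schema in the question, the suggested test would be every column except the two geometry ones:

SELECT nodelinkid, workid, modifiedbyid, tabulardatasetid, fromnodeid,
       tonodeid, materialid, componentsubtypeid, crosssectionid,
       name, description, modifiedbyname, linkdiameter, height, width,
       length, roughness, upstreaminvert, upstreamloss, downstreaminvert,
       downstreamloss, averageloss, pressuremain, flowtogauge, cctvgrade,
       installdate, whencreated, whenmodified, ismodelled, isopen
FROM nodelink;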

Tometzky
Yes - I actually found this and amended my original post. However, even if I do a SELECT [everythingButThePostGISColumns], it's still really slow.
Robert