I'm surprised this hasn't been posted yet. Any interesting tricks that you know about in Postgres? Obscure config options and scaling/perf tricks are particularly welcome.

I'm sure we can beat the 9 comments on the corresponding MySQL thread :)

+8  A: 

COPY

I'll start. Whenever I switch to Postgres from SQLite, I usually have some really big datasets. The key is to load your tables with COPY FROM rather than individual INSERTs. See the documentation:

http://www.postgresql.org/docs/8.1/static/sql-copy.html

The following example copies a table to the client using the vertical bar (|) as the field delimiter:

COPY country TO STDOUT WITH DELIMITER '|';

To copy data from a file into the country table:

COPY country FROM '/usr1/proj/bray/sql/country_data';
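COPY can also read CSV directly; a minimal sketch (the file path is illustrative):

-- import a CSV file that has a header row
COPY country FROM '/path/to/country_data.csv' WITH CSV HEADER;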

See also here: http://stackoverflow.com/questions/364017/faster-bulk-inserts-in-sqlite3/759866#759866

ramanujan
This also comes in handy for csv imports.
ChristopheD
+6  A: 
  • My favorite by far is generate_series: at last, a clean way to generate dummy rowsets (see the sketch after this list).
  • Ability to use a correlated value in the LIMIT/OFFSET clause of a subquery (here the outer id drives OFFSET):

    SELECT  (
            SELECT  exp_word
            FROM    mytable
            OFFSET id
            LIMIT 1
            )
    FROM    othertable
    
  • Ability to use multiple parameters in custom aggregates (not covered by the documentation): see the article in my blog for an example.
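For instance, a minimal generate_series sketch (the values are illustrative):

-- ten dummy rows, each with a consecutive date
SELECT n, DATE '2009-01-01' + n AS day
FROM generate_series(0, 9) AS s(n);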
Quassnoi
+1, generate_series() is just what you need for quite a few things (e.g. whenever you need a "dummy table"). The 2nd snippet looks intriguing too.
j_random_hacker
+12  A: 

The easiest trick to make PostgreSQL perform a lot better (apart from setting and using proper indexes, of course) is just to give it more RAM to work with (if you have not done so already). On most default installations the value for shared_buffers is way too low (in my opinion). You can set

shared_buffers

in postgresql.conf. The value is counted in 8 kB buffers, so divide it by 128 to approximate the amount of memory (in MB) Postgres can claim. If you raise it enough, this will make PostgreSQL fly. Don't forget to restart PostgreSQL.
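For example, a postgresql.conf sketch (the value is purely illustrative; size it to your hardware):

# postgresql.conf -- illustrative value, not a recommendation
shared_buffers = 32768        # 32768 * 8 kB buffers = 256 MB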

On Linux systems, if PostgreSQL won't start again after this change, you have probably hit the kernel.shmmax limit. Set it higher with

sysctl -w kernel.shmmax=xxxx

To make this persist between boots, add a kernel.shmmax entry to /etc/sysctl.conf.
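For example (the value is illustrative; it must be at least as large as your shared memory needs):

# /etc/sysctl.conf
kernel.shmmax = 1073741824    # 1 GB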

A whole bunch of PostgreSQL tricks can be found here:

ChristopheD
+20  A: 

Since Postgres is a lot more sane than MySQL, there are not that many "tricks" to report on ;-)

The manual has some nice performance tips.

A few other performance related things to keep in mind:

  • Make sure autovacuum is turned on
  • Make sure you've gone through your postgresql.conf (effective_cache_size, shared_buffers, work_mem ... lots of options there to tune).
  • Use pgpool or pgbouncer to keep your "real" database connections to a minimum
  • Learn how EXPLAIN and EXPLAIN ANALYZE work. Learn to read the output (see the example after this list).
  • CLUSTER sorts data on disk according to an index. Can dramatically improve performance of large (mostly) read-only tables. Clustering is a one-time operation: when the table is subsequently updated, the changes are not clustered.
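For example, using the customer table from the transaction example further down:

-- shows the chosen plan plus actual row counts and timings
EXPLAIN ANALYZE SELECT * FROM customer WHERE name = 'test';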

Here are a few things I've found useful that aren't config or performance related per se.

To see what's currently happening:

select * from pg_stat_activity;

Search misc functions:

select * from pg_proc WHERE proname ~* '^pg_.*';

Find size of database:

select pg_database_size('postgres');
select pg_size_pretty(pg_database_size('postgres'));

Find size of all databases:

select datname, pg_size_pretty(pg_database_size(datname)) as size from pg_database;

Find size of tables and indexes:

select pg_size_pretty(pg_relation_size('public.customer'));

Or, to list all tables and indexes (probably easier to make a view of this):

select schemaname, relname,
       pg_size_pretty(pg_relation_size(schemaname || '.' || relname)) as size
from (select schemaname, relname, 'table' as type from pg_stat_user_tables
      union all
      select schemaname, relname, 'index' as type from pg_stat_user_indexes) x;

Oh, and you can nest transactions with savepoints and roll back parts of a transaction:

test=# begin;
BEGIN
test=# select count(*) from customer where name='test';
 count 
-------
     0
(1 row)
test=# insert into customer (name) values ('test');
INSERT 0 1
test=# savepoint foo;
SAVEPOINT
test=# update customer set name='john';
UPDATE 3
test=# rollback to savepoint foo;
ROLLBACK
test=# commit;
COMMIT
test=# select count(*) from customer where name='test';
 count 
-------
     1
(1 row)
Showing the database size is one of the features of "\l" in the 8.4 beta psql, I've noticed. Until then, I think 8.3 has a pg_size_pretty() function to prettify a size in bytes.
araqnid
Thanks for the tip! Wasn't aware of pg_size_pretty. I've updated my answer to include it.
+7  A: 

One of the things I really like about Postgres is some of the data types supported in columns. For example, there are column types made for storing network addresses and arrays. The corresponding functions (Network Addresses / Arrays) for these column types let you do a lot of complex operations inside queries that you'd otherwise have to do by processing results through code in MySQL or other database engines.
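A quick sketch of what those types allow (the values are illustrative):

-- inet operators: is this address contained in the subnet?
SELECT inet '192.168.1.5' << inet '192.168.1/24';   -- true

-- array containment: does the left array contain the right one?
SELECT ARRAY[1,2,3] @> ARRAY[2];                    -- true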

Chad Birch
And you can easily create your own types if the standard ones do not fit your needs!
bortzmeyer
+2  A: 

pgcrypto: more cryptographic functions than many programming languages' crypto modules provide, all accessible directly from the database. It makes cryptographic stuff incredibly easy to Just Get Right.
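A minimal sketch, assuming the pgcrypto contrib module is installed:

-- hash a password with a random bcrypt salt
SELECT crypt('secret', gen_salt('bf'));

-- hex-encoded SHA-256 digest
SELECT encode(digest('hello', 'sha256'), 'hex');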

kquinn
+9  A: 

Postgres has a very powerful datetime handling facility thanks to its INTERVAL support.

For example:

select NOW(), NOW() + '1 hour';
              now              |           ?column?            
-------------------------------+-------------------------------
 2009-04-18 01:37:49.116614+00 | 2009-04-18 02:37:49.116614+00
(1 row)

You can cast many strings to an INTERVAL type.
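For example:

SELECT '1 week 2 days 3 hours'::interval;
SELECT NOW() - '30 minutes'::interval;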

Yann Ramin
+4  A: 
select pg_size_pretty(200 * 1024);  -- returns '200 kB'
Michael Buen
+2  A: 

Materialized views are pretty easy to set up:

CREATE VIEW my_view AS SELECT id, AVG(my_col) FROM my_table GROUP BY id;
CREATE TABLE my_matview AS SELECT * FROM my_view;

That creates a new table, my_matview, with the columns and values of my_view. Triggers or a cron script can then be set up to keep the data up to date, or if you're lazy:

TRUNCATE my_matview;
INSERT INTO my_matview SELECT * FROM my_view;
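A sketch of wrapping that refresh in a function (the function name is made up) so a cron job can run it with a single statement:

-- hypothetical helper: call via e.g. psql -c 'SELECT refresh_my_matview()'
CREATE OR REPLACE FUNCTION refresh_my_matview() RETURNS void AS $$
BEGIN
    -- rebuild the materialized table from the view
    TRUNCATE my_matview;
    INSERT INTO my_matview SELECT * FROM my_view;
END;
$$ LANGUAGE plpgsql;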
+3  A: 

Arrays are really cool once you get to know them. Let's say you would like to store some hyperlinks between pages. You might start by thinking about creating a table like this:

CREATE TABLE hyper.links (
     tail INT4,
     head INT4
);

If you needed to index the tail column, and you had, say, 200,000,000 link rows (as Wikipedia would give you), you would find yourself with a huge table and a huge index.

However, with PostgreSQL, you could use this table format instead:

CREATE TABLE hyper.links (
     tail INT4,
     head INT4[],
     PRIMARY KEY(tail)
);

To get all heads for a link you could send a command like this (unnest() is standard since 8.4):

SELECT unnest(head) FROM hyper.links WHERE tail = $1;

This query is surprisingly fast compared with the first option (unnest() is fast and the index is way, way smaller). Furthermore, your table and index will take up much less RAM and disk space, especially when your arrays are so long that they are compressed into a TOAST table. Arrays are really powerful.

Note: while unnest() will generate rows out of an array, array_agg() will aggregate rows into an array.
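Going the other way, a sketch that fills the array column from a hypothetical flat staging table flat_links(tail, head) with one row per link (array_agg() is standard since 8.4, like unnest()):

INSERT INTO hyper.links (tail, head)
SELECT tail, array_agg(head)
FROM flat_links
GROUP BY tail;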

Nicholas Leonard
A: 
  • Inheritance, in fact multiple inheritance (as in parent-child table "inheritance", not the 1-to-1 relation "inheritance" which many web frameworks emulate when working with Postgres); see the sketch after this list.

  • PostGIS (spatial extension), a wonderful add-on that offers a comprehensive set of geometry functions and coordinate storage out of the box. Widely used in many open-source geo libraries (e.g. OpenLayers, MapServer, Mapnik, etc.) and definitely way better than MySQL's spatial extensions.

  • Writing procedures in different languages, e.g. C, Python, Perl, etc. (makes your life easier to code if you're a developer and not a DB admin).

    Also, all procedures can be stored externally (as modules) and can be called or imported at runtime with specified arguments. That way you can source-control and debug the code easily.

  • A huge and comprehensive catalog of all objects implemented in your database (i.e. tables, constraints, indexes, etc.).

    I always find it immensely helpful to run a few queries and get all the meta info, e.g. constraint names and the fields they are implemented on, index names, etc.

    For me it all becomes extremely handy when I have to load new data or do massive updates in big tables (I would automatically disable triggers and drop indexes, and then recreate them easily after processing has finished). Someone did an excellent job of writing a handful of these queries:

    http://www.alberton.info/postgresql_meta_info.html

  • Multiple schemas under one database: you can use them if your database has a large number of tables; think of schemas as categories. All tables (regardless of their schema) have access to all other tables and functions present in the parent database.
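A minimal table-inheritance sketch (the tables are made up, modeled on the example in the PostgreSQL docs):

CREATE TABLE cities (
    name        text,
    population  int
);

-- capitals inherits all of cities' columns and adds its own
CREATE TABLE capitals (
    country     text
) INHERITS (cities);

INSERT INTO capitals (name, population, country) VALUES ('Oslo', 634000, 'Norway');

SELECT name FROM cities;        -- includes rows from capitals
SELECT name FROM ONLY cities;   -- parent table only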

Nakh