I have a working WAL shipping setup with a warm standby slave server applying the WAL files.

When I create the pg_standby trigger file, pg_standby detects it at once, but the server takes about 10-15 minutes before it is ready to accept connections. Most of that time is spent waiting for .history files.

The trigger file is empty, so a "smart" failover should be done. Can I do something to make failover (much) faster?

Log output:

WAL file not present yet. Checking for trigger file...
trigger file found: smart failover
LOG:  could not open file "pg_xlog/000000010000000000000089" (log file 0, segment 137): No such file or directory
LOG:  redo done at 0/88003428
LOG:  last completed transaction was at log time 2010-08-10 13:26:20.232799+00
Trigger file        : /psql_archive/role.master
Waiting for WAL file    : 000000010000000000000088
WAL file path       : /psql_archive/000000010000000000000088
Restoring to        : pg_xlog/RECOVERYXLOG
Sleep interval      : 60 seconds
Max wait interval   : 0 forever
Command for restore : cp "/psql_archive/000000010000000000000088" "pg_xlog/RECOVERYXLOG"
Keep archive history    : 000000000000000000000000 and later
trigger file found: smart failover
running restore     : OK

LOG:  restored log file "000000010000000000000088" from archive
Trigger file        : /psql_archive/role.master
Waiting for WAL file    : 00000002.history
WAL file path       : /psql_archive/00000002.history
Restoring to        : pg_xlog/RECOVERYHISTORY
Sleep interval      : 60 seconds
Max wait interval   : 0 forever
Command for restore : cp "/psql_archive/00000002.history" "pg_xlog/RECOVERYHISTORY"
Keep archive history    : 000000000000000000000000 and later
running restore     :cp: cannot stat `/psql_archive/00000002.history': No such file or directory
cp: cannot stat `/psql_archive/00000002.history': No such file or directory
cp: cannot stat `/psql_archive/00000002.history': No such file or directory
cp: cannot stat `/psql_archive/00000002.history': No such file or directory
not restored
history file not found
LOG:  selected new timeline ID: 2
Trigger file        : /psql_archive/role.master
Waiting for WAL file    : 00000001.history
WAL file path       : /psql_archive/00000001.history
Restoring to        : pg_xlog/RECOVERYHISTORY
Sleep interval      : 60 seconds
Max wait interval   : 0 forever
Command for restore : cp "/psql_archive/00000001.history" "pg_xlog/RECOVERYHISTORY"
Keep archive history    : 000000000000000000000000 and later
running restore     :cp: cannot stat `/psql_archive/00000001.history': No such file or directory
cp: cannot stat `/psql_archive/00000001.history': No such file or directory
cp: cannot stat `/psql_archive/00000001.history': No such file or directory
cp: cannot stat `/psql_archive/00000001.history': No such file or directory
not restored
history file not found
LOG:  archive recovery complete
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections

Thanks.

-dennis

A: 

According to the docs: http://www.postgresql.org/docs/current/static/pgstandby.html

Fast Failover: In fast failover, the server is brought up immediately. Any WAL files in the archive that have not yet been applied will be ignored, and all transactions in those files are lost. To trigger a fast failover, create a trigger file and write the word fast into it. pg_standby can also be configured to execute a fast failover automatically if no new WAL file appears within a defined interval.
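As a minimal sketch of what the quoted passage describes, the helper below writes the trigger file for either mode. The function name request_failover is hypothetical and not part of pg_standby; the only behaviour taken from the documentation is that an empty trigger file requests a smart failover and a file containing the word fast requests a fast one. Keep in mind that a fast failover discards any WAL that has not yet been applied.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with pg_standby): create the trigger file.
#   $1 = path to the trigger file
#   $2 = "smart" (default, empty file) or "fast" (file contains the word fast)
request_failover() {
    trigger_file=$1
    mode=${2:-smart}
    case "$mode" in
        fast)  echo fast > "$trigger_file" ;;   # fast failover: unapplied WAL is lost
        smart) : > "$trigger_file" ;;           # smart failover: empty trigger file
        *)     echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}
```

With the trigger path shown in the log, a fast failover would be requested with `request_failover /psql_archive/role.master fast`.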

Alternatively, have a look at "Table F-23. pg_standby options" on the same page, which describes the maxwaittime option.
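The same options table may also explain where the time goes. As I read the documentation for the -r (maxretries) option, after each failed copy pg_standby waits sleeptime multiplied by the retry number, with a default of 3 retries. The log shows a sleep interval of 60 seconds and four failed cp attempts per .history file, which is consistent with one attempt plus three retries. The sketch below is only an estimate under that assumption; the function name missing_file_wait is made up for illustration.

```shell
#!/bin/sh
# Estimate how long pg_standby waits before giving up on one missing file,
# ASSUMING the documented backoff: after the n-th failure it sleeps
# sleeptime * n seconds, for n = 1..maxretries.
#   $1 = sleeptime in seconds (-s), $2 = maxretries (-r)
missing_file_wait() {
    sleeptime=$1
    maxretries=$2
    total=0
    n=1
    while [ "$n" -le "$maxretries" ]; do
        total=$((total + sleeptime * n))
        n=$((n + 1))
    done
    echo "$total"
}

# Values from the log: 60-second sleep interval, default of 3 retries,
# and two missing history files (00000002.history, 00000001.history).
per_file=$(missing_file_wait 60 3)
echo "per missing file: ${per_file}s"           # 60 + 120 + 180 = 360s
echo "two history files: $((per_file * 2))s"    # 720s, i.e. 12 minutes
```

If this reading is right, the two missing .history files alone account for roughly 12 minutes, which matches the 10-15 minutes reported, and a smaller -s value should shrink the delay accordingly.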

Cheers

dforce
I know about fast vs. smart failover. We are using 'smart' on purpose. But there is no activity on the master server, and nothing in the WAL archive directory to read and apply. Judging from the log, it waits a long time for files, even though, as I understand it, a 'smart' failover should just apply the WAL files that are available. At any rate, I consider 10-15 minutes far too long when there are no WAL files to apply.
Dennis Thrysøe
A: 

In short, if you don't use fast failover, pg_standby will continue to process all the logs that are left (as it should) to minimize data loss.

To make your life easier I would look at PITRTools.

Joshua D. Drake