views:

883

answers:

4

Hy guys, i have a postgresql 8.3 server with many database.

Actually, im planning to backup those db with a script that will store all the backup in a folder with the same name of the db, for example:

/mypath/backup/my_database1/
/mypath/backup/my_database2/
/mypath/backup/foo_database/

Every day i make 1 dump each 2 hours, overwriting the files every day... for example, in the my_database1 folder i have:

my_database1.backup-00.sql  //backup made everyday at the 00.00 AM
my_database1.backup-02.sql  //backup made everyday at the 02.00 AM
my_database1.backup-04.sql  //backup made everyday at the 04.00 AM
my_database1.backup-06.sql  //backup made everyday at the 06.00 AM
my_database1.backup-08.sql  //backup made everyday at the 08.00 AM
my_database1.backup-10.sql  //backup made everyday at the 10.00 AM
[...and so on...]

This is how i actually assure myself to be able to restore everydatabase loosing at least 2 hours of data.

2 hours still looks too much.

I've got a look to the postgresql pitr trought the WAL files, but, those files seem to contain all the data about all my database.

I'll need to separate those files, in the same way i do separate the dump files.

How to?

Otherwise, there is another easy-to-install to have a backup procedure that allo me to restore just 1 backup at 10 seconds earlier, but without creating a dump file every 10 seconds?

+1  A: 

Why do you want to separate the databases?

The way the PITR works, it is not possible to do since it works on the complete cluster. What you can do in that case is to create a data directory and a separate cluster for each of those databases (not recommended though since it will require different ports, and postmaster instances).

I believe that the benefits of using PITR instead of regular dumps outweigh having separate backups for each database, so perhaps you can re-think the reasons for why you need to separate it.

Another way could be to set up some replication with Slony-I but that would require a separate machine (or instance) that receives the data. On the other hand, that way you would have a replicated system in near real-time.

Update for comment:

To recover from mistakes, like deleting a table, PITR would be perfect since you can replay to a specific time. However, for 500 databases I understand that can be a lot of overhead. Slony-I would probably not work, since it is replicating. Not sure how it handles table deletions.

I am not aware of any other ways you can go. What I would do would probably still be going for PITR and just not do any mistakes ;). Jokes aside, depending how frequently mistakes are being made this could be a solution:

  • Set it up for PITR
  • have a second instance ready on standby.
  • When a mistake happens, replay the restore to the point in time on the second instance.
  • Do a pg_dump of the affected database from that instance.
  • Do a pg_restore on the production instance for that database.

However, it would require you to have a second instance ready, either on the same server or a different one (different is recommended). Also, the restore time would be a bit longer since it would require you to do one extra dump and restore.

Jimmy Stenke
Becose every database is indipendent.If i delete a table in the database1, and after 10 minutes i see the mistake, i want to restore just the database1, not all of them... the pg_dump mode works fine, but i cant make pg_dump every 10 seconds (i have around 500 databases...)
DaNieL
+2  A: 

It is not possible with one instance of PostgresSQL.

You can divide your 500 tables between several instances, each listening on different port, but it would mean that they will not use resources like memory effectively (memory reserved but unused in one instance can not be used by another).

Slony will also not work here, as it does not replicate DDL statements, like dropping a table.

I'd recommend doing both:

  • continue to do your pg_dump backups, but try to smooth it - throttle pg_dump io bandwith, so it will not cripple a server, and run it continuously - when it finishes with the last database then immediately start with a first one;

  • additionally setup PITR.

This way you can restore a single database fast, but you can loose some data. If you'll decide that you cannot afford to loose that much data then you can restore your PITR backup to a temporary location (with fsync=off and pg_xlog symlinked to ramdisk for speed), pg_dump affected database from there and restore it to your main database.

Tometzky
Can you please explain better your first point -throttle pg_dump io bandwith- ? i havent understood it completely
DaNieL
Use for example a throttle program (http://klicman.org/throttle/ GPLv2 license). For example: "pg_dump dbname | throttle -M 3 > dbname.sql" will limit pg_dump to 3MBps, which will not have much impact on database performance. Choose your throttling wisely - smaller limit - it will take more time, bigger - database performance will suffer more.
Tometzky
Mmmh, throttle can be really a good things..
DaNieL
Thinking about it, throttle wont help me much after all.. but, this raise another idea, there is a way to run the pg_dump in 'background' ? I mean, can i set something like 2% cpu and.. dunno, 10Mb ram, and a way that avoid pg_dump to exceed those limits? But i think the problem is not the pg_dump itslef, but postgresql.. it have to be stressed at the same way to supply pg_dump... oh, im little confused -.-
DaNieL
A: 

After googling a lot and reading your answers, seems there is no a real-life method to use pitr keeping databases indipendens.

Creating multiple postmaster's instances, db clusters, is kindly impossible (the number of database is actually 420,but will grown up (i hope that) dinamycally... without human control).

So, i'll keep dumping the dbs every 2 hours, im just sorry for not being able to give my customers a pitr guarantee ;(

Thanks guys for the replyes guys.

DaNieL
A: 

I think the way you are doing this is flawed. You should have one database with multiple schemas and roles. Then you can use PITR. However PITR is not a replacement for dumps.

This will let me to restore just 1 schema with PITR and with the dumps?
DaNieL
If i use 1 database and many schemas, will PITR allow me to restore (easly) just 1 schema at time?
DaNieL