Hi there,

I feel like this is quite delicate,

I have various folders with projects that I would like to back up into a zip/tar file, but I would like to avoid backing up files such as .pyc files and temporary files.

I also have a Postgres db I need to back up.


Any tips for running this operation as a Python script?

Also, would there be any way to stop the backup from hogging resources while it runs?


Help would be very much appreciated.

+3  A: 

If you're on Linux (or any other form of Unix, such as Mac OS X), a simple way to reduce a process's priority -- and therefore, indirectly, its consumption of CPU if other processes want some -- is the nice command. In Python (same OSs), os.nice lets your program "make itself nicer" (reduce its own priority).
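As a minimal sketch of the os.nice approach (Unix only), the script can raise its own niceness before starting the backup work. The helper name lower_priority and the increment values are illustrative assumptions, not part of any library:

```python
import os

def lower_priority(increment=10):
    """Raise this process's niceness by `increment` and return the new value.

    os.nice is available on Unix only; a higher niceness means a lower priority.
    (lower_priority is a hypothetical helper name used for illustration.)
    """
    return os.nice(increment)

if __name__ == "__main__":
    before = os.nice(0)        # an increment of 0 just reports the current niceness
    after = lower_priority(5)  # illustrative increment; pick what suits your machine
    print("niceness:", before, "->", after)
```

Calling this once at the top of the backup script affects everything the script does afterwards, including child processes it starts.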

For backing up a PostgreSQL DB, I recommend PostgreSQL's own tools; for zipping up a folder except the pyc files (and temporary files -- however it is you identify those), Python is quite suitable. For example:

>>> import os, zipfile
>>> os.chdir('/tmp/az')
>>> f = open('/tmp/a.zip', 'wb')
>>> z = zipfile.ZipFile(f, 'w')
>>> for root, dirs, files in os.walk('.'):
...   for fn in files:
...     if fn.endswith('.pyc'): continue   # skip compiled bytecode
...     fp = os.path.join(root, fn)
...     z.write(fp)
... 
>>> z.close()
>>> f.close()

This zips all files in said subtree except those ending in .pyc (without compression -- if you want compression, add a third argument zipfile.ZIP_DEFLATED to the zipfile.ZipFile call). Could hardly be easier.
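The same idea can be wrapped in a reusable function. This is only a sketch: the function name zip_tree and the pattern list are assumptions for illustration (the question does not say how temporary files are named), and it turns on compression with zipfile.ZIP_DEFLATED as mentioned above.

```python
import fnmatch
import os
import zipfile

# Hypothetical patterns; adjust to however you identify temporary files.
EXCLUDE_PATTERNS = ("*.pyc", "*.tmp", "*~")

def zip_tree(src_dir, zip_path, exclude=EXCLUDE_PATTERNS):
    """Zip src_dir into zip_path with deflate compression, skipping excluded names."""
    zip_abs = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as z:
        for root, dirs, files in os.walk(src_dir):
            for fn in files:
                if any(fnmatch.fnmatch(fn, pat) for pat in exclude):
                    continue
                fp = os.path.join(root, fn)
                if os.path.abspath(fp) == zip_abs:
                    continue  # never add the archive to itself
                # Store paths relative to src_dir so the archive is relocatable.
                z.write(fp, os.path.relpath(fp, src_dir))
```

For example, zip_tree('/tmp/az', '/tmp/a.zip') would produce the same archive as the session above, minus the extra patterns and with compression enabled.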

Alex Martelli
+2  A: 

On Linux, you can use tar with the --exclude option. For example, to exclude your .pyc files and temp files (in this example, .tmp) when archiving a folder (myprojects/ is a placeholder for your own directory):

$ tar zcvf backup.tar.gz --exclude "*.tmp" --exclude "*.pyc" myprojects/

The z option compresses the archive with gzip as well.
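Since the question asks for a Python script, roughly the same exclusions can be sketched with the standard tarfile module. The function name make_tarball is an assumption for illustration, and the matching here is on each entry's base name, which is not identical to how tar's --exclude matches paths:

```python
import fnmatch
import os
import tarfile

def make_tarball(src_dir, tar_path, exclude=("*.tmp", "*.pyc")):
    """Write a gzip-compressed tar of src_dir, skipping names that match `exclude`."""
    def skip_excluded(tarinfo):
        # Returning None from the filter drops the entry from the archive.
        name = os.path.basename(tarinfo.name)
        if any(fnmatch.fnmatch(name, pat) for pat in exclude):
            return None
        return tarinfo

    top = os.path.basename(os.path.abspath(src_dir))
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(src_dir, arcname=top, filter=skip_excluded)
```

Keep tar_path outside src_dir so the archive does not try to include itself.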

ghostdog74
+1  A: 

With today's multicore CPUs, you may find that CPU is not the bottleneck. It is now far more likely to be the disk I/O that needs to be shared better.

Linux has the ionice command to allow you to control this:

ionice(1)

NAME

   ionice - get/set program io scheduling class and priority

SYNOPSIS

   ionice [[-c class] [-n classdata ] [-t]] -p PID [PID ...]

   ionice [-c class] [-n classdata ] [-t] COMMAND [ARG ...]

DESCRIPTION
This program sets or gets the io scheduling class and priority for a program. If no arguments or just -p is given, ionice will query the current io scheduling class and priority for that process.
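From a Python backup script, one way to apply this is to call ionice on the script's own PID before doing any heavy I/O. This is a sketch that assumes the ionice binary is installed and on PATH; the helper names are illustrative, and class 3 is the "idle" scheduling class described in ionice(1):

```python
import os
import subprocess

def ionice_command(pid, io_class=3):
    """Build the ionice argv for `pid`; class 3 is the 'idle' class per ionice(1)."""
    return ["ionice", "-c", str(io_class), "-p", str(pid)]

def make_self_io_idle():
    """Ask ionice to move the current process into the idle I/O class.

    Raises an exception if ionice is missing or exits with an error.
    """
    subprocess.check_call(ionice_command(os.getpid()))
```

Calling make_self_io_idle() at the start of the script should mean the backup only gets disk time when nothing else wants it, subject to the I/O scheduler in use.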

gnibbler
+1  A: 

A backup is only as good as your ability to recover from it, so think about recovery at least as much as about making the backup.

The right way to back up source code is to keep source files in a VCS (version control system), and back up the VCS repository. Exclude any auto-generated easily-replaced files (like those *.pyc files, etc.) from the VCS repository. I recommend Bazaar for very efficient storage and user-friendliness, but your team will likely already have a VCS they prefer.

For backup of a PostgreSQL database, it's best to use pg_dump to regularly dump the database to a text file, compress that, and back up the result. This is because the backup then becomes restorable on any machine, by re-playing the database dump into another PostgreSQL server.
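If you do want to drive pg_dump from Python rather than the shell, a minimal sketch could look like the following. It assumes pg_dump is on PATH and that connection details come from the usual PostgreSQL environment (for example PGHOST, PGUSER, or a .pgpass file); the function names and the database name in the usage note are hypothetical:

```python
import gzip
import shutil
import subprocess

def pg_dump_command(dbname):
    """Build a pg_dump argv that writes a plain-text SQL dump to stdout."""
    return ["pg_dump", "--format=plain", dbname]

def dump_to_gzip(command, out_path):
    """Run `command` and stream its stdout into a gzip-compressed file."""
    proc = subprocess.Popen(command, stdout=subprocess.PIPE)
    with gzip.open(out_path, "wb") as out:
        # Copy in chunks so a large dump is never held in memory at once.
        shutil.copyfileobj(proc.stdout, out)
    proc.stdout.close()
    returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, command)
```

A call such as dump_to_gzip(pg_dump_command("mydb"), "mydb.sql.gz") would then produce a compressed text dump that can be replayed into another PostgreSQL server after decompressing it.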

As for how to automate it: you would be best using a Bash program for the purpose, since it's just a matter of connecting some commands to files, which is what the shell excels at.

bignose