views: 3740
answers: 21

We have four Linux boxes (all running Debian or Ubuntu) on our office network. None of them is especially critical, and they all use RAID. To date, I've therefore been backing them up by having a cron job upload tarballs containing the contents of /etc, MySQL dumps and other such changing, non-packaged data to a box at our geographically separate hosting centre.

I've realised, however, that

  • the tarballs are sufficient to rebuild from, but it's certainly not a painless process to do so (I recently tried this out as part of a hardware upgrade of one of the boxes)
  • long-term, the process isn't sustainable. Each of the boxes is currently producing a tarball of a couple of hundred MB each day, 99% of which is the same as the previous day
  • partly due to the size issue, the backup process requires more manual intervention than I want (to find whatever 5GB file is inflating the size of the tarball and kill it)
  • again due to the size issue, I'm leaving stuff out which it would be nice to include - the contents of users' home directories, for example. There's almost nothing of value there that isn't in source control (and these aren't our main dev boxes), but it would be nice to keep them anyway.
  • there must be a better way

So, my question is, how should I be doing this properly? The requirements are:

  • needs to be an offsite backup (one of the main things I'm doing here is protecting against fire/whatever)
  • should require as little manual intervention as possible (I'm lazy, and box-herding isn't my main job)
  • should continue to scale with a couple more boxes, slightly more data, etc.
  • preferably free/open source (cost isn't the issue, but especially for backups, openness seems like a good thing)
  • an option to produce some kind of DVD/Blu-Ray/whatever backup from time to time wouldn't be bad

My first thought was that this kind of incremental backup is what tar was created for: create a tar file once a month, add to it incrementally each day, and rsync the results to the remote box. But others probably have better suggestions.
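For reference, a rough sketch of what I had in mind, using GNU tar's incremental mode (paths and hostname are just placeholders):

    # Sketch only: paths and hostname are placeholders.
    # GNU tar keeps a snapshot file so each run only archives what changed;
    # deleting the snapshot file once a month forces a fresh full archive.
    SNAR=/var/backups/etc.snar
    tar --listed-incremental="$SNAR" -czf "/var/backups/etc-$(date +%F).tar.gz" /etc
    # Push the results to the offsite box.
    rsync -av /var/backups/ backupuser@offsite.example.com:"backups/$(hostname)/"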

+8  A: 

I think you might want to look at Bacula, which 'comes in the night and sucks the essence from your computers*'. It's a fairly powerful backup tool which should be able to help you manage a complex set of backups.

*at least, that's what the user guide says :)

UberAlex
Bacula takes some effort to configure initially, but it is extremely powerful, and the only ongoing effort required once it is configured is responding to requests to swap tapes.
stephen mulcahy
Oh, don't forget, if you're not regularly testing your backups - you might as well stop fooling yourself and torch your tapes - at least that way you know you don't have backups.
stephen mulcahy
A: 

Hi,

I'm currently using backup2l and rsync.net on ten Debian boxes (backup2l is in the Debian repositories). It's self-managing: you tell it how many generations to keep and it prunes old ones as time goes on. Mount the rsync.net server via FUSE, then point backup2l at the remote mount. It also has pre- and post-backup hook scripts, so you can dump databases or whatever else you want included in the backup.
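Roughly, the moving parts look like this (the hostname is a placeholder, and the exact hook names should be checked against the backup2l.conf shipped with your version):

    # Sketch only: mount the remote account with sshfs (FUSE), run backup2l, unmount.
    sshfs backupuser@usw-s001.rsync.net:backups /mnt/offsite
    backup2l -b        # perform a backup as configured in /etc/backup2l.conf
    fusermount -u /mnt/offsite
    # Database dumps can be produced from the pre-backup hook in backup2l.conf.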

Andrew

Andrew Taylor
+8  A: 

The existential answer is that there is no such thing as the best backup.

In your case I'd just use rsync to mirror all important directories onto a remote server. Something like:

rsync -av --delete src-dir remote-user@remote-host:dest-dir

This is incremental, so it takes less time and network bandwidth than a full copy. Also, by using --delete you can remove big redundant files at any time and save space on the backup volume automatically. Of course, you won't be able to go back to deleted files once the backup has been updated (presumably once a day).

Rsync can be set up to use ssh so that your files' contents are transferred securely.
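To keep it hands-off, you can drive this from cron; a sketch (host, user and paths are examples only):

    # /etc/cron.d/offsite-backup -- example only; host and paths are placeholders.
    # Every night at 02:30, mirror /etc and the MySQL dump directory over ssh.
    30 2 * * * root rsync -az --delete -e ssh /etc /var/backups/mysql remote-user@remote-host:backups/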

You might also consider sharing the user home directories from a central server through NFS or SMB. This will save some space and will allow users to switch machines or simply ssh around easily.

Avner
-1 YOUR BACKUP ROUTINE MUST NEVER DELETE FILES. If it does, it's not a backup. It's a file sync. If you accidentally delete the source, your "backup" gets toasted too. If you don't want redundant storage, run a combination of full/incremental backups. For example: run an incremental backup every day and a full backup every week. This is the same reason RAID 1 is not a backup.
Andrew
+1  A: 

I've heard many good things about Bacula, and rsync is always a great option. While this is slightly off topic, it may also help to look into Puppet, since it can make managing the boxes easier and it scales very well.

Sam Merrell
A: 

I recommend JungleDisk. It uses Amazon S3 to store the data offsite. They have a "workgroup edition", though I'm not sure it would meet your needs as an enterprise user.

Chris Conway
+8  A: 

I have had good success using rsnapshot in concert with an offsite server. This "program" is a script that leverages the strengths of rsync and the Linux filesystem. Using hardlinks, each backup snapshot appears to contain your entire data set but only requires additional space for the files that changed. Combined with rsync's ability to send only changed files and delete removed ones, you get a backup system with very low overhead for daily, weekly, monthly or yearly backups.

The system can even be very versatile as I have used this system to backup both windows and linux boxes.

There are some negatives to this system: no inherent encryption, no compression of data on the "storage" side, etc., but most of these can be addressed with outside tools.

rsnapshot is not a comprehensive backup tool like Bacula, but it suits my needs very well.
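As a rough illustration, the relevant part of an rsnapshot.conf looks something like this (fields must be tab-separated; hosts and retention counts are examples, and older versions use "interval" rather than "retain"):

    # /etc/rsnapshot.conf (excerpt) -- illustrative values only
    snapshot_root   /backup/snapshots/
    retain  daily   7
    retain  weekly  4
    retain  monthly 12
    # one line per thing to back up; rsnapshot pulls it over rsync/ssh
    backup  root@box1.example.com:/etc/     box1/
    backup  root@box1.example.com:/home/    box1/

Cron then runs rsnapshot daily, rsnapshot weekly and rsnapshot monthly on the appropriate schedule; unchanged files in each snapshot are just hardlinks back to the previous one.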

borodimer
+1 for rsnapshot. I have this doing daily backups of a drive with ~23GB of data in use, and the /backup partition is only sitting at ~29GB. That is with 7 daily, 4 weekly and 12 monthly backups.
SiegeX
A: 

I figured I should give it a few hours before accepting an answer, since in this case there's no right or wrong answer, just answers that seem better or worse. It's now 12 hours since I asked the question. I'll be looking at Bacula in the first instance. If that doesn't work out for me, I'll proceed to the other suggestions.

Jon Bright
+1  A: 

Rsync is perhaps your simplest option here. I'm sure there are many other ways to do your backups, but rsync is one of the easiest to implement and keep working. If you're worried about the security of your backups crossing the Internet, tunnel it through SSH. SSH+rsync covers the offsite, low-maintenance and open-source requirements. For producing backups on optical media, you could use a cron job to burn the discs.
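For the optical side, something along these lines in cron would do it (device and path are examples; growisofs comes from the dvd+rw-tools package):

    # Sketch only: weekly DVD of the latest backup tree, Sunday at 04:00.
    # -Z starts a new session; -R -J add Rock Ridge and Joliet extensions.
    0 4 * * 0 root growisofs -Z /dev/dvd -R -J /srv/backups/latest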

Andrew
A: 

I have recently implemented BackupPC, which is a free/open-source solution. It can use rsyncd to do the backups, works in heterogeneous environments where you want to back up Linux/Unix and Windows servers, and can do incrementals. It has a web interface, so you can configure it and do remote restores. I implemented BackupPC over OpenVPN to the backup server at my home. You might then want a tape drive on that home server.

Brian G
+1  A: 

Some very nice, simple solutions are built on top of rsync. I'm exploring both duplicity and rdiff-backup.

Donnie
+4  A: 

I can highly recommend duplicity to do backups to Amazon S3. It does encryption (gpg) and incremental backups, and the restore process is frightfully easy.
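For the record, a typical run looks roughly like this (bucket and paths are made up; AWS credentials and the GnuPG passphrase come from the environment):

    # Sketch only: bucket, paths and credentials are placeholders.
    export AWS_ACCESS_KEY_ID=...            # plus AWS_SECRET_ACCESS_KEY and PASSPHRASE
    # Back up /etc; the first run is a full backup, later runs are incrementals.
    duplicity /etc s3+http://my-backup-bucket/host1/etc
    # Restoring is just the reverse:
    duplicity restore s3+http://my-backup-bucket/host1/etc /tmp/etc-restore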

pjz
Also make sure to check out ftplicity, which is built on top of duplicity. Very nice indeed. I use it to make offline backups for a small company.
Subtwo
Actually, S3 is just one of several targets that duplicity can handle. You can also back your data up to a remote filesystem via ssh/scp, to a WebDAV repository, and so on. I like duplicity for its ease of use and the fact that it handles both incremental and full backup schemes.
Nicolas
+3  A: 

I'm very happy with BackupPC. Then you can sync the backup directory to S3 or EBS.
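For the sync step, something like s3cmd would do it (bucket and path are examples); note that a plain sync won't preserve BackupPC's hardlinked pool, so the copy on S3 can end up much larger than the pool itself:

    # Sketch only: push the BackupPC data directory to S3 with s3cmd.
    s3cmd sync --delete-removed /var/lib/backuppc/ s3://my-backuppc-bucket/pool/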

Scott
A: 

As the .rpm docs say, fwbackups is feature-rich. I just started using it for incremental workstation and server backups and I love it. It has many options, such as tarball versus rsync and local versus SSH.

Another good one is pybackpack. It specifically supports CD/DVD backups, so it might help with the Blu-ray/DVD requirement.

Ali Nabavi
A: 

Just poking fun:

cron + cp -ua

Andrew
+1  A: 

I've also heard great things about Bacula from folks I trust and respect who are running large, successful, hands-off backup installations. However, given:

  • you only have 4 boxes to backup
  • they're all Debian or Ubuntu
  • you already have jobs in place dumping your MySQL databases (and hopefully other transactional systems)

I would actually recommend BackupPC. Here are some points to consider:

  • I recently set up backups for home and work using BackupPC. About 4 boxes at home and 5 at work. I'm backing up Fedora, Ubuntu, and Windows machines. Works great.
  • I haven't tried Bacula, but I've heard BackupPC has fewer features than Bacula.
  • I believe the server used to host BackupPC must be a UNIX-like system. I'm using a Fedora server at home for BackupPC and an Ubuntu server at work. It is included in the standard package repositories for both operating systems.
  • BackupPC has some limitations that come into play when backing up Windows machines. If you use ssh, use rsyncd over an ssh tunnel, not rsync over ssh.
  • BackupPC is very space-efficient... instead of storing identical files from multiple machines, hardlinks are used.
  • BackupPC relies heavily on rsync but hides all of its complexity and comes packaged with sensible defaults.
  • Since I always use rsync, most of my daily incremental backups finish in less than a minute.
  • BackupPC is very well documented.
  • Archiving to DVD/CD/tape/external drives is very easy.
  • BackupPC has a simple, intuitive Web interface. It's easy to allow people to retrieve their own backups or exclude directories.
  • BackupPC does not require any programs besides SSH to be running on the client (although something like tar or rsync must be present). I believe Bacula has a client-side program that must always be running (the "Client program" or "File service"). This is not necessarily better or worse, just different.
  • Knowledge of SSH public key authentication is required if using rsync over ssh for backups (see the sketch just after this list).
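As a rough illustration of that last point, the one-time setup per client is something like this (usernames and hostname are examples):

    # Sketch only: run on the BackupPC server so its backuppc user can reach the client.
    sudo -u backuppc ssh-keygen -t rsa                      # accept defaults, empty passphrase
    sudo -u backuppc ssh-copy-id root@client1.example.com
    # Sanity check: must log in without prompting for a password.
    sudo -u backuppc ssh root@client1.example.com whoami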

Good luck, and please let us know what you end up using and why!

Adam Monsen
Is this thread more appropriate for serverfault? Just curious.
Adam Monsen
A: 

Here's a good service that's really cheap: only $2.99 a month for unlimited storage, and it works with any SFTP client. www.datastorageunit.com

A: 

My company has four servers. Duplicity is simple; it took maybe an hour to set up (backup and restore scripts). When files don't change, the volumes don't change size.

It backs up to Amazon S3, so it's hard to get more secure or redundant than that. Oh, and it's free, open source, and packaged in Ubuntu.

Kearney
A: 

Have you tried Ahsay Backup? We're just trialling it at work; it's not free, but it makes things much easier.

LeeNukes
A: 

Christ. Do the astroturfers not even understand the forum they're on? @John: I hardly think Stack Overflow is the target market for the "unlimited" flat-rate provider you are "recommending".

Moving on, I will second the earlier recommendation for duplicity - it's very stable and has solid backing from a long-time maintainer. The mailing list is well trafficked, and a lot of folks are using it and have been for a long time.

I will also second the recommendation for rsync.net. They had me at hello with their warrant canary, but the ability to run actual unix commands over ssh with my filesystem as the target (ssh user@host md5 some/file, for instance) is what has kept me loyal for years now.

IrixUser
A: 

See also this question on Server Fault.

Juha Syrjälä
A: 

I know this thread is old, but I wanted to suggest R1Soft's CDP system. It's perhaps not suited to everyone's needs, but in our IT management practice it has worked very well, with nearly 200 servers being backed up daily/hourly.

Two key features besides good pricing on the enterprise editions:

  • Per table restores for InnoDB tables in MySQL
  • Bare metal recovery via PXE boot image

PXE-boot a box into their recovery image, configure your network settings, and a few mouse clicks later the system is restored.

This also works for Citrix XenServer images.

The CDP server has email alerts for failed backups, and we use Nagios to monitor the agents. All in all, it has been very reliable for us.

jeffatrackaid