views:

604

answers:

17

I'm curious about keeping source code around reliably and securely for several years. From my research/experience:

  1. Optical media, such as burned DVD-Rs, lose bits of data over time. After a couple of years, I can't get all the files off that I put on them: read errors, etc.

  2. Hard drives are mechanical and subject to failure and obsolescence, and data recovery is expensive and hardly keeps your data private (you send the drive away to some company).

  3. Magnetic tape storage: see #2.

  4. Online storage is subject to the whim of some data storage center, the security or lack of security there, and the possibility that the company folds, etc. Plus it's expensive, and you can't guarantee that they aren't peeking in.

I've found over time that I've lost source code to old projects I've done due to these problems. Are there any other solutions?

Summary of answers:
1. Use multiple methods for redundancy.
2. Print out your source code either as text or barcode.
3. RAID arrays are better for local storage.
4. Open sourcing your project will make it last forever.
5. Encryption is the answer to security.
6. Magnetic tape storage is durable.
7. Distributed/guaranteed online storage is cheap and reliable.
8. Use source control to maintain history, and backup the repo.

+3  A: 

Any data you want to keep should be stored in multiple places on multiple formats. While the odds of any one failing may be significant, the odds of all of them failing are pretty small.
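To put a rough number on that, here is a minimal sketch. It assumes the copies fail independently of one another, which is an assumption: copies kept in the same building or on the same kind of media are not truly independent. The 10% per-copy figures are purely illustrative, not measured.

```python
from math import prod

def prob_all_copies_lost(failure_probs):
    """Probability that every copy is lost, assuming independent failures."""
    return prod(failure_probs)

# Illustrative numbers only: three copies, each with a 10% chance of
# being unreadable by the time you need it.
print(prob_all_copies_lost([0.10, 0.10, 0.10]))  # roughly 1 in 1000
```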

Chris Upchurch
+3  A: 

If you want to archive something for a long time, I would go with a tape drive. Tapes may not hold a whole lot, but they are reliable and pretty much the storage medium of choice for data archiving. I've never personally experienced data loss on a tape.

Alex Fort
It would be useful to quantify your usage of tapes. Handling 40,000 tapes across 10 years versus 6 tapes in 1 year aren't quite the same scenario.
icelava
+6  A: 

Based on your level of paranoia, I'd recommend a printer and a safe.

More seriously, a RAID array isn't so expensive anymore, and so long as you continue to use and monitor it, a properly set-up array is virtually guaranteed never to lose data.

deemer
I think RAID6 is the way to go for ultra availability: http://en.wikipedia.org/wiki/RAID_6#RAID_6
Bob King
Just remember to replace the disks every few years. To be on the safe side - don't use them over their 3 year warranty.
skolima
RAID is for improving speed and/or disk space. RAID is *not* backup!
Flint
*and* for improving reliability.
David Heggie
Flint, that depends on which RAID configuration one is using. Take a look at RAID level 1.
icelava
+6  A: 

The best answer is "in multiple places". If I were concerned about keeping my source code for as long as possible I would do:

1) Back up to some optical media on a regular basis, say burn it to DVD once a month and archive it offsite.

2) Back it up to multiple hard drives on my local machines.

3) Back it up to Amazon's S3 service. They have guarantees, it's a distributed system so no single points of failure and you can easily encrypt your data so they can't "peek" at it.

With those three steps your chances of losing data are effectively zero. There is no such thing as too many backups for VERY important data.
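The local part of that routine can be sketched as follows: copy one archive file to several destination directories and confirm each copy by checksum. This is only a sketch under stated assumptions. The function names are hypothetical, the destinations are assumed to be ordinary mounted directories, and burning to DVD or uploading to S3 is out of scope here.

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copy_to_all(archive, destinations):
    """Copy `archive` into every destination directory and verify each copy."""
    archive = Path(archive)
    expected = sha256_of(archive)
    copies = []
    for dest in destinations:
        dest = Path(dest)
        dest.mkdir(parents=True, exist_ok=True)
        target = dest / archive.name
        shutil.copy2(archive, target)  # copies contents and timestamps
        if sha256_of(target) != expected:
            raise IOError(f"checksum mismatch for {target}")
        copies.append(target)
    return copies
```

Calling `copy_to_all("project.tar.gz", ["/mnt/disk1/backups", "/mnt/disk2/backups"])` (both paths hypothetical) would leave one verified copy on each drive.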

Frank Wiles, Revolution Systems, www.revsys.com

Frank Wiles
+3  A: 

The best way to back up your projects is to make them open source and famous. That way there will always be people with a copy of it and able to send it to you.

After that, just take care of the magnetic/optical media, renew it regularly, and keep multiple copies (online as well; remember you can encrypt it) on multiple media (including, why not, RAID sets).

Vinko Vrsalovic
A: 

One way would be to periodically recycle your storage media, i.e. read data off the decaying medium and write it to a fresh one. There exist programs to assist you with this, e.g. dvdisaster. In the end, nothing lasts forever. Just pick the least annoying solution.
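Knowing when a medium has started to decay is half the battle. A minimal sketch, assuming you record a checksum manifest when the archive is written and re-check it from time to time: any file that is missing, unreadable, or changed is a signal to copy everything to fresh media. This only detects damage. It is not what dvdisaster does (that tool adds error-correction data), and the function names are hypothetical.

```python
import hashlib
from pathlib import Path

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }

def find_damaged(root, manifest):
    """Return the files that are missing, unreadable, or no longer match."""
    root = Path(root)
    damaged = []
    for name, digest in manifest.items():
        try:
            current = hashlib.sha256((root / name).read_bytes()).hexdigest()
        except OSError:
            damaged.append(name)  # missing or unreadable
            continue
        if current != digest:
            damaged.append(name)  # contents changed since the manifest was made
    return damaged
```

Keep the manifest itself somewhere other than the medium it describes, so a decaying disc cannot take its own checksums with it.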

As for #2: you can store data in encrypted form to prevent data recovery experts from making sense of it.

Jan Krüger
A: 

I think Option 2 works well enough if you have the right backup mechanisms in place. They need not be expensive ones involving a third party, either (except for disaster recovery). A server configured with RAID 5 would do the trick. If a hard drive fails, replace it. It is HIGHLY unlikely that all the hard drives will fail at the same time. Even a mirrored RAID 1 drive would be good enough in some cases.
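The reason a RAID 5 array survives one failed drive is XOR parity. The toy sketch below shows the idea on three tiny data blocks. It is an illustration only: a real array rotates parity across drives and works on whole stripes, and the function names are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors and the parity."""
    return xor_blocks(surviving_blocks + [parity])

data = [b"SRC1", b"SRC2", b"SRC3"]  # one block per data drive
parity = xor_blocks(data)           # what the parity block would hold

# Simulate losing the second drive and rebuilding its block.
recovered = rebuild_missing([data[0], data[2]], parity)
print(recovered == data[1])  # True
```

The same arithmetic also shows the limit: with two blocks missing from a stripe there is not enough information left to rebuild either, so a second failure before the replacement finishes rebuilding loses data.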

If option 2 still seems like a crappy solution, the only other thing I can think of is to print out hard-copies of the source code, which has many more problems than any of the above solutions.

Gilligan
A: 

"Online storage is subject to the whim of some data storage center, the security or lack of security there, and the possibility that the company folds, etc. Plus it's expensive,"

Not necessarily expensive (see rsync.net for example), nor insecure. You can certainly encrypt your stuff too.

"and you can't guarantee that they aren't peeking in."

True, but there's probably much more interesting stuff to peek at than your source-code. ;-)

"More seriously, a RAID array isn't so expensive anymore"

RAID is not backup.

Flint
+1  A: 

The best home-usable solution I've seen was printing out the backups using a 2D barcode - the data density was fairly high, it could be re-scanned fairly easily (presuming a sheet-feeding scanner), and it moved the problem from the digital domain back into the physical one - which is fairly easily met by something like a safe deposit box, or a company like Iron Mountain.

The other answer is 'all of the above'. Redundancy always helps.

pjz
I can't tell if this reply is serious or not. Fading inks, yellowing and brittling of paper, easy flammability, potential to be ruined by anything wet… a paper-based solution doesn't sound like a good back-up to me at all.
Garrett Albright
Acid-free paper, silly. We have paper documents hundreds and thousands of years old. Magnetic and optical storage is good for, say, 10 years or so.
postfuturist
+1  A: 

For my projects, I use a combination of 1, 2, & 4. If it's really important data, you need to have multiple copies in multiple places. My important data is replicated to 3-4 locations every night.

If you want a simpler solution, I recommend you get an online storage account from a well known provider which has an insured reliability guarantee. If you are worried about security, only upload data inside TrueCrypt encrypted archives. As far as cost, it will probably be pricey... But if it's really that important the cost is nothing.

JMack
+3  A: 

I think you'd be surprised how reasonably priced online storage is these days. Amazon S3 (Simple Storage Service) is $0.15 per gigabyte per month, with upload costs of $0.10 per GB and download costing $0.17 per GB maximum.

Therefore, if you stored 20GB for a month, uploaded 20GB and downloaded 20GB it would cost you $8.40 (slightly more expensive in the European data center at $9).
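The arithmetic, as a short sketch. It uses a storage rate of $0.15 per GB-month, which is the rate that reproduces the $8.40 total alongside the $0.10 upload and $0.17 download rates. The $0.18 European storage rate is an assumption inferred from the $9 figure. These are the prices quoted at the time, not current ones.

```python
def s3_monthly_cost(stored_gb, uploaded_gb, downloaded_gb,
                    storage_rate=0.15, upload_rate=0.10, download_rate=0.17):
    """Cost in dollars for one month: storage plus transfer in and out."""
    return (stored_gb * storage_rate
            + uploaded_gb * upload_rate
            + downloaded_gb * download_rate)

# 20 GB stored, 20 GB uploaded, 20 GB downloaded in one month.
print(f"${s3_monthly_cost(20, 20, 20):.2f}")                     # $8.40
# Assumed European storage rate of $0.18 per GB-month.
print(f"${s3_monthly_cost(20, 20, 20, storage_rate=0.18):.2f}")  # $9.00
```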

That's cheap enough to store your data in both US and EU data centers AND on dvd - the chances of losing all three are slim, to say the least.

There are also front-ends available, such as JungleDisk.

http://aws.amazon.com
http://www.jungledisk.com/
http://www.google.co.uk/search?q=amazon%20s3%20clients

adam
+1  A: 

For regulatory mandated archival of electronic data, we keep the data on a RAID and on backup tapes in two separate locations (one of which is Iron Mountain). We also replace the tapes and RAID every few years.

+1  A: 

If you need to keep it "forever", probably the safest way is to print out the code and stick it in a plastic envelope to keep it safe from the elements. I can't tell you how much code I've lost to backup media that I can no longer read... I don't have a card reader for my old COBOL deck, or a drive for my 5 1/4" or 3 1/2" floppies. Yet the printout that I made of my first big project is still readable, even after my then-3-year-old decided that it would make a good coloring book.

skamradt
Why are you not copying valuable stuff onto a modern, contemporary medium?
icelava
Current stuff of course is copied to modern media. But many projects which have long since closed have unfortunately been lost to technological progress.
skamradt
A: 

I was just talking with a guy who is an expert in microfilm. While it is an old technology, for long-term storage it is one of the most enduring forms of data storage if properly maintained. It doesn't require sophisticated equipment to read (a magnifying lens and a light), although storing it may take some work.

Then again, as was previously mentioned, if you are only talking in spans of a few years instead of decades, printing it off to paper and storing it in a controlled environment is probably the best way. If you want to get really creative you could laminate every sheet!

Tim
+1  A: 

When you state "back up source code", I hope you include in your meaning the backing up of your version control system too.

Backing up your current source code (to multiple places) is definitely critical, but backing up your history of changes as preserved by your VCS is paramount in my opinion. It may seem trivial, especially when we are always "living in the present, looking towards the future". However, there have been way too many times when we have wanted to look backward to investigate an issue, review the chain of changes, see who did what, or check whether we can roll back to a previous build/version. This is all the more important if you practise heavy branching and merging. Archiving a single trunk will not do.

Your version control system may come with documentation and suggestions on backup strategies.
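As one illustration, here is a minimal sketch that assumes the VCS is Subversion (an assumption; other systems have their own tools). `svnadmin dump` writes the full revision history of a repository to standard output, and the sketch redirects that into a dated file. The function names, the `.svndump` extension, and the file naming scheme are hypothetical.

```python
import subprocess
from datetime import date
from pathlib import Path

def dump_command(repo_path):
    """Build the `svnadmin dump` command line for a repository path."""
    return ["svnadmin", "dump", str(repo_path)]

def dump_repository(repo_path, backup_dir):
    """Write the repository's full history to a dated dump file."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    name = f"{Path(repo_path).name}-{date.today().isoformat()}.svndump"
    target = backup_dir / name
    with open(target, "wb") as out:
        # check=True raises CalledProcessError if svnadmin reports a failure.
        subprocess.run(dump_command(repo_path), stdout=out, check=True)
    return target
```

The resulting dump file is an ordinary file, so it can be copied, encrypted, and stored in multiple places like any other backup.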

icelava
+2  A: 

Don't forget to use Subversion (http://subversion.tigris.org/). I subversion my whole life (it's awesome).

Sleep Deprivation Ninja
A: 

Drobo for local backup

DVD for short-term local archiving

Amazon S3 for off-site, long-term archiving