views: 138
answers: 8

Our shop is constantly running out of disk space, because we have a mandate from the developers and management to keep all of the nightly builds, as it greatly aids debugging.

Each build generally has few changes. However, almost every file is different from one build to the next, because the compiler and linker insert date/time stamps (and versions) in the objects and executables.

So the question is: does anyone know of a file system that not only can "merge duplicate files", but can also "merge similar files", since these files are several KB to MB, but only a few bytes per file have changed? (Even with code changes, the impact on the executables will generally be small, unless a major header or library is changed, etc.)

+2  A: 

Why not a traditional version control system?

You start with the first build ever, then commit each subsequent build on top of it. If a file is an exact duplicate, no delta is created; if it has changed, only the difference is stored.

Ain't that what you asked for?
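The delta idea above can be sketched in a few lines. This is a toy illustration only, not how any particular VCS stores deltas internally (real systems use far more sophisticated binary-diff algorithms); the file contents are made up:

```python
# Toy sketch of delta storage: keep only the bytes that differ
# between two nearly identical build artifacts.
from difflib import SequenceMatcher

def make_delta(old: bytes, new: bytes):
    """Return a list of ('copy', start, end) / ('data', bytes) ops."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))      # reuse bytes from the old file
        else:
            ops.append(("data", new[j1:j2]))  # store only the new bytes
    return ops

def apply_delta(old: bytes, ops) -> bytes:
    out = bytearray()
    for op in ops:
        out += old[op[1]:op[2]] if op[0] == "copy" else op[1]
    return bytes(out)

# Two "builds" that differ only in an embedded link timestamp:
build1 = b"\x7fELF...code..." * 100 + b"linked 2009-04-06 02:00"
build2 = b"\x7fELF...code..." * 100 + b"linked 2009-04-07 02:00"
delta = make_delta(build1, build2)
stored = sum(len(op[1]) for op in delta if op[0] == "data")
assert apply_delta(build1, delta) == build2
print(f"full file: {len(build2)} bytes, delta payload: {stored} bytes")
```

Since almost all the bytes match, the delta payload is a tiny fraction of the full file, which is exactly the savings a delta-storing VCS would give you on nightly builds.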

Martinho Fernandes
This is for build results storage, not source code; we do use version control for our source.
Version control can be used for binaries as well, and it provides the feature you asked for (deltas).
Martinho Fernandes
Okay, that makes sense, but -- is there a VCS that acts as a filesystem? That can be accessed from Windows? (Preferably, a Windows filesystem.)
@Dude: Now I understand exactly what you want. I know there are some filesystems out there with similar features, but not with Windows support... sorry.
Martinho Fernandes
A: 

I think cramfs does this, but creating cramfs images (which are read-only) may be a headache.

Hmm.. Why would you need to archive builds? Shouldn't it be trivial to recreate a build from a previous source-controlled version?

TokenMacGuy
Yes, it's trivial to re-create builds, but it is time-consuming and does not produce a byte-for-byte equivalent (due to the link date/timestamps). One of the debugging techniques is a "binary search" to determine the build an issue first showed up in, and having all the old builds makes that easier.
+1  A: 

Generally my opinion on these types of topics is "don't try to code your way out of a poor management decision".

If they want to keep that many old build files, "man up" and buy the space (disk space is cheap).

If they don't want to buy the disks, "man up" and pick a reasonable number of builds to keep that fits within the available space.

It isn't really that hard of a problem.

The farthest I would go down this path would be to enable "disk compression" on the drive if it's available, since that is handled by the OS and doesn't need an additional app and the support that goes with it. Sure, it's a small performance hit - but it's simple.

Save the "development" juice for the important things. :-)

Ron

Ron Savage
I agree it's not that hard of a problem. As I said in another comment, we're trying to "do more with less" so we can all keep our jobs... Disk compression won't have the same impact as an algorithm that says "these two files have the same name but differ by a few bytes" and stores deltas.
A: 

You most likely need a filesystem or volume manager with transparent compression. Performance would naturally be hurt, but if you use light compression it shouldn't be too much of a problem.
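To get a feel for why even light compression pays off here, consider that object files and executables contain long repetitive sections. A rough illustration, with `zlib` standing in for whatever codec the filesystem or volume manager actually uses (real ratios depend entirely on your data):

```python
# Rough illustration of "light" vs. "heavy" transparent compression
# on build-like output. zlib is only a stand-in codec here.
import zlib

build_output = (b".text section: mov eax, ebx; ret\n" * 2000
                + b"link stamp: 2009-04-07 02:00:00\n")
light = zlib.compress(build_output, 1)   # fast, light compression
heavy = zlib.compress(build_output, 9)   # slower, best ratio

print(f"original: {len(build_output)} bytes")
print(f"level 1:  {len(light)} bytes")
print(f"level 9:  {len(heavy)} bytes")
```

On repetitive input like this, even level 1 shrinks the data dramatically, which is why a low-level compression setting keeps the performance hit small.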

Eduard - Gabriel Munteanu
A: 

For a fraction of the effort of installing and maintaining a new file management scheme, I would buy more hard drives. I was at Fry's last night and they were selling a 1 TB external drive for $100; it's a USB drive, so all you need to do is plug it in. If it gets filled up, just buy another one. Speaking as a developer, I'd be surprised if your shop really wants to hold on to the builds indefinitely -- I suspect the value of the nightly builds would be low after some number of months.

This is not an elegant solution but it may be more cost effective.

jdigital
We use server-class hardware and budgets are being affected by the economic downturn. The idea is to be able to "do more with less" so there's more money left for our salaries.
Do you really need "server-class" hardware for this problem? $100 is approximately 2 hours of your gross salary.
George V. Reilly
+1  A: 

This is a poor man's solution, but if build storage is a major financial decision, then I think this answer is warranted. :)

First, it is highly unlikely that you will find a brand new storage/source control system that does this type of complex file referencing for less than you could just pay for new storage.

How about keeping the last 2 months' worth of builds on expensive storage (i.e. RAID), and getting some cheap storage for archiving (e.g. the 1 TB USB drive someone else mentioned)?

Write a simple little console app that runs as a scheduled task every night and moves everything older than 2 months to the archive drive.
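A minimal sketch of such an archiver, assuming Python is acceptable for the console app (the directory paths are hypothetical; on Windows this would run via Task Scheduler, with the archive directory pointing at the cheap USB drive):

```python
# Minimal sketch of the nightly "archiver" console app.
import os
import shutil
import time

def archive_old_builds(build_dir: str, archive_dir: str,
                       max_age_days: int = 60) -> list:
    """Move entries older than max_age_days from build_dir to archive_dir."""
    cutoff = time.time() - max_age_days * 86400
    os.makedirs(archive_dir, exist_ok=True)
    moved = []
    # Materialize the listing first so moves don't disturb iteration.
    for entry in list(os.scandir(build_dir)):
        if entry.stat().st_mtime < cutoff:
            shutil.move(entry.path, os.path.join(archive_dir, entry.name))
            moved.append(entry.name)
    return moved
```

Called as, say, `archive_old_builds(r"D:\builds", r"E:\build-archive")`, it leaves the recent builds on the fast storage and sweeps the rest onto the archive drive.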

routeNpingme
Complex? Why? You simply commit each build on top of the other and the VCS takes care of the delta stuff...
Martinho Fernandes
A: 

I think you should use an application that deletes duplicate files; that would be a better solution for you. I use a program for this called Duplicate Finder 2009.

A: 

OpenSolaris and ZFS.

http://blogs.sun.com/bonwick/entry/zfs%5Fdedup
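For reference, ZFS deduplication is enabled per dataset. A sketch of the setup, with a hypothetical pool name ("tank") and example device names (note that ZFS dedup keeps its dedup table in memory, so it wants plenty of RAM):

```shell
# Create a mirrored pool and a dataset for the nightly builds.
zpool create tank mirror c1t0d0 c1t1d0
zfs create tank/builds

zfs set dedup=on tank/builds        # block-level deduplication
zfs set compression=on tank/builds  # optional; combines well with dedup

zpool list tank                     # the DEDUP column shows the ratio achieved
```

Since nearly identical executables share most of their blocks, the dedup ratio on a build archive like yours could be substantial.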

NeoMinder