I'm doing some preliminary work investigating how DVCS (distributed version control systems such as Git, Hg, and Bazaar) can help with scientific programming, especially for graduate students. I think I'm in a fairly good position for this, since I've been programming for quite a few years and am currently starting a Master's program in a natural science. The goal is to give a short presentation on this in a month or two.
As I see it, aside from the obvious advantages of source control itself, DVCS currently offers the following improvements to a grad student's daily life:
Branching:
This is the big one. From observing DVCS practices, it is clear that cheap branching mainly encourages experimentation with new features. Scientific programming is ALL about experimentation. Different branches can be created to tweak parameters or try alternative algorithms. This is especially important because most scientific code has never seen a single iota of refactoring in its lifetime (most grad students won't even know what refactoring is), so the ability to switch between branches will bring some method to the typical madness. Fast, cheap commits could also mean using commit messages as a surrogate for a lab notebook. Computational results could be tagged with specific commit hashes for reproducible research.
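A minimal sketch of tagging results with a commit hash, assuming Git, an analysis that lives in a Git working tree, and results small enough to store as JSON. The helper names `git_commit_info` and `save_results` and the JSON layout are hypothetical; the only Git commands used are `git rev-parse HEAD` and `git status --porcelain`. The "dirty" flag matters: a hash alone cannot reproduce a run that was made with uncommitted edits.

```python
import json
import subprocess
from pathlib import Path


def git_commit_info(repo_dir="."):
    """Hypothetical helper: return (commit_hash, is_dirty) for the working tree
    at repo_dir, or (None, None) if git is missing or this is not a repo."""
    try:
        commit = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        ).stdout.strip()
        # Any --porcelain output means uncommitted changes or untracked files,
        # so the hash alone would not reproduce this run.
        status = subprocess.run(
            ["git", "status", "--porcelain"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        ).stdout
        return commit, bool(status.strip())
    except (OSError, subprocess.CalledProcessError):
        return None, None


def save_results(results, params, out_path, repo_dir="."):
    """Hypothetical helper: write results, parameters and provenance to JSON."""
    commit, dirty = git_commit_info(repo_dir)
    record = {
        "commit": commit,   # which version of the code produced these numbers
        "dirty": dirty,     # True if the working tree had uncommitted edits
        "params": params,
        "results": results,
    }
    Path(out_path).write_text(json.dumps(record, indent=2))
    return record
```

Usage might look like `save_results({"rmse": 0.12}, {"alpha": 0.5}, "run_042.json")`. Opening that file months later tells you which commit to check out to regenerate the number.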
Pushing to servers:
Since most scientific code nowadays is run on some sort of cluster, DVCS can serve as a more advanced rsync, which many people already use to push "production" code to HPC clusters. This combines with branching to make it easy to run multiple versions of the code without leaving
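A rough sketch of the push-then-checkout workflow, assuming Git, a shared repository URL reachable from both your machine and the cluster, and SSH access to the cluster's login node. The helpers `deploy_commands` and `deploy`, the `runs` directory layout, and the example host names are hypothetical; the Git and SSH invocations themselves (`git push <url> <branch>`, `git clone --branch <branch> --single-branch <url> <dir>`, `ssh <host> <command>`) are standard.

```python
import shlex
import subprocess


def deploy_commands(branch, repo_url, host, runs_dir="runs"):
    """Hypothetical helper: build (but do not run) the commands that publish
    `branch` and give it its own fresh checkout on the cluster."""
    # One directory per branch, so several versions of the code can run side
    # by side; "/" in branch names like "exp/alpha" is flattened to "-".
    run_dir = f"{runs_dir}/{branch.replace('/', '-')}"
    # Step 1: publish the branch to the shared repository.
    push = ["git", "push", repo_url, branch]
    # Step 2: on the cluster, clone only that branch into its run directory.
    # The path is relative to the remote home directory (where ssh commands
    # start) rather than "~/...", since shlex.quote would block "~" expansion.
    remote_cmd = " ".join(shlex.quote(arg) for arg in [
        "git", "clone", "--branch", branch, "--single-branch", repo_url, run_dir,
    ])
    clone = ["ssh", host, remote_cmd]
    return [push, clone]


def deploy(branch, repo_url, host, runs_dir="runs", dry_run=True):
    """Print each command; execute them only when dry_run is False."""
    commands = deploy_commands(branch, repo_url, host, runs_dir)
    for cmd in commands:
        print(" ".join(shlex.quote(arg) for arg in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure
    return commands
```

With the default `dry_run=True` nothing is executed; the two commands are only printed so you can check them. The clone step fails if the run directory already exists, so redeploying the same branch would need a `git pull` inside that directory instead; that case is left out to keep the sketch short.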
Collaboration on papers:
Need I say more? Papers with multiple authors are run exactly like small open-source projects. Collaborating on papers should be a natural fit when all the authors write in LaTeX, with additional complications if the writing is done in something like Word. This is where commit messages could potentially play a bigger role.
My question is: what do you think DVCS can contribute for scientific programmers? I see a lot of talk in the community about moving to source control, but most people are still looking into Subversion. From my cursory notes, DVCS sounds like the perfect workflow paradigm for new grad students. Is my thinking flawed? Or is scientific coding simply lagging too far behind to have even heard of DVCS tools?