I come from a CS background but am now doing genomics.
My projects involve a lot of bioinformatics, typically: aligning sequences, computing the overlap between sequences and various genome annotation features, comparing different classes of biological samples, time-course data, microarray and high-throughput sequencing data ("next-gen" sequencing, though it's actually the current generation), that kind of thing.
The workflow for these kinds of analyses is quite different from what I experienced during my CS studies: no UML and no thoughtfully designed objects shining with sublime elegance, no version control, no proper documentation (often no documentation at all), no software engineering at all.
Instead, what everyone does in this field is hack out one Perl script or AWK one-liner after another, usually for one-time use.
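To give a flavor, a typical throwaway step might look like this (the file names and columns are invented for illustration, but the pattern is real):

    # pull the position of every SNP feature on chr1 out of a GFF file
    # (column 1 = chromosome, 3 = feature type, 4/5 = start/end),
    # dumping it into yet another temporary file
    awk -F'\t' '$1 == "chr1" && $3 == "SNP" {print $1, $4, $5}' \
        annotation.gff > tmp_chr1_SNPs.txt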
I think the reason is that input data and formats change so fast, and the questions need answering so quickly (deadlines!), that there seems to be no time for project organization.
One example to illustrate this: say you want to write a raytracer. You would probably put a lot of effort into the software engineering first, then implement it, finally in some highly optimized form, because you would use the raytracer countless times with different input data and would keep changing the source code over the years to come. So good software engineering is paramount when coding a serious raytracer from scratch. But now imagine you want to write a raytracer knowing in advance that you will ever use it to trace one single picture, and that picture is of a reflecting sphere over a checkered floor. In that case you would just hack it together somehow. Bioinformatics consists of the latter case, every single time.
What you end up with is whole directory trees holding the same information in different formats, until you have reached the one particular format needed for the next step, and dozens of files with names like "tmp_SNP_cancer_34521_unique_IDs_not_Chimp.csv", where one day later you don't have the slightest idea why you created the file or what exactly it contains.
For a while I used MySQL, which helped, but now new data is generated, and formats change, so quickly that proper database design is no longer possible.
I am aware of a single publication that deals with these issues (Noble, W. S. (2009). A quick guide to organizing computational biology projects. PLoS Comput Biol 5(7): e1000424). The author sums up the goal quite nicely:
"The core guiding principle is simple: Someone unfamiliar with your project should be able to look at your computer files and understand in detail what you did and why."
Well, that's what I want, too! But I already follow the same practices as that author, and I find them absolutely insufficient.
Documenting each and every command you issue in bash, and commenting on why exactly you issued it, is tedious and error-prone: the steps in the workflow are just too fine-grained. And even if you do it, it can still be an extremely tedious task to figure out later what each file was for, at which point a particular workflow was interrupted, for what reason, and where you continued.
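To make the tedium concrete, here is a sketch of what such a command log ends up looking like (the commands and file names are hypothetical, reusing the example file name from above); now multiply this by a few hundred steps per project:

    # 2010-03-01: extract the SNP IDs (first CSV column) and keep only
    # those absent from the chimp ID list, because the cancer
    # comparison must exclude chimp-shared SNPs
    cut -d, -f1 tmp_SNP_cancer_34521.csv | sort > ids.sorted.txt
    sort chimp_ids.txt > chimp_ids.sorted.txt
    comm -23 ids.sorted.txt chimp_ids.sorted.txt \
        > tmp_SNP_cancer_34521_unique_IDs_not_Chimp.csv
    # 2010-03-02: same again for the mouse list, except the input
    # format changed in the meantime...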
(I am not using the word "workflow" in the Taverna sense; by workflow I just mean the steps, commands and programs you choose to execute to reach a particular goal.)
My question is: how do you organize your bioinformatics projects?