This is a problem I spend a fair amount of time working on. To what @VonC has already written let me add a few thoughts.
I think that software configuration management (SCM) is well understood and often carefully practiced in commercial environments. The same discipline is often lacking in scientific data processing environments, many of which either remain in, or have grown out of, academia. If you work in such an environment, though, there are readily available sources of information and advice and lots of tools to help. I won't expand on this further.
I don't think that your suggestion of including the whole source code in an executable is necessary, even if it is feasible. Indeed, if you get SCM right, then one of the essential tests that you have done so, and continue to do so, is your ability to rebuild 'old' executables on demand. You should also be able to determine which revision of the sources was used to build each executable and version. Together these ought to make including the source code in an executable unnecessary.
Tying result sets to the computations that produced them is also, as you say, essential. Here are some of the components of the solution that we are building:
We are moving away from the traditional unstructured text file that is characteristic of the output of a lot of scientific programs and towards structured files (in our case we're looking at HDF5 and XML) in which both the data of interest and the meta-data are stored. The meta-data includes the identification of the program (and version) which was used to produce the results, the identification of the input data sets, job parameters and a bunch of other stuff.
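To make the idea of a self-describing result file concrete, here is a minimal sketch in Python that puts the meta-data and the data into one XML document. It is not our actual format: the element names, the function names and the choice of a SHA-256 digest to identify input data sets are all assumptions made for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET


def build_result_document(program, version, input_files, parameters, values):
    """Return an XML element holding both the results and their meta-data.

    The layout (result_set / metadata / data) is hypothetical.
    input_files maps a data set name to its raw bytes.
    """
    root = ET.Element("result_set")
    meta = ET.SubElement(root, "metadata")

    # Which program, at which version, produced these results.
    ET.SubElement(meta, "program", name=program, version=version)

    # Identify each input data set by name and content digest (an assumption).
    inputs = ET.SubElement(meta, "inputs")
    for name, content in input_files.items():
        digest = hashlib.sha256(content).hexdigest()
        ET.SubElement(inputs, "input", name=name, sha256=digest)

    # Job parameters, sorted so that the output is deterministic.
    params = ET.SubElement(meta, "parameters")
    for key, value in sorted(parameters.items()):
        ET.SubElement(params, "parameter", name=key, value=str(value))

    # The data of interest lives in the same file as its meta-data.
    data = ET.SubElement(root, "data")
    for v in values:
        ET.SubElement(data, "value").text = repr(v)
    return root


def to_xml_string(root):
    """Serialise the document to a text string."""
    return ET.tostring(root, encoding="unicode")
```

Anyone who later picks up the file can read the `metadata` element to see what produced the numbers, without relying on file names or lab notebooks.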
We looked at using a DBMS to store our results; we'd like to go this way but we don't have the resources to do it this year, and probably not next year either. Businesses use DBMSs for a variety of reasons, and among them are the ability to roll back changes and to provide an audit trail.
We're also looking closely at which result sets need to be stored. A nice approach would be to store only the original data sets captured from our field sensors. Unfortunately some of our computations take thousands of CPU-hours, so it is infeasible to reproduce their results ab initio on demand. However, we will be storing far fewer intermediate data sets in future than we have in the past.
We are also making it much harder (I'd like to think impossible, but I am not sure we are there yet) for users to edit result sets directly. Once someone does that, all the provenance information in the world is wrong and useless.
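One simple way to at least detect a direct edit, sketched below in Python, is to record a digest of each result file when it is written and to mark the file read-only. This is an illustration rather than a description of our mechanism; the function names `seal` and `is_unmodified` are hypothetical, and a determined user can still change the permissions back.

```python
import hashlib
import os
import stat


def file_digest(path):
    """Return the SHA-256 hex digest of the file's contents."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def seal(path):
    """Mark a result file read-only and return the digest to record elsewhere."""
    digest = file_digest(path)
    # Read-only for everyone; this discourages casual edits, nothing more.
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return digest


def is_unmodified(path, recorded_digest):
    """True if the file still matches the digest recorded when it was sealed."""
    return file_digest(path) == recorded_digest
```

The recorded digest has to live somewhere the user cannot easily change, for example alongside the other provenance records, otherwise the check proves nothing.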
Finally, if you want to read more about the topic, try Googling for 'scientific workflow', 'data provenance' and similar topics.
EDIT: It's not clear from what I wrote above, but we have modified our programs so that they contain their own identification (we use Subversion's keyword capabilities for this with an extension or two of our own) and write this into any output that they produce.
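As a rough illustration of the keyword approach (not our actual code, which also has extensions of our own), a Python program might embed a Subversion keyword string and stamp it into everything it writes. Subversion only expands `$Revision$` when the `svn:keywords` property is set on the file; the revision number 4711, the program name and the comment-line format below are made up for the example.

```python
import io
import re

# After checkout, Subversion rewrites "$Revision$" in this form when
# svn:keywords is set on the file. The value here is a made-up example.
SVN_REVISION = "$Revision: 4711 $"
PROGRAM_NAME = "flowsim"  # hypothetical program name

_KEYWORD = re.compile(r"^\$(\w+)(?::\s*(.*?))?\s*\$$")


def keyword_value(raw):
    """Extract the value from an expanded keyword such as '$Revision: 4711 $'.

    Returns 'unknown' for an unexpanded keyword such as '$Revision$'.
    """
    match = _KEYWORD.match(raw)
    if match is None:
        raise ValueError(f"not a Subversion keyword string: {raw!r}")
    return match.group(2) or "unknown"


def program_identification():
    """Identification string that the program writes into its outputs."""
    return f"{PROGRAM_NAME} r{keyword_value(SVN_REVISION)}"


def write_results(stream, values):
    """Write the identification as a header line, followed by the results."""
    stream.write(f"# produced by {program_identification()}\n")
    for v in values:
        stream.write(f"{v}\n")


def render_results(values):
    """Convenience wrapper returning the stamped output as a string."""
    buffer = io.StringIO()
    write_results(buffer, values)
    return buffer.getvalue()
```

Because the identification is compiled or packaged into the program itself, a result file can be traced back to a revision even when the executable has been copied away from its working copy.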