Assume that I set up an automatic nightly build. What artifacts of the build should I save?

For example:

  • Input source code
  • output binaries

Also, how long should I save them, and where?

Do your answers change if I do Continuous Integration?

+3  A: 

We save the binaries, stripped and unstripped (so we have exactly the same binary, once with and once without debug symbols). Further, we build everything twice, once with debug output enabled and once without (again, stripped and unstripped, so every build results in 4 binaries). The build is stored in a directory named after the SVN revision number. That way we can always retrieve the source from the SVN repository by simply checking out that exact revision (so the source is archived as well).
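A minimal sketch of such a layout, assuming a hypothetical root directory and made-up folder names ("debug"/"release", "stripped"/"unstripped"); the answer above does not say how the directories are actually named:

```python
from pathlib import Path
from typing import List


def build_output_dirs(root: str, revision: int) -> List[Path]:
    """Return the four output directories for one SVN revision.

    The layout <root>/r<revision>/<output>/<symbols> is an assumption
    made for illustration, not the layout used in the answer.
    """
    base = Path(root) / f"r{revision}"
    return [
        base / output / symbols
        for output in ("debug", "release")          # with / without debug output
        for symbols in ("unstripped", "stripped")   # with / without debug symbols
    ]


if __name__ == "__main__":
    for directory in build_output_dirs("/builds", 1234):
        print(directory)
```

Keying the directory on the revision number means the path alone tells you which revision to check out to get the matching source.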

Mecki
+17  A: 

You shouldn't save anything for the sake of saving it. You should save it because you need it (e.g., QA uses nightly builds to test). At which point, "how long to save it" becomes however long QA wants them.

I wouldn't "save" source code so much as tag/label it. I don't know what source control you're using, but tagging is trivial (in performance and disk space) for any quality source control system. Once your build is tagged, unless you need the binaries, there really isn't any benefit to keeping them around, because you can simply recompile from source when necessary.

Most CI tools let you tag on each successful build. This can become problematic for some systems as you can easily have 100+ tags a day. For such cases I recommend still running a nightly build and only tagging that.

Karl Seguin
A: 

Save as in check in to source code control, or just on disk? Save nothing to source code control. All derived files should be visible in the file system and available to developers. Don't check in binaries, code generated from XML files, message digests, etc. A separate packaging step will make these end products available. Since you have the change number, you can always reproduce the build if necessary, assuming of course that everything you need to do a build is completely in the tree and is available to all builds by syncing.

Todd Hoff
I used to think this too, but after working for a client that did keep all binaries and other generated artifacts in the version control system, I recommend putting it all in the version control system. It's really easy to grab *everything* for a particular release and debug it with minimum fuss.
Kristopher Johnson
+4  A: 

In addition to the binaries, as everyone else has mentioned, I would recommend setting up a symbol server and a source server and making sure the correct information from each build gets into them. It will aid in debugging tremendously.

Alex
+5  A: 

This isn't a direct answer to your question, but don't forget to version control the nightly build setup itself. When the project structure changes, you may have to change the build process, which will break older builds from that point on.

A: 

I would save your built binaries for exactly as long as they have a chance to go into production or be used by some other team (like a QA group). Once something has left production, what you do with it can vary a lot. For a lot of teams, they'll keep just their most recent prior build around (for rollback) and otherwise discard their builds.

Others have regulatory requirements to keep anything that went into production around for as long as seven years (banks). If you are a product company, I'd keep around any binary a customer might have installed in case a tech support guy wants to install the same version.

EricMinick
+3  A: 

A surprising one I learned about recently: If you're in an environment that might be audited you'll want to save all the output of your build, the script output, the compiler output, etc.

That's the only way you can verify your compiler settings, build steps, etc.

Also, how long to save them for, and where to save them?

Save them until you know that build won't be going to production; in other words, for as long as you have the compiled bits around.

One logical place to save them is your SCM system. Another option is to use a tool that will automatically save them for you, like AnthillPro and its ilk.

Jeffrey Fredrick
+5  A: 

Here are some artifacts and pieces of information that I usually keep for each build:

  • The tag name of the snapshot you are building (tag and do a clean checkout before you build)
  • The build scripts themselves or their version number (if you treat them as a separate project with its own version control)
  • The output of the build script: logs and final product
  • A snapshot of your environment:
    • compiler version
    • build tool version
    • libraries and dll/libs versions
    • database version (client & server)
    • ide version
    • script interpreter version
    • OS version
    • source control version (client and server)
    • versions of other tools used in the process and everything else that might influence the content of your build products. I usually do this with a script that queries all this information and logs it to a text file that should be stored with the other build artifacts.
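A rough sketch of the kind of script mentioned in the last item, assuming each tool prints its version when run with some flag. The tool list here is hypothetical (only the Python interpreter is queried); you would fill in your own compiler, build tool, and source control commands:

```python
import platform
import subprocess
import sys

# Hypothetical tool list: replace with the tools your build actually uses.
TOOL_COMMANDS = {
    "python": [sys.executable, "--version"],
}


def tool_version(command):
    """Return the first line of a tool's version output, or a note if it fails."""
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"unavailable ({exc.__class__.__name__})"
    # Some tools print their version on stderr rather than stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else "no output"


def environment_snapshot(tool_commands):
    """Build the snapshot text: OS version first, then one line per tool."""
    lines = [f"os: {platform.platform()}"]
    for name, command in sorted(tool_commands.items()):
        lines.append(f"{name}: {tool_version(command)}")
    return "\n".join(lines) + "\n"


def write_snapshot(path, tool_commands=TOOL_COMMANDS):
    """Write the snapshot next to the other build artifacts."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(environment_snapshot(tool_commands))


if __name__ == "__main__":
    print(environment_snapshot(TOOL_COMMANDS), end="")
```

A missing tool is recorded as "unavailable" instead of aborting, so the snapshot still gets written even when the environment is partly broken.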

Ask yourself this question: "If something entirely destroys my build/development environment, what information would I need to create a new one so I can redo my build #6547 and end up with the exact same result I got the first time?"

Your answer is what you should keep at each build and it will be a subset or superset of the things I already mentioned.

You can store everything in your SCM (I'd recommend a separate repository), but in that case your question about how long to keep the items loses its meaning. Alternatively, you can store everything in zipped folders or burn a CD/DVD with the build result and artifacts. Whatever you choose, have a backup copy.

You should store them as long as you might need them. How long that is will depend on your development team's pace and your release cycle.

And no, I don't think it changes if you do continuous integration.

+1  A: 

We're doing something close to "embedded" development here, and I can tell you what we save:

  • the SVN revision number and timestamp, as well as the machine it was built on and by whom (also burned into the build binaries)
  • a full build log, showing whether it was a full/incremental build, any interesting (STDERR) output the data baking tools produced, a list of files compiled and any compiler warnings (this compresses very well, being text)
  • the actual binaries (for anywhere from 1-8 build configurations)
  • files produced as a side effect of linking: a linker command file, address map, and a sort of "manifest" file indicating what was burned into the final binaries (CRC and size for each), as well as the debugging database (.pdb equivalent)
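To illustrate the "manifest" idea in the last item, here is a minimal sketch that records a CRC and size for every file under a directory. The output format and the function names are made up for the example; the answer does not describe the real manifest format:

```python
import zlib
from pathlib import Path


def file_crc32(path, chunk_size=65536):
    """Compute the CRC-32 of a file without loading it all into memory."""
    crc = 0
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def build_manifest(directory):
    """Return (relative_name, size_in_bytes, crc32) for each file, sorted by path."""
    root = Path(directory)
    entries = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        name = path.relative_to(root).as_posix()
        entries.append((name, path.stat().st_size, file_crc32(path)))
    return entries


def format_manifest(entries):
    """Render one line per file: CRC in hex, size, then name (hypothetical format)."""
    return "".join(f"{crc:08x} {size:>10} {name}\n" for name, size, crc in entries)
```

Comparing two such manifests shows quickly which files differ between builds without unpacking the binaries themselves.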

We also mail out the result of running some tools over the "side-effect" files to interested users. We don't actually archive these since we can reproduce them later, but these reports include:

  • total and delta of filesystem size, broken down by file type and/or directory
  • total and delta of code section sizes (.text, .data, .rodata, .bss, .sinit, etc)
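The "total and delta" reports above boil down to comparing two size tables. A minimal sketch, assuming the sizes have already been extracted into dictionaries keyed by section name or file type (how they get extracted depends on your toolchain and is not shown):

```python
def size_deltas(previous, current):
    """Map each key to (current_total, delta) across both builds.

    Keys missing from one build count as size 0, so added and removed
    sections show up with a positive or negative delta.
    """
    report = {}
    for key in sorted(set(previous) | set(current)):
        now = current.get(key, 0)
        report[key] = (now, now - previous.get(key, 0))
    return report


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    old = {".text": 100, ".bss": 10}
    new = {".text": 120, ".data": 5}
    for section, (total, delta) in size_deltas(old, new).items():
        print(f"{section:<8} {total:>8} {delta:+d}")
```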

When we have unit tests or functional tests (e.g. smoke tests) running, those results show up in the build log.

We've not thrown out anything yet -- granted, our target builds usually end up at ~16 or 32 MiB per configuration, and they're fairly compressible.

We do keep uncompressed copies of the binaries around for 1 week for ease of access; after that we keep only the lightly compressed version. About once a month we have a script that extracts each .zip that the build process produces and 7-zips a whole month of build outputs together (which takes advantage of only having small differences per build).
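The retention rule above could be expressed roughly like this. The one-week threshold comes from the description; the action names, and the choice to fold a build into the monthly archive once its calendar month is over, are assumptions made for the sketch:

```python
from datetime import datetime, timedelta

UNCOMPRESSED_DAYS = 7  # "1 week" of uncompressed copies


def retention_action(build_time, now):
    """Decide what to do with one build's outputs (action names are hypothetical)."""
    if now - build_time <= timedelta(days=UNCOMPRESSED_DAYS):
        return "keep-uncompressed"
    if (build_time.year, build_time.month) == (now.year, now.month):
        # Older than a week but still in the current month: per-build .zip only.
        return "keep-zip-only"
    # Assumption: builds from earlier months get re-packed into one archive.
    return "fold-into-monthly-archive"


if __name__ == "__main__":
    now = datetime(2024, 3, 20)
    for built in (datetime(2024, 3, 18), datetime(2024, 3, 5), datetime(2024, 2, 10)):
        print(built.date(), retention_action(built, now))
```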

An average day might have a dozen or two builds per project... The buildserver wakes up about every 5 minutes to check for relevant differences and builds. A full .7z on a large very active project for one month might be 7-10GiB, but it's certainly affordable.

For the most part, we've been able to diagnose everything this way. Occasionally there's a hiccup on the build system and a file isn't actually at the revision it's supposed to be when a build happens, but there's usually enough evidence of this in the logs. Sometimes we have to dig out a tool that understands the debugging database format and feed it a few addresses to diagnose a crash (we have automatic stack dumps built into the product). But usually all the information needed is there.

It's worth mentioning that we haven't had to crack the .7z archives yet. But we have the info there, and I have some interesting ideas on how to mine bits of useful data from it.

leander
I liked your answer; thanks for writing it.
Jay Bazuzi
+1  A: 

Save what can't be reproduced easily. I work on FPGAs, where only the FPGA team has the tools, and some cores (libraries) of the design are licensed to compile on only one machine. So we save the output bitstreams. But try to check them in over one another rather than giving each one a date/time/version stamp.

Brian Carlton
I had to look it up. FPGA = Field Programmable Gate Array. http://en.wikipedia.org/wiki/FPGA
Jay Bazuzi