Version control for reports (git)

+1 Q:

I have a particular report that I am asked to run from time to time. The details are slightly different each time - different date ranges, different selection criteria - but structurally, the report is fairly stable. I do make some structural changes from time to time, however.

I have two hopes for these reports:

1) to be able to reproduce any report at a later date.
2) to be able to review the structural changes made to the report over time.

Right now, I just have a folder with a master script, which I modify for every iteration of the report, and subfolders where I save a snapshot of the master script and the data for each run.

Maybe that's good enough. But I've started using git to manage my (much more complex) data analysis scripts, and I was wondering if there was a way to use it here (and for myriad similar reports) that would allow for more robust version control.

I can think of a few different ways to do so: make a branch for each report, but only merge structural changes back onto the master; clone the master into the subfolder for a new report, make changes there, push back structural changes; etc. But I really don't even know enough to be able to separate insane ideas from plausible ones, much less good ones. Let me know what you think. Thanks.

+1 A:

I'd personally go for your first suggestion:

make a branch for each report, but only merge structural changes back onto the master

This is by far the easiest conceptually, and by merging the structural changes into the head revision, you can apply them to the other branches as and when requested. The only downside is the number of branches you'll leave lying around, but it sounds like an infrequent request, and a good naming scheme should sort that out.
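A minimal sketch of that workflow, under some assumptions: a throwaway repository, a hypothetical script named report.sas with placeholder contents, and an illustrative branch name report-2010-07. It uses git cherry-pick rather than a full merge, since a full merge would also carry the run-specific edits onto master. That only works if run-specific and structural edits go into separate commits.

```shell
#!/bin/sh
set -e

# Throwaway repository for the demonstration
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch "master" on any git version
git config user.name "Report Author"       # placeholder identity
git config user.email "author@example.com"

# Hypothetical master script with default parameters
cat > report.sas <<'EOF'
* parameters;
%let start = DEFAULT;
%let end = DEFAULT;
* load;
* clean;
* summarise;
EOF
git add report.sas
git commit -q -m "Master report script"

# One branch per report run
git checkout -q -b report-2010-07

# Commit 1: run-specific details (these stay on the branch)
sed -e 's/start = DEFAULT/start = 2010-07-01/' \
    -e 's/end = DEFAULT/end = 2010-07-31/' report.sas > report.tmp
mv report.tmp report.sas
git commit -q -am "July 2010 run: date range"

# Commit 2: a structural change worth keeping for future reports
echo '* plot;' >> report.sas
git commit -q -am "Add plotting step"
structural=$(git rev-parse HEAD)

# Bring only the structural commit back onto master
git checkout -q master
git cherry-pick "$structural" > /dev/null

git log --oneline master
```

After this, master has the new plotting step but still carries the default parameters, while report-2010-07 keeps the full record of that run.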

cristobalito 2010-08-06 23:01:54

+1 A:

I have a particular report that I am asked to run from time to time. The details are slightly different each time - different date ranges, different selection criteria - but structurally, the report is fairly stable.

If you can anticipate which fields change each time, I would say make a generic report that prompts you for this data each time the report is run. You should be able to do this in just about any reporting software. The report itself can be tracked in git, and you won't have to worry about having 50,000 branches in your repository.

If it's unpredictable what fields need to be custom each time, give most of the fields useful default values.

If you run this report a lot, and are specifically interested in keeping track of the various result sets, I'd suggest a different approach. I don't know what your report generates, but let's say it's a PDF. I would make a directory structure somewhere, and you could store each run in results/year/month/date.pdf. This way you will have a record of the data pulled on May 5, 2010 (or with May 5, 2010 as a parameter).
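A rough sketch of that layout, assuming a date in YYYY-MM-DD form; the results root is a temporary directory here and the PDF is a placeholder file, both stand-ins for whatever the real report uses.

```shell
#!/bin/sh
set -e

results_root="$(mktemp -d)/results"   # hypothetical location for stored runs
run_date=2010-05-05                    # could instead be $(date +%Y-%m-%d)

# Split YYYY-MM-DD into its parts with POSIX parameter expansion
year=${run_date%%-*}
rest=${run_date#*-}
month=${rest%%-*}
day=${rest#*-}

out_dir="$results_root/$year/$month"
mkdir -p "$out_dir"
out_file="$out_dir/$day.pdf"

# Stand-in for the real report output
printf 'placeholder report output\n' > "$out_file"
echo "$out_file"
```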

Edit: You might consider tags instead of branches for those things you can't combine into a single report. If you have a version you think you're going to need quick access to, tag it. Any time you need to get back to it, just check out the tag and run the report.
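As an illustration of the tag approach, here is a small sketch in a throwaway repository. The file name report.sas, its contents, and the tag names report-2010-05 and report-2010-06 are all hypothetical.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Report Author"       # placeholder identity
git config user.email "author@example.com"

# First run of the report: commit the script as it was run, then tag it
printf '%s\n' '%let start = 2010-05-01;' > report.sas
git add report.sas
git commit -q -m "May 2010 run"
git tag -a report-2010-05 -m "Script as run for the May 2010 report"

# Second run: new dates plus a run-specific edit, committed and tagged
printf '%s\n' '%let start = 2010-06-01;' '* manual correction for this run;' > report.sas
git commit -q -am "June 2010 run"
git tag -a report-2010-06 -m "Script as run for the June 2010 report"

# Review what changed between the two runs
git diff report-2010-05 report-2010-06 -- report.sas

# Get back to the exact script used for the May report
git checkout -q report-2010-05
cat report.sas
```

Checking out a tag leaves the repository in a detached HEAD state, which is fine for re-running an old report; `git checkout -` returns to the branch you were on.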

haydenmuhl 2010-08-09 04:20:53

As I mention in my comments to Fabio, part of the problem is that I'm not *just* changing dates - I'm also adding lines of manual data correction (corrections that will not and should not be fed back to the actual database), and I'd like for these to be easy to review in the future. It's those judgment-based changes that I'm interested in preserving, rather than the ones that change systematically.

Matt Parker 2010-08-12 18:10:31

Also, I already do save copies of each report's dataset and output; I'm curious about ways to do the same for the script itself without ending up with a version for each run of the report. I'll inevitably end up with N branches or files or commits or whatever; it's just a question of what will be most manageable.

Matt Parker 2010-08-12 18:12:02

Thanks for your answer, though - it definitely helped me think more clearly about what I'm asking, and reminded me that at least some of the problem could be handled by using SAS's macro variables, which was long overdue.

Matt Parker 2010-08-12 18:13:36

+4 A:

It obviously depends on the report and how it changes, but from what you describe, it seems to me you could write a meaningful SAS macro program that takes all your selection criteria as parameters. In the macro code you can then evaluate the parameters and make the structural changes, if necessary.

So you'd have one .sas file with just one big macro in it; depending on the parameters you use to call the macro, it can reproduce all the reports you want.

Does this make sense to you? If it doesn't, let me know and I can provide some SAS macro examples to get you started if you are not familiar with them.

Fabio Prevedelli

Fabio Prevedelli 2010-08-09 06:07:41

... you know, I really have no idea why this didn't occur to me. That's the danger of inheriting code and not really looking at it critically, I suppose.

Matt Parker 2010-08-12 17:12:57

However, after revisiting this (which I hadn't done in a while), I remembered part of my original motivation for asking this question: there are data-cleaning processes that can't be automated and that require manual changes to be made to the data. I'd like to be sure that each of these changes is documented somehow - e.g., by being preserved in a branch or in a tagged commit.

Matt Parker 2010-08-12 18:02:44

Nevertheless, thanks for your good answer - took only a minute to convert to macro variables and it's much cleaner.

Matt Parker 2010-08-12 18:03:11
