
I'd like to write a series of programming lessons that guide programmers to build a certain kind of program. After each lesson, I'd like to provide sample code that implements what that lesson covered, and the next lesson would use that code as a starting point. (Edit: The repository is for my use only. Anyone working through the lessons will just download tarballs.)

Right now I'm using Git to keep track of the code from lesson to lesson. Each lesson has its own branch.

lesson1: A--B--C
                \
lesson2:         D--E--F
                        \
lesson3:                 G--H--I

However, suppose that now I want to make it easier on the Windows programmers using my lessons, so I add a Visual Studio project to lesson 1 and then merge it into lessons 2 and 3.

lesson1: A--B--C--------------J
                \              \
lesson2:         D--E--F--------K
                        \        \
lesson3:                 G--H--I--L

And then someone points out a bug in lesson 2 that causes crashes on certain systems. (This diagram is where I am right now, and I'm having doubts about continuing along this path.)

lesson1: A--B--C--------------J
                \              \
lesson2:         D--E--F--------K--M
                        \        \  \
lesson3:                 G--H--I--L--N

Here are the problems I imagine having: (Edited)

  • When I make a bunch of changes to various lessons on one computer, how do I pull all of the branches at the same time?

  • If I decide to publish these lessons, I'd like a way to tag all of the branches to correspond with what I publish. I figure I'll just need to tag each branch separately, but it would be nice if there were a better way.

  • When I look at the history, I imagine becoming terribly confused about what I've done. Compare the above diagram to a hypothetical diagram below, where I use rebase instead of merge. Please note: I would never use rebase as part of a workflow; my concern is that I'd like to be able to read the history without seeing a tangled mess.

lesson1: A--B--C--J
                   \
lesson2:            D2--E2--F2--M
                                 \
lesson3:                          G2--H2--I2

Do any of you have experience working with a project like this? Should I consider using a different VCS, such as Darcs? (Note: it would be a real pain to use a centralized VCS, so don't suggest one of those unless the benefits are clear. Edit: And if it's proprietary, then it's not viable.) Should I consider writing plugins or extra tools for a VCS (such as a "meta tag" which tags several branches)?

Addendum: Right now I am thinking of two alternatives:

  • I can manage lessons the "primitive" way. Each lesson goes in its own folder, and any change to lesson 1 has to be manually (or by script) applied to later lessons. In exchange for this penalty, I get the benefit of pushing, pulling, and tagging all lessons easily.

  • I can write some tools that wrap a patch-oriented VCS inside another VCS. This could be fun.
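For the "primitive" layout in the first alternative, the "by script" step can lean on Git itself: take the diff of the last commit under lesson1/ and re-apply it under each later lesson's folder with git apply. This is a minimal sketch, not a tested tool: it assumes folders named lesson1/, lesson2/ and lesson3/ at the repository root, propagate_last_commit is a made-up helper name, and the demo part builds a throwaway repository so the script runs as-is.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: re-apply the last commit's changes under $1/ to each
# later lesson folder. -p2 strips "a/lesson1/" from the patch paths and
# --directory re-roots them under the target folder. Run from the repo root.
propagate_last_commit() {
  local src=$1 dst
  shift
  for dst in "$@"; do
    git diff HEAD~1 HEAD -- "$src/" | git apply -p2 --directory="$dst"
  done
}

# Demo in a throwaway repository (assumed layout: one folder per lesson).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
cd "$demo"
git init -q
for d in lesson1 lesson2 lesson3; do
  mkdir "$d"
  printf 'int main(void) {\n    return 0;\n}\n' > "$d/main.c"
done
echo '/* added in lesson 2 */' > lesson2/extra.c
git add .
git commit -qm "initial lessons"

# Fix a "bug" in lesson 1 only, then carry the same patch forward.
sed -i.bak 's/return 0;/return 1;/' lesson1/main.c
rm lesson1/main.c.bak
git commit -qam "lesson1: fix bug"
propagate_last_commit lesson1 lesson2 lesson3
git commit -qam "carry lesson1 fix into lessons 2 and 3"
git log --oneline
```

If a later lesson has diverged around the changed lines, git apply refuses the patch; git apply --3way, or fixing that lesson by hand, is the fallback. That is the penalty mentioned above, traded for a single branch that can be pushed, pulled and tagged in one go.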

A:

I haven't used Git. I know a little about it, but not enough to compare. I've used CVS and SVN and am currently working with AccuRev. Yes, these aren't distributed.

AccuRev uses streams, which means that you could have a stream at the top for lesson1, then lesson2 underneath it, and lesson3 underneath lesson2.

lesson1: A
          \
lesson2:   B
            \
lesson3:     C

Any changes made in lesson1 (if promoted / committed) would become available in lesson2 and lesson3 (when you update). Therefore, given that lesson2 and lesson3 build on their predecessors, any change will run down the stream to the end, in this case to lesson2 and lesson3.

lesson1: A1--A2--A3--..--An
lesson2:     B1--B2--B3--..--Bn
lesson3:        C1--C2--C3--..--Cn

Changes can only affect predecessors if you promote / commit them up into lesson2's or lesson1's stream. This can be prevented by placing locks between the streams.

On each of the streams, workspaces can be added for each of the users. Here users can make changes locally, and if a user needs to save their work they can "keep" it. This basically pushes it onto the server but does not make it available to the parent stream.

This is just an alternative and what we use internally to separate our release and future development work.

Christo
Using a proprietary VCS is not viable.
Dietrich Epp
A: 

My advice is not to separate out lesson development into branches.

Paul Nathan
Without branches, how do I merge changes from lesson 1 into lesson 2?
Dietrich Epp
A:

I would suggest using git submodule to manage the VS aspect.

Don't include the Visual Studio project in the source code repo. Make it a separate git repo that has the source code repo as a submodule. This will place it in a subdirectory, which Visual Studio was okay with the last time I used it (admittedly quite a long time ago).

You should make a branch in the VS repo for each branch in the code repo, and then use the git submodule commands to set the submodule commit for each branch to the correct branch in the code repo.

This way you can still merge changes from earlier lessons into newer ones, and can still add files into VS that only exist in particular branches.

  • This is all a bit experimental to me too, so my apologies if I sound like I know what I'm talking about. I've worked a bit with submodules but not done anything quite this complex, and certainly nothing involving branches in an outer repo that map to branches in an inner repo.

The downside to this is that VS users will, I think, have to run git submodule update in addition to git checkout lesson2. Otherwise it should be pretty straightforward and will let you keep the binary VS stuff out of the code repo, which will make it easier to push through changes to code, especially ones that don't add files (since adding a file would entail modifying the VS project).

It's still going to be a hassle if you have to change the VS repo, since it's not (AFAIK) possible to merge in changes to a .dsp or whatever the extension is these days. So if the students have made their own changes to the project file they'll have to replace it and make those same changes again. But this will keep the confusion somewhat more organized, assuming that your students are comfortable with this little extra updraft on the learning curve. For people not using VS it makes things much more straightforward and keeps the clutter out of their way.
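A minimal sketch of that submodule layout, run against two throwaway local repositories: a "code" repo with one branch per lesson, and a "vs" repo whose own lesson branches each pin src/ to the matching code branch. The repo names, the src/ path and the placeholder lessons.vcxproj are assumptions for illustration. The -c protocol.file.allow=always flag is only there because the demo uses a local path as the submodule URL, which recent Git versions refuse by default.

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# 1. Code repository: one branch per lesson (hypothetical contents).
git init -q "$work/code"
cd "$work/code"
git symbolic-ref HEAD refs/heads/lesson1
echo 'lesson 1' > main.c
git add main.c
git commit -qm "lesson 1"
git checkout -qb lesson2
echo 'lesson 2' >> main.c
git commit -qam "lesson 2"

# 2. VS repository with the code repository mounted at src/.
git init -q "$work/vs"
cd "$work/vs"
git symbolic-ref HEAD refs/heads/lesson1
git -c protocol.file.allow=always submodule add "$work/code" src
git -C src checkout -q origin/lesson1      # pin src/ to lesson 1
echo '<Project />' > lessons.vcxproj        # placeholder project file
git add src lessons.vcxproj
git commit -qm "VS project for lesson 1"

# 3. One VS branch per lesson, each pinning src/ to the matching code branch.
git checkout -qb lesson2
git -C src checkout -q origin/lesson2
git add src
git commit -qm "VS project for lesson 2"

# 4. What a VS user does when switching lessons: checkout alone leaves
#    src/ where it was, so the submodule has to be updated explicitly.
git checkout -q lesson1
git -c protocol.file.allow=always submodule update
git -C src log --oneline -1
```

Step 4 is the extra git submodule update that VS users would need after every git checkout of a lesson branch.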

Propagating changes

When you make changes you will have to check out each branch and merge the changes from the previous lesson. You could script this fairly easily. Assuming there were no conflicts, the whole procedure would only take a moment. For example (bash):

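# Bash only (arrays, [[ ]]). Define this in an interactive shell so that
# `lessons` and `prev` survive between calls while you fix conflicts.
lessons=(lesson1 lesson2 lesson3)   # branches, earliest lesson first
unset prev
function mergenext {
  while (( ${#lessons[@]} > 0 )); do
    if [[ $prev ]]; then
      # Merge the previous lesson into this one; stop here on a conflict.
      { git checkout "${lessons[0]}" && git merge "$prev"; } || return $?
    fi
    prev="${lessons[0]}"
    lessons=( "${lessons[@]:1}" )   # drop the lesson just handled
  done
}
mergenext
# resolve conflicts and commit
mergenext
# ...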

Or at least something like that. I didn't review that code very thoroughly, so use it with caution. But the basic idea is that you can just call mergenext from a shell, and if there are any conflicts you can resolve them, commit, and then call mergenext again, from the same shell, and so on. If there aren't any conflicts it should just run and probably take 20 seconds to do it all.

Paired branches

An alternative to using submodule would be to create a second VS branch for each lesson branch. This is definitely easier to figure out, both for you and your students. You would end up merging in changes from the modified lesson's pure-source branch to that lesson's VS branch, and then upstream from both of those along the pure-source and VS branch lines. So it ends up being a little more work, but it's easier on the brain. The advantage over just including VS with every branch is mainly that it's arguably easier to grok when you look at the revision history.
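A sketch of the paired-branch bookkeeping, assuming a hypothetical naming scheme of lessonN for the pure-source branches and lessonN-vs for their VS twins; propagate_paired is a made-up helper. For each lesson it merges the previous lesson's source branch into this one, this lesson's source into its VS twin, and the previous VS branch into this VS branch. It runs against a small throwaway repository so it is self-contained.

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: propagate changes along both branch lines.
# Arguments are lesson numbers in order, e.g. `propagate_paired 1 2 3`.
propagate_paired() {
  local n prev=
  for n in "$@"; do
    # Source line: previous lesson -> this lesson.
    git checkout -q "lesson$n" || return
    if [ -n "$prev" ]; then git merge -q --no-edit "lesson$prev" || return; fi
    # VS line: this lesson's source -> its VS twin, then previous VS -> this VS.
    git checkout -q "lesson$n-vs" || return
    git merge -q --no-edit "lesson$n" || return
    if [ -n "$prev" ]; then git merge -q --no-edit "lesson$prev-vs" || return; fi
    prev=$n
  done
}

demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/lesson1
echo 'lesson 1 code' > main.c
git add main.c
git commit -qm "lesson 1"
git checkout -qb lesson1-vs
echo '<Project />' > lessons.vcxproj   # placeholder VS project
git add lessons.vcxproj
git commit -qm "lesson 1: VS project"
git checkout -qb lesson2 lesson1
echo 'lesson 2 code' > lesson2.c
git add lesson2.c
git commit -qm "lesson 2"
git checkout -qb lesson2-vs
git merge -q --no-edit lesson1-vs

# A bug fix lands on lesson 1's source branch...
git checkout -q lesson1
echo 'fixed' > fix.txt
git add fix.txt
git commit -qm "lesson 1: bug fix"

# ...and is carried to lesson1-vs, lesson2 and lesson2-vs.
propagate_paired 1 2
```

Like mergenext, it stops at the first conflict; after resolving and committing you can rerun it, since merges that already happened are no-ops.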

Rebase vs Merge

I would avoid using rebase, partly because it's not really necessary, and partly because it's going to be problematic if your students have already pulled from your repo before somebody spots a bug. I think your branch layout is sound as it is, because it lets you keep a current version of each lesson and merge changes up the line. If you use a single branch then you're basically forced to rebase.

When you rebase, you're rewriting history. That changes the SHA1 hashes that identify the commits, so there's no longer a common lineage with cloned repos, and anyone who has already cloned the repo and started making commits (i.e. working on the assignment) is going to have to basically build a patch of their work, re-clone the repo, and apply it again. There are, I think, more elegant ways to resolve this than that approach, but I haven't read that part of the manpage yet.

Also the rebase approach prevents your clever error-fixing students from just sending you a pull request for their branch. In that situation you would be able to merge in their changes and then notify the class that they should all pull from each branch in your repo to get the revised code.
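A small throwaway-repository demonstration of the SHA1 point: rebasing lesson2 onto a fixed lesson1 gives its commit a new hash, so the commit a student already cloned is no longer in that branch's history, while merging keeps it. The branch names follow the question; the merged and rebased branches and all file contents are made up for the demo.

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/lesson1

echo 'lesson 1' > main.c
git add main.c
git commit -qm "C: lesson 1"
git checkout -qb lesson2
echo 'lesson 2' > lesson2.c
git add lesson2.c
git commit -qm "D: lesson 2"
published=$(git rev-parse lesson2)    # what students have already cloned

git checkout -q lesson1
echo 'fix' > fix.txt
git add fix.txt
git commit -qm "J: fix lesson 1"

# Option 1: merge. The published commit stays in the branch's history.
git branch -q merged lesson2
git checkout -q merged
git merge -q --no-edit lesson1

# Option 2: rebase. D is replayed as a new commit with a new SHA1.
git branch -q rebased lesson2
git checkout -q rebased
git rebase -q lesson1
echo "published D: $published"
echo "rebased D:   $(git rev-parse rebased)"
```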

Pulling updates

I'm not aware of any way to pull all of the branches of a repo at one time. This doesn't really make sense in git, since whenever you pull in changes you're merging, and generally you want to be able to intervene in case of a conflict. A script similar to the one above would be a good way to do this though. Normally in git-land you would be working on your local branch(es) and only be pulling from upstream when something new and relevant has been added.

Of course there is a myriad of possible topologies, but in general you would be very aware that you need to pull in changes. Also in your case, if the changes made to your repo conflict with something that a student has done while completing the assignment, she is going to have to manually resolve that conflict when pulling from your repo.
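Still, the fetch-and-fast-forward half of "pull every branch" can be scripted safely, because a fast-forward never needs conflict resolution and the script can simply stop when one is impossible. A minimal sketch, with update_lessons as a hypothetical helper: it fetches once, fast-forwards the checked-out branch with git merge --ff-only, and updates the other local branches with a lesson:lesson fetch refspec, which Git refuses to apply unless it is a fast-forward. (Pushing in the other direction is already a single command: git push --all origin.) The demo uses two throwaway repositories standing in for the two computers.

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: fast-forward each named local branch to origin's copy.
update_lessons() {
  local b current
  current=$(git symbolic-ref --short -q HEAD || true)
  git fetch -q origin || return
  for b in "$@"; do
    if [ "$b" = "$current" ]; then
      git merge -q --ff-only "origin/$b" || return   # the checked-out branch
    else
      git fetch -q origin "$b:$b" || return          # refused unless fast-forward
    fi
  done
}

work=$(mktemp -d)
git init -q "$work/desktop"
cd "$work/desktop"
git symbolic-ref HEAD refs/heads/lesson1
echo 'lesson 1' > main.c
git add main.c
git commit -qm "lesson 1"
git branch lesson2
git clone -q "$work/desktop" "$work/laptop"
git -C "$work/laptop" branch -q lesson2 origin/lesson2

# New work on both branches on the "desktop"...
echo 'fix' >> main.c
git commit -qam "lesson 1: fix"
git checkout -q lesson2
echo 'lesson 2' > lesson2.c
git add lesson2.c
git commit -qm "lesson 2"

# ...brought over to the "laptop" in one call.
cd "$work/laptop"
update_lessons lesson1 lesson2
git log --oneline --branches
```

If a branch has diverged on both machines, the helper stops there and leaves the real merge to you, which keeps the "intervene in case of a conflict" property described above.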

Tags

I'm not aware of any way to create a "tagset" with references to a set of commits; this is a very atypical use case (i.e. not something that people generally do when working on software projects), so I would be surprised if there is support for it.
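A naming convention can stand in for a tagset, though: give every lesson branch an annotated tag under a shared prefix, so the whole set can be listed and pushed by pattern. A minimal sketch, assuming a hypothetical release/branch naming scheme (v1.0/lesson1, v1.0/lesson2, ...) and a tag_release helper made up for illustration; the demo repository is throwaway, and the remote name origin in the final comment is an assumption.

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: tag the tip of each branch as <release>/<branch>.
tag_release() {
  local release=$1 b
  shift
  for b in "$@"; do
    git tag -a -m "$release: $b" "$release/$b" "$b" || return
  done
}

demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/lesson1
for n in 1 2 3; do
  [ "$n" = 1 ] || git checkout -qb "lesson$n"   # lesson N builds on N-1
  echo "lesson $n" > "lesson$n.c"
  git add "lesson$n.c"
  git commit -qm "lesson $n"
done

tag_release v1.0 lesson1 lesson2 lesson3
git tag -l 'v1.0/*'
# To publish the whole set at once:
#   git push origin 'refs/tags/v1.0/*:refs/tags/v1.0/*'
```

git checkout v1.0/lesson2 then recovers any one lesson exactly as published.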

History

Keeping the VS stuff out of the code repo, you will end up with something like

A--B--C-----------------J
       \                 \
        D--E--F-----------K
               \           \
                G--H--I-----L

which is fairly straightforward when you realize that J was a patch that fixed a bug in the original lesson. It seems a bit confusing, but this is quite similar to how a typical repository would work, in the case where a bugfix had been put into the mainline code (A-B-C) and then pulled into the topic branch (D-E-F) and from there pulled into someone's personal branch (G-H-I).

I usually use git show-branch for reviewing history, but git rev-list --graph --oneline --branches, although harder to remember, may be easier to understand. They both show you the history of commits along with the first line of the commit log for each commit. git rev-list... makes it more of a graph, similar to what gitk lesson1 lesson2 ... will show you.
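For the "tangled mess" worry specifically, git log accepts the same --graph --oneline --branches options, and its --first-parent option follows only a branch's own line, stepping over the side of each merge that came in from an earlier lesson. A sketch that rebuilds the A to L history from the diagram above in a throwaway repository and prints both views; the one-file-per-letter commits and the mkcommit helper are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/lesson1

# Hypothetical helper: one placeholder commit per letter in the diagram.
mkcommit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -qm "$1"; }

mkcommit A; mkcommit B; mkcommit C
git checkout -qb lesson2; mkcommit D; mkcommit E; mkcommit F
git checkout -qb lesson3; mkcommit G; mkcommit H; mkcommit I
git checkout -q lesson1; mkcommit J
git checkout -q lesson2; git merge -q --no-edit -m K lesson1
git checkout -q lesson3; git merge -q --no-edit -m L lesson2

echo "== every branch, as a graph =="
git log --graph --oneline --branches
echo "== lesson3's own line only =="
git log --first-parent --oneline lesson3
```

The first-parent view of lesson3 lists L, then I back to A, without J or K, because those arrived through the merges rather than on lesson3's own line.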

intuited
I am very familiar with `git submodule`. I don't see the benefit of creating a whole new repo just to manage a single file that's supposed to track the files in a separate repo. I guess the benefit is that the Linux users don't have to even *see* the VS project? I generally use `gitk` or `GitX` to view history, but thanks for the tip with `rev-list --graph --oneline --branches`. I've updated the question a little.
Dietrich Epp
I thought it would make it easier to read the commit history by separating things into conceptual compartments. I guess I don't understand the problem with the commit history. Also I thought it was more than one file, but I guess they changed things.
intuited
Without splitting the repo (or rebasing) there's no way to avoid the VS changes being 'grafted' onto the end of the subsequent lessons, since having them feed into the beginning of those lessons would require rewriting history. In any case not involving rewriting history, any code changes will have to be applied at the head of all branches. You could check out stacked git, it might let you apply a patch across multiple branches. I'm not really sure, I haven't used it. But this workflow (or the original) seems fairly representative of 'real world' usage.
intuited