
My question is about the way in which Git handles branches: whenever you branch from a commit, that branch never receives changes from the parent branch unless you force it with a merge.

But in other systems such as ClearCase or AccuRev, you can specify how branches get populated with a sort of inheritance mechanism: with ClearCase, using a config spec, you can say "get all the files modified on branch /main/issue001, then fall back to the ones on /main, or to this specific baseline".

In AccuRev you have a similar mechanism that lets streams (their term for branches) receive changes from parent streams without merging or creating a new commit on the branch.

Don't you miss this while using Git? Can you enumerate scenarios where this inheritance is a must?

Thanks

Update: Please read VonC's answer below; it sharpens the focus of my question. Once we agree that "linear storage" and DAG-based SCMs have different capabilities, my question is: what are the real-life scenarios (especially for companies rather than OSS) where linear can do things not possible for DAG? Are they worth it?

+1  A: 

It sounds like what you're looking for might be git rebase. Rebasing a branch conceptually detaches it from its original branch point and reattaches it at some other point. (In reality, the rebase is implemented by applying each patch of the branch in sequence to the new branch point, creating a new set of commits.) In your example, you can rebase a branch onto the current tip of an upper branch, which will essentially "inherit" all the changes made to the other branch.
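
A minimal sketch of that (branch names are hypothetical):

git checkout issue001   # the branch to re-parent
git rebase master       # replay its commits onto the current tip of master

Note that each replayed commit is a brand-new commit with a new hash, which is the crux of the discussion in the comments below.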

Greg Hewgill
Not sure git rebase is the answer. It will create a new commit and, AFAIK, it will even modify the branch history before rebasing. The point is that ClearCase and AccuRev don't create a new commit for that. With ClearCase you can configure your "view" to "download" stuff from different branches, something you can't do with Git, but my question is: is it important? Maybe git rebase is enough, although it doesn't do the same thing.
jbhope
+2  A: 

I'm not totally clear on what you're asking for, but it sounds like Git's tracking semantics are what you want. When you branch from an origin you can do something like:

git checkout -t -b my_branch origin/master

And then future "git pull"s will automatically merge origin/master into your working branch. You can then use "git cherry -v origin/master" to see what the difference is. You can use "git rebase" before you publish your changes to clean up the history, but you shouldn't use rebase once your history is public (i.e. other people are following that branch).
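
Putting those pieces together, a hedged sketch of the workflow described above:

git checkout -t -b my_branch origin/master   # create the tracking branch
git pull                                     # auto-merges origin/master into my_branch
git cherry -v origin/master                  # list local commits not yet upstream
git rebase -i origin/master                  # tidy history, but only before publishing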

stsquad
Exactly, git rebase can do it, as Greg pointed out, but you always create a new commit, don't you? With ClearCase you don't need to, but I'd like to know whether you *lose* something or not. I don't see a clear scenario where "true inheritance" makes a big difference, and I'm trying to find one.
jbhope
Well, nothing is truly lost, as the reflog will show you, but yes, git rebase rewrites history, so you will lose the information about where you came from. However, nothing says you have to use rebase; if you just merge, then all your history is maintained. It depends on whether keeping the history of all those typo fixes and interim steps is important to you.
stsquad
A: 

I'm not sure if you are asking anything, but you are demonstrating that AccuRev streams are a different tool from Git (or SVN) branches. (I don't know ClearCase.)

For example, with Accurev you are forced, as you say, to use certain workflows, which gives you an auditable history of changes that is not supported in Git. Accurev's inheritance makes certain workflows more efficient and others impossible.

With Git you can have exploratory coding segregated in local repos or in feature branches, which would not be supported very well by Accurev.

Different tools are good for different purposes; it's useful to ask what each one is good for.

Paul
Paul, no, I'm not trying to demonstrate anything; I'm actually asking. I mean, could you explain exactly which workflows are only doable with AccuRev? I'm used to ClearCase/AccuRev, but then I read about Git and really liked it. Then I saw that in Git you can't load a specific version of a specific file into your "workspace", only specific commits, and you can't inherit changes either (a new commit is always required). My question is: OK, Git can't do it, but do you lose something?
jbhope
+18  A: 

To understand why Git does not offer the kind of "inheritance mechanism" you are referring to (one not involving a commit), you must first understand one of the core concepts of those SCMs (Git vs. ClearCase, for instance):

  • ClearCase uses linear version storage: each version of an element (file or directory) is linked in a direct linear relationship with the previous version of the same element.

  • Git uses a DAG - a Directed Acyclic Graph: each "version" of a file is actually part of a global set of changes in a tree that is itself part of a commit. The previous version of that file must be found in a previous commit, accessible through a single directed acyclic graph path.

In a linear system, a config spec can specify several rules for achieving the "inheritance" you see (for a given file, first select a certain version, and if not present, then select another version, and if not present, then select a third, and so on).

A branch is a fork in a linear history at a given version, for a given select rule (all the other select rules before that one still apply, hence the "inheritance" effect).

In a DAG, a commit represents all the "inheritance" you will ever get; there is no "cumulative" selection of versions. There is only one path in this graph to select all the files you will see at this exact point (commit).
A branch is just a new path in this graph.
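
A minimal illustration of that single-path property (file, path and branch names are hypothetical):

git log --oneline --graph --all   # visualize the commit DAG
git rev-parse HEAD:src/app.c      # a file version is addressed through a commit
git show myBranch~3:src/app.c     # the same file, reached via another path

There is no way to say "take this file from that branch and everything else from this one" without recording the result as a commit.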

To apply, in Git, some other versions, you must either:

  • merge the branch carrying those other versions into yours, or
  • rebase your branch onto a new starting point.

Either way, since Git is a DAG-based SCM, it will always result in a new commit.
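
Both options, in command form (branch names are hypothetical):

git merge issue001   # a merge commit with two parents records the combination
git rebase master    # your commits are replayed as new commits on top of master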

What you are "losing" with Git is some kind of "composition" (when you are selecting different versions with different successive select rules), but that would not be practical in a DVCS (as in "Distributed"): when you are making a branch with Git, you need to do so with a starting point and a content clearly defined and easily replicated to other repositories.

In a purely central VCS, you can define your workspace (in ClearCase, your "view", either snapshot or dynamic) with whatever rules you want.


unknown-google adds in the comment (and in his question above):

So, once we see the two models can achieve different things (linear vs. DAG), my question is: what are the real-life scenarios (especially for companies more than OSS) where linear can do things not possible for DAG? Are they worth it?

When it comes to "real-life scenarios" in terms of selection rules, what you can do in a linear model is have several selection rules for the same set of files.

Consider this "config spec" (i.e. "configuration specification" for selection rules with ClearCase):

element /aPath/... aLabel3 -mkbranch myNewBranch
element /aPath/... aLabel2 -mkbranch myNewBranch

It selects all the files labelled 'aLabel2' (and branches from there), except for those labelled 'aLabel3', which branch from their 'aLabel3' version instead (because that rule precedes the one mentioning 'aLabel2').
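
For comparison, a rough Git approximation of that two-rule composition (a sketch only: ClearCase labels select individual file versions, while a Git tag always denotes a whole-tree commit, so the tag and path names here are hypothetical):

git checkout -b myNewBranch aLabel2           # start from the aLabel2 state
git checkout aLabel3 -- aPath/                # overlay the aPath files as of aLabel3
git commit -m "compose aLabel3 over aLabel2"  # the result only exists as a new commit

Which illustrates the point below: in Git, the composition only becomes a defined, shareable state once it is recorded as a commit.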

Is it worth it?

No.

Actually, the UCM flavor of ClearCase (the "Unified Configuration Management" methodology included with the ClearCase product, representing all the "best practices" deduced from base ClearCase usage) does not allow it, for reasons of simplicity. A set of files is called a "component", and if you want to branch from a given label (known as a "baseline"), that would be translated into the following config spec:

element /aPath/... .../myNewBranch
element /aPath/... aLabel3 -mkbranch myNewBranch
element /aPath/... /main/0 -mkbranch myNewBranch

You have to pick one starting point (here, 'aLabel3') and go from there. If you also want the files from 'aLabel2', you will make a merge from all the 'aLabel2' files to the ones in 'myNewBranch'.

That is a "simplification" you do not have to make with a DAG, where each node of the graph represents a uniquely defined "starting point" for a branch, whatever is the set of files involved.

Merge and rebase are enough to combine that starting point with other versions of a given set of files, in order to achieve the desired "composition", while keeping that particular history in isolation in a branch.
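
The same "pick one starting point, then merge" pattern rendered directly in Git (tag and branch names borrowed from the config spec above, as an analogy):

git checkout -b myNewBranch aLabel3   # one well-defined, replicable starting point
git merge aLabel2                     # bring in the 'aLabel2' versions via a recorded merge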

The general goal is to reason in terms of "coherent Version Control operations applied to a coherent component". A "coherent" set of files is one in a well-defined, coherent state (illustrated in Git just after this list):

  • if labelled, all its files are labelled
  • if branched, all its files will branch from the same unique starting point
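
In Git that coherence is structural, because tags and branches always apply to whole commits (names below are hypothetical):

git tag aLabel3                  # "labels" every file of the tree at once
git branch myNewBranch aLabel3   # every file branches from that same unique point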

That is easily done in a DAG system; it can be more difficult in a linear system (especially with "base ClearCase", where the config spec can be tricky), but it is enforced by the UCM methodology of that same linear-based tool.

Instead of achieving that "composition" through a "private selection rule trick" (with ClearCase, some select rule order), you achieve it only with VCS operations (rebase or merge), which leave a clear trace for everyone to follow (as opposed to a config spec private to a developer, or shared amongst some but not all developers). Again, it enforces a sense of coherency, as opposed to a "dynamic flexibility" that you may have a hard time reproducing later on.

That allows you to leave the realm of VCS (Version Control System) and enter the realm of SCM (Software Configuration Management), which is mainly concerned with "reproducibility". And that (SCM features) can be achieved with a linear-based or a DAG-based VCS.

VonC
VonC, that's exactly the kind of explanation I should have added before actually asking my question. Yes, I've read Scott Chacon's Git Internals cover to cover, so I think I'm somewhat familiar with the way Git works. I also have "base ClearCase" experience. So, once we see the two models can achieve different things (linear vs DAG), my question is: what are the real-life scenarios (especially for companies more than OSS) where linear can do things not possible for DAG? Are they work?
jbhope
Wanted to say "worth" instead of "work"! :-(
jbhope
Thanks VonC for your explanation. Simply put, it doesn't matter how much flexibility the "linear model" gives; it really seems DAG is the way to go. I guess we'll be able to live without the ability to "dynamically" build the trees with config specs using Git (as we did with CC), and without being able to "give a try" to building your branch together with a specific change from another branch, solving it instead with merging, reverts and so on. Anyone still going for good-ol' CC?
jbhope
If you are using a "DAG-based" tool, then... DAG is indeed the way to go ;) In a linear-based VCS, a good methodology allows you to "guide" the flexibility towards a more coherent configuration. And I have not used the "good-ol'" CC since 2003: only UCM CC for all major development (and some base CC for quick consultation operations).
VonC
@Jonathan Leffler: hey Jonathan, still around then? (since http://stackoverflow.com/questions/645008/what-are-the-basic-clearcase-concepts-every-developer-should-know/645771#645771). As always, thank you for your edits :)
VonC
+1  A: 

ClearCase, without MultiSite, is a single repository but Git is distributed. ClearCase commits at the file level but Git commits at the repository level. (This last difference means the original question is based on a misunderstanding, as pointed out in the other posts here.)

If these are the differences we're talking about, then I think 'linear' versus 'DAG' is a confusing way to distinguish these SCM systems. In ClearCase, all the versions of a file are referred to as the file's version "tree", but really it is a directed acyclic graph! The real difference from Git is that ClearCase's DAGs exist per file. So I think it is misleading to refer to ClearCase as non-DAG and Git as DAG.

(BTW ClearCase versions its directories in a similar way to its files - but that's another story.)

Darren Yeats
A: 

As to the inheritance scheme used by AccuRev: Git users will probably "get" the whole thing when they look at git-flow (see also: http://github.com/nvie/gitflow and http://jeffkreeftmeijer.com/2010/why-arent-you-using-git-flow/).

This Git branching model more or less does (manually, or with the help of the git-flow tool) what AccuRev does out of the box, automatically and with great GUI support.
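
A hedged sketch of the tool's basic commands, assuming the git-flow extension from the repository linked above is installed (the feature name is hypothetical):

git flow init                     # sets up master/develop and branch prefixes
git flow feature start issue001   # creates feature/issue001 off develop
git flow feature finish issue001  # merges it back into develop and deletes it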

So it appears Git can do what AccuRev does. Since I have never actually used git/git-flow day to day, I can't really say how it works out, but it does look promising. (Minus proper GUI support :-)

Martin
+2  A: 

I'll try to answer your question. (I have to say here that I have not used Git, only read about it, so if something I mention below is wrong, please correct me.)

"Can you enumerate scenarios where this inheritance is a must?"

I won't say it is a must, because you can solve a problem with the tool you have, and that might be a valid solution for your environment. I guess it is more a matter of the processes than of the tool itself. Making sure your process is coherent, and also lets you go back in time to reproduce any intermediate step/state, is the goal; the plus is when the tool lets you run your process and SCMP as painlessly as possible.

The one scenario where I can see this 'inheritance' behavior and the power of the config spec being handy is when you want your set of changes "isolated" and mapped to a task (devtask, CR, SR, or whatever defines the purpose/scope of your change set).

Using this composition allows you to keep your development branch clean and still use different combinations (via composition) of the rest of the code, while having only what is relevant for the task isolated in a branch during the whole life cycle of the task, right up until the integration phase.

Being a purist, having to commit/merge/rebase just to have a "defined starting point" would, I guess, 'pollute' your branch, and you would end up with your changes plus other changes in your branch/change set.

When and where is this isolation useful? The points below might only make sense in the context of companies pursuing CMM and some ISO certifications, and might be of no interest to other kinds of companies or to OSS.

  • Being really picky, you might want to accurately count the lines of code (added/modified/deleted) of the change set corresponding to a single developer, later used as one input for code and effort estimations (see the Git sketch after this list).

  • It can be easier to review the code at different stages, having just your code in a single branch (not glued to other changes).
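
For what it's worth, both points seem straightforward in Git once the task lives in its own branch (branch names are hypothetical):

git diff --shortstat master...task/issue001   # lines added/deleted by this task alone
git log --oneline master..task/issue001       # review only this task's commits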

On big projects with several teams and 500+ developers actively working concurrently on the same code base (where individual graphical element version trees look like a messy tangled web, with several load lines, one for each big customer or one for each technology), large config specs using composition several degrees deep let that many people work seamlessly to adapt the same product/system (base code) to different purposes.

Using such config specs dynamically gave each team or sub-team a different view of what they needed and of where they needed to branch from (cascading in several cases), without the need to create intermediate integration branches, or to constantly merge and rebase all the bits you need to start with. Code for the same task/purpose branched off different labels, but it made sense. (You can argue the 'known baseline' principle of SCM here, but simple labels contemplated in a written SCM plan did the work.)

It must be possible to solve this with Git (in a non-dynamic way, I guess), but I find it really hard to picture without this 'inheritance' behavior. I guess the point mentioned by VonC, "if branched, all its files will branch from the same unique starting point", was broken here, but besides being well documented in the SCMP, I remember there were strong business reasons to do it that way.

Yes, building the config specs I mentioned above was not free; in the beginning there were 4-5 well-paid people behind the SCM, but they were later replaced by automated scripts that asked you what you wanted in terms of labels/branches/features and wrote the CS for you.

Reproducibility here was achieved by simply saving the config spec along with the task in the devtask system, so each task mapped upstream to requirements, and downstream to a config spec and a set of changes (code files, design documents, test documents, etc.).

So one conclusion up to here might be: only if your project is big/complicated enough (and you can afford SC managers for the life of the project :) ) will you even start thinking about whether you need this 'inheritance' behavior or a really versatile tool; otherwise you will go directly to a tool that is free and already takes care of the coherence of your SCM... but there could be other factors in the SCM tool that make you stick to one or the other... read on.

Some side notes that might be off topic, but that, I guess, in cases like mine need to be considered.

I have to add here that we use the "good-ol'" CC, not UCM. I totally agree with VonC that a good methodology allows you to "guide" the flexibility towards a more coherent configuration. The good thing is that CC is pretty flexible, and you can find (not without some effort) a good way to keep things coherent, while in other SCMs you might get that for free. But, for example, here (and in other places where I've worked with CC), for C/C++ projects we cannot afford the price of losing the winkin feature (reusing derived objects), which cuts compile times several times over. It can be argued that a better design, more decoupled code, and optimized Makefiles would reduce the need to compile the whole thing, but there are cases where you need to compile the whole beast many times a day, and sharing the DOs saves heaps of time/money. Where I am now we try to use as many free tools as we can, and I think we will get rid of CC if we can find a cheaper or free tool that implements the winkin feature.

I'll finish with something Paul mentioned: different tools are better than others for different purposes, but I will add that you can get around some limitations of a tool by having a coherent process, without sacrificing reproducibility, the key point of SCM. In the end, I guess the answer to "is it worth it?" depends on your "problem", the SDLC you are running, your SCM processes, and whether there is any extra feature (like winkin) that might be useful in your environment.

my 2 cents

FedeN
+1. But having also worked on large projects with ~150 developers, I found the simplifications made with a UCM-based configuration much easier to deal with than complex config specs.
VonC
We are only ~25 devs still using plain CC, but I have scripted the creation of the views and config specs, rebasing and integrations, so it is all transparent for the users. Here, projects need roughly one release per week, so they cannot wait for an integration phase and need to 'rebase' their code on a daily basis before hitting the INT phase. To meet this requirement and still keep the changes of each task in a branch and avoid copy-merge, I'm using one level of 'cascading' in the config spec plus some tricks with attributes and labels.
FedeN
I will publish this somewhere when I have it properly documented. It mixes the well-known old DEV-INT-RELEASE schema with some more agile concepts.
FedeN
+1 for insightful comment. I think everything really depends on the type of isolation you are choosing. For git/mercurial it (isolation) can easily be branch per task or branch per feature or branch per technology, etc. (also you have branch per developer in the fork). Some other systems are quite wasteful in terms of space and speed when working with multiple branches. (It might be a little out of context, but I had to fill the textbox).
Bogdan Maxim