What are the core concepts of a version control system that every developer should know?
Since I am working with ClearCase, I made the question specific to ClearCase.
Eric's Source Control HOWTO is a great guide that is Tool-Independent.
ClearCase is a beast to use. Slow, buggy, and expensive. Some things that I have done to cope with using CC are:
In my opinion, branching and merging are the most important concepts in any source control system (next to versioning itself, of course).
Once you understand how that's done (and Clearcase does it very well, to the point where we do even small changes as a branch and re-merge, not something I would have ever done with RCS or CVS), you'll find your life is made a lot easier.
Somewhat off-topic, but I know almost no developer who's happy with ClearCase. I was told it has sophisticated features, but as an svn and git user I cannot think of anything I miss in git or Subversion. So that's something one should know about ClearCase: most developers are really happy to work with something as simple as Subversion or git (yes, even git is easier to grasp). And even after I knew how to complete the simplest tasks in ClearCase, I had the constant feeling that ClearCase works against me, not with me.
I worked with ClearCase for the better part of 6 years and generally found it tolerable. It does have a certain learning curve, but once you get used to the quirks you can work pretty smoothly with it. A very competent CC admin who knows what he's doing is essential for anything but trivial setups. Unless you have one, people are going to run into problems, and soon enough there will be talk about the "ClearCase problem". Then management will have to intervene by switching to something else, which only wastes time for everyone involved. CC is not a bad product; it's just sometimes poorly understood.
Here are a few concepts I found important; some of these are not entirely CC-specific:
I've worked on a number of medium to large projects successfully using both Clearcase and SVN. Both are great tools, but the team using them needs documented processes. Create a process that describes how you will use the version control system.
1) find or create a best practices document for your Version Control System. Here's one for subversion, adapt it to your Clearcase process. All developers must adhere to the same game plan.
Basically decide if you are going to 'always branch' or 'never branch'.
Never Branch Scheme:
Always Branch Scheme:
'Branch when needed' is a compromise and may work best for many projects.
2) With Clearcase (and Subversion) you must learn to merge -- merging is your friend. Learn to use the merging capabilities of Clearcase or use a tool like Beyond Compare or emacs-diff. If your project is well modularized (many small decoupled files), you will benefit from fewer (or no) conflicts during merging.
3) Enjoy.
We've been using CC for just over fifteen years now. It has a lot of good features.
All our development is done on branches; I created a couple today, for a couple of different sets of changes. When I'd checked into the branch, I got a colleague to review the changes, and then merged back into /main/LATEST - which happens to be where my work needed to go. If it had been for an older release on a branch, it wouldn't have been any harder.
The merges from my temporary branches were fully automatic; no-one had changed the files I worked on while I had them checked out. Although by default checkouts are reserved (locked), you can always unreserve the checkout later, or create the checkout unreserved. When the changes take multiple days, the resynchronization of my temporary branch with the main branch is easy and usually automatic. The mergetool is OK; the biggest problem for me is that my server machine is 1800 miles or so from my office (or home), so that X over that distance is a bit slow (but not intolerably so). I've not used a better mergetool, but that may not be saying much since I've not used any other graphical mergetool.
Views (dynamic views) are fast on our setup. I've not used snapshot views, but I don't work on Windows when I can help it (our team uses snapshot views on Windows; I'm not clear why). We have complex branching systems, but the main development is done on /main/LATEST, and the release work is done on a branch. After GA, maintenance work is done on a release specific branch, and merged forward to /main/LATEST (via any intermediate versions).
CC does need good administrators - we have them and are fortunate in doing so.
CC is not trivial to use, though at the moment, I find 'git' as daunting as CC is to those who've not used it. But the basics are much the same - checkout, change, checkin, merge, branch, and so on. Directories can be branched - cautiously - and certainly are version controlled. That is invaluable.
I don't see the office switching from CC any time soon.
I wrote:
The biggest problem I have with CC is that it does not embed version numbers into the source files - a problem that git has too, AFAICT. I can half see why; I'm not sure I like giving up that trackability, though. So, I still use RCS (not even CVS) for most of my personal work. One day, I may switch to git - but it will be a jolt and it will take a lot of work to retool the release systems configured around (SCCS and) RCS.
In response, @VonC notes:
We always considered that practice as evil (mixing meta-data information into data), introducing "merge hell". See also How to get Clearcase file version inside a Java file. Of course, you can use a trigger for RCS keyword substitution (Clearcase Manual: Checkin Trigger Example) provided you use an appropriate merge manager.
There are several issues exposed by this discussion, and they all get mixed together. My views verge on the archaic, but have a rationale behind them, and I'm going to take the time to write them down (messed up by life - it may take several edits to complete this).
I learned SCCS back in 1984, about the time RCS was released (1983, I believe), but SCCS was on my machine and the internet was nascent at best. I moved from SCCS to RCS reluctantly in the mid-90s because the SCCS date format uses double-digits for years and it was not clear whether SCCS would be universally fixed in time (it was). In some respects, I don't like RCS as much as SCCS, but it has some good points. Commercially, my employer used SCCS up to mid-1995, but they started to switch over to Atria ClearCase from early 1994, tackling separate product sets one at a time.
Our project migrated later, when there was already some experience with CC. Partly because I insisted on it, we embedded version control information in the source files via a check-in trigger. This lasted a while - but only a while - because, as VonC states, it leads to merge hell. The trouble is that if a version with the tag /main/branch1/N is merged with /main/M from the common base version /main/B, the extracted versions of the files contain a single line which has edits in each version - a conflict. And that conflict has to be resolved manually, rather than being handled automatically.
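To make that conflict concrete, here is a minimal Python sketch of the usual three-way rule applied to a single line. The comment-style header line and the function name merge_line are made up for illustration; this is only the generic rule that most three-way merge tools follow, not a description of how ClearCase implements its merge.

```python
def merge_line(base, ours, theirs):
    """Apply the usual three-way rule to one line.

    Returns (merged_text, conflict). merged_text is None on conflict.
    """
    if ours == theirs:       # both sides agree (or neither changed)
        return ours, False
    if ours == base:         # only 'theirs' changed the line
        return theirs, False
    if theirs == base:       # only 'ours' changed the line
        return ours, False
    return None, True        # both changed it, differently: manual resolution


# Hypothetical header line stamped into the file by a check-in trigger.
base_hdr = "/* version: /main/B */"
branch_hdr = "/* version: /main/branch1/N */"
main_hdr = "/* version: /main/M */"

# The stamped line differs in all three versions, so it always conflicts.
print(merge_line(base_hdr, branch_hdr, main_hdr))

# An ordinary code line edited on one side only merges automatically.
print(merge_line("int x = 0;", "int x = 1;", "int x = 0;"))
```

Because the trigger rewrites the stamped line on every check-in, the first case arises for every file touched on both branches, even when the real edits do not overlap.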
Now, SCCS has ID keywords. ID keywords take two formats, one for files being edited and one for files that are not being edited:
Edit      Non-Edit
%I%       9.13
%E%       06/03/09
%Z%       @(#)
%M%       s.stderr.c
If you attempted a 3-way merge of the editable versions of SCCS files (with the %x% notation), then there would be no conflicts on the lines containing metadata unless you changed the metadata on those lines (e.g. by changing from US-style %D% dates to UK-style %E% dates - SCCS does not support ISO-style 2009-03-15 dates as standard.)
RCS also has a keywords mechanism, and the keywords also take two formats, though one is for files which have not yet been inserted into RCS and the other is for those that have:
Original      After insertion
$Revision$    $Revision: 9.13 $
$Date$        $Date: 2009/03/06 06:52:26 $
$RCSfile$     $RCSfile: stderr.c,v $
The difference is between a '$' following the keyword and a ':', space, text, space and finally a '$'. I've not done enough merging with RCS to be sure what it does with keyword information, but I note that if it treated both the expanded and 'contracted' notations as equivalent (regardless of the content of the expanded material), then merging could take place without conflict, leaving the contracted notation in the output of the merge, which would be appropriately expanded when the resulting file is retrieved after checkin.
As I've indicated in my discussion of SCCS and RCS, if 3-way merging is done treating the keywords in the correct (contracted or editable) formats, then there is no merge conflict.
The problem with CC (from this viewpoint - clearly, the implementors of CC disagree) is that it lacks a system for handling keywords, and therefore also lacks an appropriate merge manager.
If there was a system for handling keywords and an appropriate merge manager, then:
The downside of this is that it requires either a special difference tool that recognizes metadata markers and treats them specially, or that the files fed to the difference tool are canonicalized (the metadata markers are reduced to the neutral form - $Keyword$ or %K% in RCS and SCCS terms). I'm sure that this little bit of extra work is the reason why it is not supported, something I've always felt was shortsighted in such a powerful system. I've no particular attachment to RCS or SCCS notations - the SCCS notations are easier to handle in some respects, but they're essentially equivalent - and any equivalent notation could be used.
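As an illustration of the canonicalization idea, here is a minimal Python sketch that reduces expanded RCS keywords to their neutral form before the files reach a difference tool. The function name contract is hypothetical, the keyword list is limited to the three RCS keywords shown earlier, and the revision numbers and dates in the sample lines (other than 9.13 and its date) are invented. It is a sketch of the approach under those assumptions, not something ClearCase or RCS provides.

```python
import re

# Only the three RCS keywords discussed here; a real tool would need the full set.
RCS_KEYWORD = re.compile(r"\$(Revision|Date|RCSfile)(?::[^$\n]*)?\$")


def contract(text):
    """Reduce expanded keywords such as '$Revision: 9.13 $' to '$Revision$'."""
    return RCS_KEYWORD.sub(r"$\1$", text)


# The same header line as it might appear in the base version and two contributors.
base = "/* $Revision: 9.12 $ $Date: 2009/03/01 10:00:00 $ */"
ours = "/* $Revision: 9.13 $ $Date: 2009/03/06 06:52:26 $ */"
theirs = "/* $Revision: 9.12.1.1 $ $Date: 2009/03/05 08:15:00 $ */"

# After contraction all three lines are identical, so a three-way merge
# sees no change on this line and has nothing to conflict on.
canonical = {contract(version) for version in (base, ours, theirs)}
print(canonical)
```

The merged output keeps the contracted keywords, which would be expanded again when the file is next retrieved after check-in.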
I like to have the metadata in the source code because my source code (as opposed to my employer's source code) is distributed outside the aegis of the source code control system. That is, it is mostly open source - I make it available to all and sundry. If someone reports a problem in a file, especially in a file they've modified, I think it is helpful to know where they started from, and that's represented by the original metadata in the source file.
Here, SCCS has an advantage over RCS: the expanded forms of the SCCS keywords are indistinguishable from regular text, whereas the RCS keywords continue to look like keywords, so if the other person has imported the material into their own RCS repository, their metadata replaces my metadata, a problem that does not happen with SCCS in the same way (the other person has to do work to overwrite the metadata).
Consequently, even if someone takes a chunk of my source code and modifies it, there are usually labels enough in it to identify where it came from, rather than leaving me to speculate about which version it is based on. And that, in turn, makes it easier to see what parts of the problem are of my making, and what parts are of their making.
Now, in practice, the way open source works, people don't migrate code around as much as you might think. They tend to stick with the released version fairly closely, simply because deviating is too expensive when the next official release is made.
I'm not sure how you are supposed to determine the base version of a piece of source code that originated from your work and has been revised since then. Finding the correct version, though, seems key to doing that, and if there are fingerprints in the code, then it can be easier.
So, that's a moderate summary of why I like to embed the version information in the source files. It is in large part historical - SCCS and RCS both did it, and I liked the fact that they did. It may be an ancient relic, something to be bidden farewell in the era of DVCS. But I'm not yet wholly convinced by that. However, it might take still more of an essay to explain the ins and outs of my release management mechanism to see why I do things as I do.
One aspect of the reasoning is that key files, such as 'stderr.c' and 'stderr.h', are used by essentially all my programs. When I release a program that uses it, I simply ensure I have the most recent version - unless there's been an interface change that requires a back-version. I haven't had that problem for a while now (I did a systematic renaming in 2003; that caused some transitional headaches, but Perl scripts allowed me to implement the renaming pretty easily). I don't know how many programs use that code - somewhere between 100 and 200 would be a fair guess. This year's set of changes (the version 9.x series) are still somewhat speculative; I haven't finally decided whether to keep them. They are also internal to the implementation and do not affect the external interface, so I don't have to make up my mind just yet. I'm not sure how to handle that using git. I don't want to build the library code into a library that must be installed before you can build my software - that's too onerous for my clients. So, each program will continue to be distributed with a copy of the library code (a different sort of onerous), but only the library code that the program needs, not the whole library. And I pick and choose for each program which library functions are used. So, I would not be exporting a whole sub-tree; indeed, the commit that covered the last changes in the library code is typically completely unrelated to the commit that covered the last changes in the program. I'm not even sure whether git should use one repository for the library and another for the programs that use it, or a common larger repository. And I won't be migrating to git until I do understand this.
OK - enough wittering. What I have works for me; it isn't necessarily for everyone. It does not make extraordinary demands on the VCS - but it does require version metadata embedded in the files, and CC and Git and (I think) SVN have issues with that. It probably means I'm the one with problems - hangups for the lost past. But I value what the past has to offer. (I can get away with it because most of my code is not branched. I'm not sure how much difference branching would make.)