What are the core concepts of a version control system that every developer should know?

Since I am working with ClearCase, I made the question specific to ClearCase.

+3  A: 

Eric's Source Control HOWTO is a great guide that is tool-independent.

Jason Punyon
+11  A: 

ClearCase is a beast to use. Slow, buggy, and expensive. Some things that I have done to cope with using CC are:

  1. Always put good comments when you check in.
  2. Use a common config spec and don't change it very often.
  3. Never try to use CC over a VPN or slow network connection.
  4. Turn off the loading of CC Doctor on startup.
  5. Don't move files around to different directories.
  6. Schedule at least 2 min per file for checkin.
  7. Snapshot views are slow, but dynamic views are slower.
  8. Make a development habit of checking in early and often because the reserved files and merges are painful.
  9. Have all the developers check out files unreserved by default.
Joshua
I disagree on the beast comment. I think what you're missing is a CC administrator that knows what they're doing. Yes, CC is a complicated system and we've had troubles with it, but not since we hired someone that knows it well. It's not something I'd use for casual source control.
paxdiablo
+1 for item number 1, even though it is not CC specific ;-)
Treb
If your checkins take 2 minutes per file, well, you've got serious problems with your setup. That is grotesquely out of whack!
Jonathan Leffler
BTW, we have a full time CC admin. 10 years ago, this would have been acceptable. Based on the tools today, not so much.
Joshua
The reason checkins take so long is the VERY inefficient protocol: it is connected to the network's LDAP, it needs to talk to the ClearQuest server to verify the issue report, and it then also applies the CQ labels.
Joshua
So, your problem is in part that you are using ClearQuest as well as, rather than only, ClearCase. We are not using ClearQuest (yet), so I can't comment on its speed, but it isn't entirely fair to ClearCase to blame it for ClearQuest's defects. OTOH, ClearQuest is sold on top of ClearCase, so ...
Jonathan Leffler
CQ is a separate product, not part of CC - it can be run independently. I don't doubt that some of the IBM sales force sell it as an integrated solution but there is no dependency either way.
paxdiablo
+2  A: 

In my opinion, branching and merging are the most important concepts in any source control system (next to versioning itself, of course).

Once you understand how that's done (and Clearcase does it very well, to the point where we do even small changes as a branch and re-merge, not something I would have ever done with RCS or CVS), you'll find your life is made a lot easier.

paxdiablo
A: 

Somewhat off-topic, but I know almost no developer who's happy with ClearCase. I was told it has sophisticated features, but as an svn and git user I cannot think of anything I miss in git or Subversion. So that's something one should know about ClearCase: most developers are really happy to work with something as simple as Subversion or git (yes, even git is easier to grasp). And even after I knew how to complete the simplest tasks in ClearCase, I had the constant feeling that ClearCase works against me, not with me.

siddhadev
CC has a couple of great features (dynamic views are pretty awesome) but its model is based around single file locking, with changesets kind of bolted on in half baked ways - it's a "state-of-the-art-in-1990" SCM tool. It still works, obviously, but SCM technology has moved on.
Matt Curtis
+2  A: 

I worked with ClearCase for the better part of 6 years and generally found it tolerable. It does have a certain learning curve, but once you get used to the quirks you can work pretty smoothly with it. A very competent CC admin who knows what he's doing is essential for anything but trivial setups. Unless you have one, people are going to run into problems and soon enough there will be talk about the "ClearCase problem". Then management will have to intervene by switching to something else, which only wastes time for everyone involved. CC is not a bad product; it's just sometimes poorly understood.

Here are a few concepts I found important; some of these are not entirely CC-specific -

  • A check-out is unlike the usual CVS-style notion of a check-out. When you check out a file, you lock it until you check it in.
  • There is no problem with moving files. In fact, this works flawlessly.
  • Version trees are essential to understanding what has been happening to a file. They can get quite messy for active files, but once you get used to reading them they become a very useful tool, and one that is sorely lacking in other source control tools such as SVN (to some extent).
  • Do not use dynamic views under any circumstances. It's not worth it.
  • Before making a new branch, stream or project, consult your admin to make sure that what you create is really what will serve you best. When starting a new code base, make sure you get the streams and projects layout right from the start by planning ahead. Changing it later is a real headache, if it is possible at all.
  • Fine-tune the privileges of users and set up triggers for common events to prevent common mistakes or enforce policies. The server is very configurable, and for most problems you encounter there is probably a reasonable solution.
  • Educate the developers on everything from basic concepts to advanced operations. A power user who can find what the problem is using cleartool lowers the load on the admin.
  • Don't leave dangling streams and views. When a developer leaves the project, have someone remove all the views he had on his machine and delete all his private streams. Not keeping your server clean will result in... it being dirty and, over time, slow. When you do a "find all checkouts" on all streams and views, you should not see files that are checked out by people who no longer exist.
  • Mandate an "always rebase before deliver" policy for child branches to avoid people "breaking the integration stream" when delivering code that conflicts with recent changes.
  • Continuous integration - don't let the integration stream stagnate while each developer or team works on their own branch. Mandate that once every X time everyone has to at least rebase to the most recent integration baseline, if not deliver their stable changes. This is indeed very difficult to do, especially with large projects, but the alternative is "integration hell", where at the end of the month no one does anything for 3 days while some poor sod tries to make all the changes fit together.
shoosh
I seriously disagree with the comment about dynamic views. How else would you get derived objects?
Spedge
You will not. Anything you need to derive should be able to be built locally.
shoosh
+1  A: 

I've worked on a number of medium to large projects successfully using both Clearcase and SVN. Both are great tools, but the team using them needs documented processes. Create a process that describes how you will use the version control system.

1) Find or create a best practices document for your version control system. Here's one for Subversion; adapt it to your Clearcase process. All developers must adhere to the same game plan.

Basically decide if you are going to 'always branch' or 'never branch'.

Never Branch Scheme:

  • The never branch scheme is what SourceSafe uses, where files are locked during checkout and become available again at checkin. This scheme is okay for small team projects (1 or 2 developers).

Always Branch Scheme:

  • The always branch scheme means developers create branches for each bugfix or feature add. This scheme is needed for larger projects, in particular projects that have a lead (buildmeister) who manages which changes are allowed into /main/LATEST in Clearcase or /trunk in SVN.
  • The always branch scheme means you can check in often without fear of breaking the build. Your only opportunity to break the build comes after your bugfix or feature is complete and you merge it to /main/LATEST.

'Branch when needed' is a compromise and may work best for many projects.

2) With Clearcase (and Subversion) you must learn to merge -- merging is your friend. Learn to use the merging capabilities of Clearcase or use a tool like Beyond Compare or emacs-diff. If your project is well modularized (many small decoupled files), you will benefit from fewer (or no) conflicts during merging.

3) Enjoy.

thompsongunner
+11  A: 

We've been using CC for just over fifteen years now. It has a lot of good features.

All our development is done on branches; I created a couple today, for a couple of different sets of changes. When I'd checked into the branch, I got a colleague to review the changes, and then merged back into /main/LATEST - which happens to be where my work needed to go. If it had been for an older release on a branch, it wouldn't have been any harder.

The merges from my temporary branches were fully automatic; no-one had changed the files I worked on while I had them checked out. Although by default checkouts are reserved (locked), you can always unreserve the checkout later, or create the checkout unreserved. When the changes take multiple days, the resynchronization of my temporary branch with the main branch is easy and usually automatic. The mergetool is OK; the biggest problem for me is that my server machine is 1800 miles or so from my office (or home), so that X over that distance is a bit slow (but not intolerably so). I've not used a better mergetool, but that may not be saying much since I've not used any other graphical mergetool.

Views (dynamic views) are fast on our setup. I've not used snapshot views, but I don't work on Windows when I can help it (our team uses snapshot views on Windows; I'm not clear why). We have complex branching systems, but the main development is done on /main/LATEST, and the release work is done on a branch. After GA, maintenance work is done on a release specific branch, and merged forward to /main/LATEST (via any intermediate versions).

CC does need good administrators - we have them and are fortunate in doing so.

CC is not trivial to use, though at the moment, I find 'git' as daunting as CC is to those who've not used it. But the basics are much the same - checkout, change, checkin, merge, branch, and so on. Directories can be branched - cautiously - and certainly are version controlled. That is invaluable.

I don't see the office switching from CC any time soon.


Embedded Version Numbers - Good or Evil?

I wrote:

The biggest problem I have with CC is that it does not embed version numbers into the source files - a problem that git has too, AFAICT. I can half see why; I'm not sure I like giving up that trackability, though. So, I still use RCS (not even CVS) for most of my personal work. One day, I may switch to git - but it will be a jolt and it will take a lot of work to retool the release systems configured around (SCCS and) RCS.

In response, @VonC notes:

We always considered that practice as evil (mixing meta-data information into data), introducing "merge hell". See also How to get Clearcase file version inside a Java file. Of course, you can use a trigger for RCS keyword substitution (Clearcase Manual: Checkin Trigger Example) provided you use an appropriate merge manager.

There are several issues exposed by this discussion, and they all get mixed together. My views verge on the archaic, but have a rationale behind them, and I'm going to take the time to write them down (messed up by life - it may take several edits to complete this).

Background

I learned SCCS back in 1984, about the time RCS was released (1983, I believe), but SCCS was on my machine and the internet was nascent at best. I moved from SCCS to RCS reluctantly in the mid-90s because the SCCS date format uses double-digits for years and it was not clear whether SCCS would be universally fixed in time (it was). In some respects, I don't like RCS as much as SCCS, but it has some good points. Commercially, my employer used SCCS up to mid-1995, but they started to switch over to Atria ClearCase from early 1994, tackling separate product sets one at a time.

Early ClearCase Experiment with Triggers - and Merge Hell

Our project migrated later, when there was already some experience with CC. Partly because I insisted on it, we embedded version control information in the source files via a check-in trigger. This lasted a while - but only a while - because, as VonC states, it leads to merge hell. The trouble is that if a version with the tag /main/branch1/N is merged with /main/M from the common base version /main/B, the extracted versions of the files contain a single line which has edits in each version - a conflict. And that conflict has to be resolved manually, rather than being handled automatically.
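
To make the conflict concrete, here is a minimal sketch in Python of a naive line-by-line 3-way merge. It is not how ClearCase's merge manager works; the comment format for the embedded version and the version numbers (/main/3 standing in for /main/B, and so on) are made up for illustration, and the sketch assumes all three files have the same number of lines.

```python
def merge3(base, ours, theirs):
    """Naive line-by-line 3-way merge for files with the same number of lines.

    Returns (merged_lines, conflict_indexes); a conflicting line is None.
    """
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # both sides agree (unchanged, or changed identically)
            merged.append(o)
        elif o == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "ours" changed this line
            merged.append(o)
        else:             # both sides changed the line differently
            merged.append(None)
            conflicts.append(i)
    return merged, conflicts


# Hypothetical file contents: line 0 is the version stamp written by the trigger.
base = ["/* version: /main/3 */", "int x = 1;", "int y = 2;"]
branch = ["/* version: /main/branch1/2 */", "int x = 10;", "int y = 2;"]
main = ["/* version: /main/5 */", "int x = 1;", "int y = 20;"]

merged, conflicts = merge3(base, branch, main)
print(conflicts)  # the stamp line is the only conflict
```

The real edits (to x on the branch and to y on /main) merge cleanly; only the stamp line, which the trigger rewrote on both sides, needs manual resolution.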

Now, SCCS has ID keywords. ID keywords take two formats, one for files being edited and one for files that are not being edited:

Edit         Non-Edit
%I%          9.13
%E%          06/03/09
%Z%          @(#)
%M%          s.stderr.c

If you attempted a 3-way merge of the editable versions of SCCS files (with the %x% notation), then there would be no conflicts on the lines containing metadata unless you changed the metadata on those lines (e.g. by changing from US-style %D% dates to UK-style %E% dates - SCCS does not support ISO-style 2009-03-15 dates as standard.)
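
As a rough illustration of the two formats, the following Python sketch expands %x% keywords using the values from the table above. It only mimics the idea; real SCCS derives the values from the s-file, and the function name and the sample source line are assumptions of mine.

```python
import re

# Values copied from the table above, purely for illustration.
VALUES = {"I": "9.13", "E": "06/03/09", "Z": "@(#)", "M": "s.stderr.c"}


def expand_sccs(text, values):
    """Replace each %X% ID keyword with its value; unknown keywords are left alone."""
    return re.sub(r"%([A-Z])%", lambda m: values.get(m.group(1), m.group(0)), text)


# Editable form: this line is identical in every version, so it cannot conflict.
editable = 'static const char sccs[] = "%Z%%M% %I% %E%";'

# Non-edit form: what you see in a file retrieved read-only.
expanded = expand_sccs(editable, VALUES)
print(expanded)
```

Because the editable line is the same text in every version, a 3-way merge of editable files sees no change on it at all.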

RCS also has a keywords mechanism, and the keywords also take two formats, though one is for files which have not yet been inserted into RCS and the other is for those that have:

Original       After insertion
$Revision$     $Revision: 9.13 $
$Date$         $Date: 2009/03/06 06:52:26 $
$RCSfile$      $RCSfile: stderr.c,v $

The difference is between a '$' following the keyword and a ':', space, text, space and finally a '$'. I've not done enough merging with RCS to be sure what it does with keyword information, but I note that if it treated both the expanded and 'contracted' notations as equivalent (regardless of the content of the expanded material), then merging could take place without conflict, leaving the contracted notation in the output of the merge, which would be appropriately expanded when the resulting file is retrieved after checkin.
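
A small Python sketch of that equivalence, assuming only the three keywords shown in the table (RCS has more): it reduces the expanded notation to the contracted one, so that two lines differing only in their expanded content compare equal. The function name is mine, and this is not what RCS itself does internally.

```python
import re

# Only the keywords from the table above; extend the alternation as needed.
RCS_KEYWORD = re.compile(r"\$(Revision|Date|RCSfile)(?::[^$\n]*)?\$")


def contract(text):
    """Reduce '$Keyword: value $' to '$Keyword$'; contracted keywords are unchanged."""
    return RCS_KEYWORD.sub(r"$\1$", text)


old = "/* $Revision: 9.12 $ $Date: 2009/03/01 10:00:00 $ */"
new = "/* $Revision: 9.13 $ $Date: 2009/03/06 06:52:26 $ */"

# Different expanded content, same contracted form.
print(contract(old) == contract(new))
```

The 9.12 revision and its date in the example are invented; only the 9.13 values come from the table.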

The ClearCase problem is the absence of an appropriate merge manager

As I've indicated in my discussion of SCCS and RCS, if 3-way merging is done treating the keywords in the correct (contracted or editable) formats, then there is no merge conflict.

The problem with CC (from this viewpoint - clearly, the implementors of CC disagree) is that it lacks a system for handling keywords, and therefore also lacks an appropriate merge manager.

If there was a system for handling keywords and an appropriate merge manager, then:

  • The system would automatically embed the metadata into files at appropriate markers.
  • On merge, the system would recognize that lines with the metadata markers do not conflict unless the markers changed differently - it would ignore the metadata content.

The downside of this is that it requires either a special difference tool that recognizes metadata markers and treats them specially, or it requires that the files fed to the difference tool are canonicalized (the metadata markers are reduced to the neutral form - $Keyword$ or %K% in RCS and SCCS terms). I'm sure that this little bit of extra work is the reason why it is not supported, something I've always felt was shortsighted in such a powerful system. I've no particular attachment to RCS or SCCS notations - the SCCS notations are easier to handle in some respects, but they're essentially equivalent - and any equivalent notation could be used.
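
The canonicalizing variant can be sketched in a few lines of Python. This is a toy, not a ClearCase merge manager: it assumes RCS-style $Keyword$ markers, files of equal length, and a naive line-by-line merge, and the revision numbers are invented.

```python
import re

# Any '$Word$' or '$Word: value $' marker is treated as metadata (an assumption).
KEYWORD = re.compile(r"\$(\w+)(?::[^$\n]*)?\$")


def canonical(line):
    """Reduce every metadata marker on the line to its neutral '$Keyword$' form."""
    return KEYWORD.sub(r"$\1$", line)


def merge3_ignoring_metadata(base, ours, theirs):
    """Line-by-line 3-way merge on canonicalized lines (same-length files only)."""
    merged, conflicts = [], []
    for i, lines in enumerate(zip(base, ours, theirs)):
        b, o, t = (canonical(x) for x in lines)
        if o == t:        # both sides agree once metadata content is ignored
            merged.append(o)
        elif o == b:      # only "theirs" really changed
            merged.append(t)
        elif t == b:      # only "ours" really changed
            merged.append(o)
        else:             # genuine conflict, e.g. the markers themselves differ
            merged.append(None)
            conflicts.append(i)
    return merged, conflicts


base = ["/* $Revision: 9.11 $ */", "int x = 1;"]
ours = ["/* $Revision: 9.11.1.2 $ */", "int x = 10;"]
theirs = ["/* $Revision: 9.13 $ */", "int x = 1;"]

merged, conflicts = merge3_ignoring_metadata(base, ours, theirs)
print(merged, conflicts)
```

The merged output keeps the neutral marker, which the system would expand again on the next checkin. A conflict is still reported if the two sides change the marker itself in different ways, for example one side to $Id$ and the other to $Date$.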

Why I still think metadata in the file is good

I like to have the metadata in the source code because my source code (as opposed to my employer's source code) is distributed outside the aegis of the source code control system. That is, it is mostly open source - I make it available to all and sundry. If someone reports a problem in a file, especially in a file they've modified, I think it is helpful to know where they started from, and that's represented by the original metadata in the source file.

Here, SCCS has an advantage over RCS: the expanded forms of the SCCS keywords are indistinguishable from regular text, whereas the RCS keywords continue to look like keywords, so if the other person has imported the material into their own RCS repository, their metadata replaces my metadata, a problem that does not happen with SCCS in the same way (the other person has to do work to overwrite the metadata).

Consequently, even if someone takes a chunk of my source code and modifies it, there are usually labels enough in it to identify where it came from, rather than leaving me to speculate about which version it is based on. And that, in turn, makes it easier to see what parts of the problem are of my making, and what parts are of their making.

Now, in practice, the way open source works, people don't migrate code around as much as you might think. They tend to stick with the released version fairly closely, simply because deviating is too expensive when the next official release is made.

I'm not sure how you are supposed to determine the base version of a piece of source code that originated from your work and has been revised since then. Finding the correct version, though, seems key to doing that, and if there are fingerprints in the code, then it can be easier.

So, that's a moderate summary of why I like to embed the version information in the source files. It is in large part historical - SCCS and RCS both did it, and I liked the fact that they did. It may be an ancient relic, something to be bidden farewell in the era of DVCS. But I'm not yet wholly convinced by that. However, it might take still more of an essay to explain the ins and outs of my release management mechanism to see why I do things as I do.

One aspect of the reasoning is that key files, such as 'stderr.c' and 'stderr.h', are used by essentially all my programs. When I release a program that uses it, I simply ensure I have the most recent version - unless there's been an interface change that requires a back-version. I haven't had that problem for a while now (I did a systematic renaming in 2003; that caused some transitional headaches, but Perl scripts allowed me to implement the renaming pretty easily). I don't know how many programs use that code - somewhere between 100 and 200 would be a fair guess. This year's set of changes (the version 9.x series) are still somewhat speculative; I haven't finally decided whether to keep them. They are also internal to the implementation and do not affect the external interface, so I don't have to make up my mind just yet. I'm not sure how to handle that using git. I don't want to build the library code into a library that must be installed before you can build my software - that's too onerous for my clients. So, each program will continue to be distributed with a copy of the library code (a different sort of onerous), but only the library code that the program needs, not the whole library. And I pick and choose for each program which library functions are used. So, I would not be exporting a whole sub-tree; indeed, the commit that covered the last changes in the library code is typically completely unrelated to the commit that covered the last changes in the program. I'm not even sure whether git should use one repository for the library and another for the programs that use it, or a common larger repository. And I won't be migrating to git until I do understand this.

OK - enough wittering. What I have works for me; it isn't necessarily for everyone. It does not make extraordinary demands on the VCS - but it does require version metadata embedded in the files, and CC and Git and (I think) SVN have issues with that. It probably means I'm the one with problems - hangups for the lost past. But I value what the past has to offer. (I can get away with it because most of my code is not branched. I'm not sure how much difference branching would make.)

Jonathan Leffler
"it does not embed version numbers into the source files" ??? We always considered that practice as evil (mixing meta-data information into data), introducing "merge hell". See also http://www.cmcrossroads.com/component/option,com_fireboard/func,view/id,23023/catid,31/Itemid,593/#23028
VonC
(Big-big thank you for all the edits in my post, by the way. The depth of your knowledge and experience shown in these edits/corrections is impressive.)
VonC
Of course, you can use a trigger for RCS keyword substitution (http://www.ibm.com/developerworks/rational/library/4311.html#N1046A) provided you use an appropriate merge manager (http://www.ibm.com/developerworks/rational/library/05/1213_diebolt/index.html)
VonC
Whoa... I'm going to read all this slowly, but in the meantime... +1!
VonC
Basically, I do not believe in metadata being fetched outside the VCS, except for a big README (and its associated checksum), with a unique id in it labelling the whole set of files. That way, the files can be modified over and over, and I still know where they come from.
VonC
For real distributed code... well, that is what DVCS are for, and something like Git will do a perfect job of managing distributed modifications while still being able to retrieve its original state and incorporating any subsequent changes back in his repository.
VonC
(Of course, all those comments are based on a much shorter career than yours (12 years, including 7 in the VCS business), and I do not pretend to be aware of *all* scenarios or historical contexts which would actually justify using embedded metadata.)
VonC
Embedded version numbers aren't evil unless the tools can't handle it. Subversion does it well for text files. But I wish it provided a facility for "hooks" for non-text file types (e.g. Word docs) to add revision info on check-out/update and canonicalise on commit/diff/merge.
Craig McQueen
+51  A: 
VonC
See also http://stackoverflow.com/questions/449549/how-to-leverage-clearcases-features/449810#449810
VonC
Excellent explanation VonC. Upped +1 (unfortunately cannot up it more). This is what I was looking for. Can we communicate over email? Please let me know.
@vicky21. Sure. Write to vonc at laposte dot net
VonC
@Jonathan: thank you for all the edits/corrections. I should not try to write long articles this late at night ;)
VonC
NP: It was a good answer before I tinkered with it; you've improved it still more with your latest edits. Well done.
Jonathan Leffler