tags:

views:

617

answers:

3

What algorithm does CVS use when merging two branches (using the -j)?

  • Is the CVS tag, branch, or date aware?
  • Does it just do a plain text diff (for example, using the unix diff tool)?
  • Does it use a 2 way or 3 way diff?
  • If it uses a 3 way diff, what is the base version it uses?

Thanks

+5  A: 

I remember a reading a while ago that CVS merge actually uses the diff3 algorithm to perform merging.

This PDF article A Formal Investigation of Diff3, by Sanjeev Khanna, Keshav Kunal, and Benjamin C. Pierce from the Universtiy of Pennsylvania describes the diff3 algorithm in detail.

If focuses primarily on the properties of the merging algorithm itself, not on how it integrates with CVS.

In answer to your questions:

Tag, date awareness

From the CVS man page:

-j tag[:date] Merge in changes from revisions specified by tag or, when date is specified and tag is a branch tag, the version from the branch tag as it existed on date

2 or 3 way and text/binary awareness

diff3 defaults to a plain text diff. It compares (diffs) 3 versions of the file.

From the diff3 man page:

If `diff3' thinks that any of the files it is comparing is binary (a non-text file), it normally reports an error, because such comparisons are usually not useful. As with 'diff', you can force 'diff3' to consider all files to be text files and compare them line by line by using the '-a' or '--text' options.

Base version during comparison

The base version, according to the linked article, is the last common version (O) between the two current versions of the file (A and B). It first uses the 2 way diff algorithm to find the longest common subsequences between O and A, and O and B.

Then (quoted from the article) it:

... takes the regions where O differs from either A or B and coalesces the ones that overlap, leading to the alternating sequence of stable (all replicas equal) and unstable (one or both replicas changed) chunks shown in Figure 1(c).3 Finally, it examines what has changed in each chunk and decides what changes can be propagated, as shown in Figure 1(d)—here, the second chunk is changed only in A (by inserting 4, 5), so this change can be propagated to B, but the fourth chunk has changes in both A and B, so nothing can be propagated.

Ash
The link and summary you provided are very informative. However, they speak more of the diff3 merging algorithm. My question is whether cvs will take any branching info into account to determine how to merge. The answer to this might be that it doesn't. It simply does a text based comparison of the two files, and that's it.Can you confirm this?
oneself
@oneself, You can specify -j once or twice with tags that represent a specific version, but you are just telling CVS which revisions of the file to supply to diff3. Apart from this I am not aware of any other way branching influences the diff3 algorithm in cvs. So my answer to your question is that diff3 in cvs does a text based comparison on files and that it may or may not be branching aware depending on the tags you supply to it. Very happy to be corrected by a cvs expert if wrong (I'm just a user).
Ash
That makes two of us :)
oneself
+3  A: 

There are two different forms of the "merge" command you can use, that do subtly different things:

  1. cvs up -j TAG
  2. cvs up -j TAG1 -j TAG2

The difference between the variants is how the "base" revision is selected, but the basic algorithm of either variant is that for each file, a diff between two revisions selected by CVS is applied on top of your current working copy.

In the first form, the merge base is the common ancestor of the given TAG and the working copy revision. So let's say your local revision is 1.38 (rev #38 on HEAD) and you're merging 1.34.4.2 (rev. 2 on branch 4 of rev #34 on HEAD) - the common ancestor will be 1.34. I believe this variant does a 3-way merge using the two diffs 1.34..1.38 and 1.34..1.34.4.2, producing conflicts where they mismatch.

In the second form, you're specifying the base revision yourself, so the end result is about the same as cvs diff -r TAG1 -r TAG2 | patch except for getting conflict markers from CVS.

olsner
A: 

The CVS uses three revisions of file merging the differences between the two into the third. The actual merge doesn't use any information besides the three files checked out. You can see the details in CVS source code. The function in question is RCS_merge in src/rcscmds.c, which uses call_diff3 from diff/diff3.c (or something like that). You can, of course, satisfy your curiosity by looking up CVS sources yourself.

Michael Krelin - hacker