I've recently joined a team with a code base which hasn't been under version control. The code has been forked several times by different people in the organisation working on different projects. Now we're going to start using version control and we want to merge the valuable contributions from the different projects. The projects share a common original version, so I've made one branch for each project and plan to start merging. Now I'm wondering what strategy to use when merging.

How would you choose the branch to start merging? The one with the least scary changes? The one with the most scary? Would you synchronize all branches with trunk after merging one of the projects into trunk? Is there a best practice to follow when working with many branches that have diverged a lot?

Or should I just stop worrying and start merging them one by one?

A: 

I assume there are lots of copies/zip files for each subproject, and that the different "branches" live in different folders or on different computers, or can be told apart by some other means.

Then you have two options: either reconstruct the whole project history, or use the current releases as distinct merge bases.

History reconstruction

Take this route only if you will need the full history later, since it is a lot of work.

The first problem is to identify which versions were copied from which. You can use the file timestamps to get a rough estimate of the history (look for the newest file in each copy). Once you have a timeline, import the copies into a VCS and look for reversing patches (something was included in rev4, excluded in rev5 and included again in rev6), which indicate that the order is wrong. Also check whether the changes mostly make sense (later versions should have more features and fewer bugs). Be prepared to repeat this step several times until you have the correct order.

For that reason, don't use your final VCS repository for this, since you will probably want to throw the intermediate attempts away. I also recommend keeping the repository on your local machine, because you will be diffing many versions against each other and don't want any network latency (I use Mercurial and TortoiseHg for such tasks).
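A rough Python sketch of the two checks described above, assuming each copy sits in its own directory and, for the reversal check, that each snapshot has already been reduced to a set of items (lines, file names or whatever you compare). The function names are illustrative and not taken from any tool.

```python
import os


def newest_mtime(copy_dir):
    """Return the newest file modification time (seconds since the epoch) in one copy."""
    newest = 0.0
    for root, _dirs, files in os.walk(copy_dir):
        for name in files:
            newest = max(newest, os.path.getmtime(os.path.join(root, name)))
    return newest


def order_copies(copy_dirs):
    """Sort copies oldest first by their newest file: a first guess at the history."""
    return sorted(copy_dirs, key=newest_mtime)


def find_reversals(snapshots):
    """Return (item, i, i + 1, i + 2) for every item that is present in snapshot i,
    missing from snapshot i + 1 and present again in snapshot i + 2.

    Each snapshot is a set of comparable items; a hit hints that the assumed
    order of those three snapshots is wrong.
    """
    reversals = []
    for i in range(len(snapshots) - 2):
        prev, cur, nxt = snapshots[i], snapshots[i + 1], snapshots[i + 2]
        for item in sorted((prev - cur) & nxt):
            reversals.append((item, i, i + 1, i + 2))
    return reversals
```

Timestamps are easily disturbed by copying or unzipping, so treat the result of `order_copies` as a starting point that the reversal check then confirms or refutes.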

At the end of this process you should have all copies in chronological order and know (at least roughly) which version each branch is based on.

So when you have something like this:

Base --> A --> A'
         \
          \---> B --> B'
                \
                 \--> C

You can start by creating the trunk with Base and adding changes A and A' to it. Then create branch B with A as its parent and add B'. Then create branch C with B as its parent, and so on for every copy you have.
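The replay order can be worked out mechanically. A minimal Python sketch, assuming a single common base, a recorded parent for each copy, and a chronological order in which the first child listed continues its parent's line while later children start new branches. It only produces a plan of hypothetical `commit`/`branch` steps, which you would then carry out with your VCS of choice.

```python
def replay_plan(parents, order):
    """Build a list of replay steps from a reconstructed copy history.

    parents: snapshot -> parent snapshot (None for the common base).
    order:   all snapshots, parents before children; among siblings, the first
             one listed continues the parent's branch, the others get their own.
    Returns ("commit", branch, snapshot) and ("branch", new_branch, parent) tuples.
    """
    branch_of = {}   # snapshot -> branch it was committed on
    continued = set()  # snapshots whose branch has already been extended
    steps = []
    for snap in order:
        parent = parents[snap]
        if parent is None:
            branch = "trunk"
        elif parent not in continued:
            branch = branch_of[parent]  # first child: stay on the parent's branch
            continued.add(parent)
        else:
            branch = snap  # later child: new branch named after its first snapshot
            steps.append(("branch", branch, parent))
        steps.append(("commit", branch, snap))
        branch_of[snap] = branch
    return steps
```

For the diagram above (parents Base, A, A, B, B for A, A', B, B', C, listed in that order after Base), this yields trunk commits for Base, A and A', a branch B from A holding B and B', and a branch C from B.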

Once you have the reconstructed history, you can start the big merge. But unless you were able to reconstruct internal merges along the way, this approach gives you no advantage when you bring everything together.

Only releases

Import the base version into the VCS. Then create a branch for every other release and put each release into its corresponding branch. Afterwards you can merge everything.

Rudi
Sorry if the question was unclear. Creating the branches isn't the problem. I've edited the question in an attempt to clarify.
A: 

Before you start merging, you should consider which version control tool to use (you didn't mention what you are using). Definitely avoid VSS, CVS and Perforce. Subversion and Perforce are OK, but if you create many branches you will find there is an administrative overhead to keep it all working. Git, AccuRev and PureCM are the best tools I have used for merging. Go with Git if you like the distributed model; otherwise I would go with PureCM, which is very cheap.

Create the trunk branch from the common code, then create the other branches from the trunk one by one. For each project, create a workspace for its branch, clobber the workspace files with that project's files and check them in. You can then merge this change back to the trunk and resolve any conflicts.
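One detail of the clobber step that is easy to get wrong: files that no longer exist in the project copy must be deleted from the workspace too, or the check-in will silently keep them. A minimal Python sketch, assuming a single top-level metadata directory (as with Git, or Subversion 1.7 and later) whose name is listed in `keep`; `clobber_workspace` is a hypothetical helper, and you still have to tell the VCS about added and removed files afterwards (for example with `git add -A`).

```python
import os
import shutil


def clobber_workspace(project_dir, workspace_dir, keep=(".git", ".svn")):
    """Make workspace_dir an exact copy of project_dir, ready to be checked in.

    Everything in the workspace except the top-level entries named in `keep`
    (the VCS metadata) is deleted first, so files that the project removed
    show up as deletions rather than lingering in the branch.
    """
    for entry in os.listdir(workspace_dir):
        if entry in keep:
            continue
        path = os.path.join(workspace_dir, entry)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
    for entry in os.listdir(project_dir):
        if entry in keep:  # never copy another repository's metadata across
            continue
        src = os.path.join(project_dir, entry)
        dst = os.path.join(workspace_dir, entry)
        if os.path.isdir(src):
            shutil.copytree(src, dst)
        else:
            shutil.copy2(src, dst)
```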

A: 

If you want a smooth merge, make sure you put the base version for each merge into the version control system, if you have them. Designate the branch that people branched from most often as the trunk, and record a version on the trunk for every point at which someone branched from it, if you have those versions. Without these base versions the merges will become a mess.
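To see why the base versions matter, here is a deliberately simplified per-line three-way merge in Python. It assumes the three versions have the same number of lines, aligned one to one, which real merge tools do not assume; `merge_lines` is illustrative only.

```python
def merge_lines(base, ours, theirs):
    """Per-line three-way merge of three equally long, aligned line lists.

    Returns (merged, conflicts), where conflicts lists the indices of lines
    that both sides changed differently and that need a human decision.
    """
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:
            merged.append(o)  # both sides agree
        elif o == b:
            merged.append(t)  # only "theirs" changed this line
        elif t == b:
            merged.append(o)  # only "ours" changed this line
        else:
            merged.append(o)  # both changed it: keep ours for now, flag it
            conflicts.append(i)
    return merged, conflicts
```

With `base = ["a", "b", "c"]`, `ours = ["a", "B", "c"]` and `theirs = ["a", "b", "C"]` it returns `(["a", "B", "C"], [])`. Passing `[None, None, None]` as the base (that is, no base version at all) flags lines 1 and 2 as conflicts, even though each was changed on only one side.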

If there was no version control at all, not even someone taking a tarball of the code at the time they merged, so that you cannot reconstruct even the base versions, you will need to be very careful. Put the code into source control before merging anything. Try to reconstruct the branches as closely as possible, based on what was branched from where.

Now, if your source control system records merge links between branches and keeps good track of base versions and merges, as ClearCase does for example, start with the smaller merges, which individual developers can do in parallel to reduce the remaining work. Then do the large merges with all developers involved.
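One way to find the "smaller merges" is to measure how far each branch has drifted from the common base. A Python sketch using the standard library's `difflib.SequenceMatcher`, assuming each version is loaded as a mapping from file path to a list of lines; counting changed lines is an assumed metric, and a small diff is not necessarily an easy merge.

```python
import difflib


def change_size(base_lines, branch_lines):
    """Number of lines removed plus lines added between base and branch."""
    matcher = difflib.SequenceMatcher(None, base_lines, branch_lines, autojunk=False)
    return sum(
        (i2 - i1) + (j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


def merge_order(base, branches):
    """Return branch names sorted by total drift from the base, smallest first.

    base:     path -> list of lines
    branches: branch name -> {path -> list of lines}
    Files missing on one side count as entirely added or removed.
    """
    def total(name):
        files = branches[name]
        paths = set(base) | set(files)
        return sum(change_size(base.get(p, []), files.get(p, [])) for p in paths)

    return sorted(branches, key=total)
```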

If, on the other hand, you don't have good merge tracking, changes from merges you have already done will pop up again in later merges, and you may need to decide the same conflicts again. This is quite painful, so I would suggest doing the large merges with the full team, so everyone can see what has been decided and can then keep the correct code during their smaller merges.

The main point is that without proper merge tracking, it becomes more important for someone who understands the code to be present at, or to do, the merge, because they need to identify the correct (current) chunks of code that should go into each file.

Jiri Klouda
A: 

One possible way of merging these various code bases is to use what are called vendor branches, in SVN for example.

Example:

A - oldest code branch
B - second-oldest code branch
C - etc.

Import A
Vendor import B, fix
Vendor import C, fix
etc.
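In the vendor-branch approach, each vendor import records how the next code base differs from the previous one, and that difference is what gets merged into trunk and fixed. A Python sketch of computing those per-file differences, assuming each code drop is a mapping from path to file contents and that the drops are ordered oldest first; the function names are hypothetical. Added and removed files matter here because Subversion only records them if you run `svn add` and `svn delete`.

```python
def drop_changes(old, new):
    """Compare two code drops (path -> contents) at the file level."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    modified = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return {"added": added, "removed": removed, "modified": modified}


def vendor_import_sequence(drops):
    """drops: list of (name, snapshot) pairs, oldest first.

    Yields (previous_name, current_name, changes) for each successive vendor
    import; `changes` is what would then be merged into trunk and fixed.
    """
    for (prev_name, prev), (cur_name, cur) in zip(drops, drops[1:]):
        yield prev_name, cur_name, drop_changes(prev, cur)
```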

mako