To protect ourselves against later patent claims and similar issues, we want to track contributions made to the core code of a future open source project.

We thought of the following procedure if we get contributions:

  • a contributor has to complete a quick registration with us if they commit more than small bugfixes to our main codebase (still being discussed internally; we may not require registration at all, but tracking should be possible regardless)
    • registration includes preferred nickname, full name, company (if applicable), email and, if possible, another contact option (secondary email, phone, address or similar, in case the primary email address is unreachable when we need it)
    • registration details are kept secret; the only detail visible to anybody outside the project lead is the usual commit
    • we know this is an uncommon and inconvenient measure that could scare off some possible contributors and may lead to forks or no contributors at all; please ignore this issue in your answer and assume there would be contributors who are willing to provide us that data
    • the project lead wants this regardless of the implications
  • we want to track these contributions by code lines or words and over time, so that we can contact developers whose code is still in use if we ever plan an incompatible license change or run into copyright or patent issues
  • SVN could work but we think a distributed SCM like Mercurial could be easier to handle
    • SVN could use some annotations in commit logs like @EXT-CONTRIB: nickname as well as the revision number logged in an internal database
      • we know we could also commit to SVN with their nicknames but since there is no IDE support for that it's likely we wouldn't do it this way
    • a distributed SCM would already include nicknames and maybe a history; we could use this and just register the merge to a contributor
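To make the SVN annotation idea concrete, here is a minimal sketch of how tagged log messages could be turned into a nickname-to-revision index. It assumes the tag format from the example above (@EXT-CONTRIB: nickname); the input shape (a list of revision/message pairs) and the function name are made up for illustration, and fetching the log from SVN is left out.

```python
import re
from collections import defaultdict

# Assumed tag format, taken from the example above: "@EXT-CONTRIB: nickname".
TAG_RE = re.compile(r"@EXT-CONTRIB:\s*(\S+)")


def contributors_by_revision(log_entries):
    """Map each tagged nickname to the revisions whose log message names it.

    log_entries is an iterable of (revision, message) pairs. This input
    shape is a hypothetical choice for the sketch, not an SVN API.
    """
    index = defaultdict(list)
    for revision, message in log_entries:
        for nickname in TAG_RE.findall(message):
            index[nickname].append(revision)
    return dict(index)


# Hand-written sample data, not real project history.
sample_log = [
    (101, "Fix parser crash\n@EXT-CONTRIB: alice"),
    (102, "Internal cleanup"),
    (103, "Add exporter\n@EXT-CONTRIB: bob\n@EXT-CONTRIB: alice"),
]

if __name__ == "__main__":
    print(contributors_by_revision(sample_log))
```

The resulting index is the part that would go into the internal database next to the registration details.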

We need a tool that analyzes all contributions and generates a report for us containing as much of the following data as possible (interactive if possible):

  • who wrote which parts of the code (lines/words)
  • when did we merge that contribution, how did we receive it?
  • when did we stop using the contributed code
    • removed it from head revision but kept it in SCM
    • rewrote/refactored it so much that it's no longer relevant because the only remains of the original contribution are formatting or syntax/structure that has no meaning on its own (like if ( ) { } with other people's code inside the brackets)
  • ability to quickly see changes in contributions made to a file over time
  • visualize all that data (color it like a code coverage report)
  • how can we get in contact with the authors in case we need to
  • optional: statistics - overview of the percentage of external contributions (people who were never on the project team and committed indirectly with patches sent in by mail or bugtracker)
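As a rough illustration of the "who wrote which lines" and "percentage of external contributions" items, the sketch below counts lines per author from the text that git blame --line-porcelain prints for a file. It assumes a git-based setup, which is only one of the options discussed here; the sample text is a hand-written, abbreviated excerpt (real output carries more header fields per line), and the team set is hypothetical.

```python
from collections import Counter


def lines_per_author(porcelain_text):
    """Count blamed lines per author in `git blame --line-porcelain` output.

    With --line-porcelain every source line gets its own "author <name>"
    header, so counting those headers counts lines. Source lines themselves
    start with a tab and therefore never match the prefix.
    """
    counts = Counter()
    for line in porcelain_text.split("\n"):
        if line.startswith("author "):
            counts[line[len("author "):]] += 1
    return counts


def external_share(counts, team):
    """Fraction of lines whose author is not in `team` (a set of names)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    external = sum(n for author, n in counts.items() if author not in team)
    return external / total


# Abbreviated, hand-written sample in the shape of --line-porcelain output.
SAMPLE = "\n".join([
    "1111111111111111111111111111111111111111 1 1 2",
    "author alice",
    "author-mail <alice@example.org>",
    "filename core.py",
    "\tdef run():",
    "1111111111111111111111111111111111111111 2 2",
    "author alice",
    "author-mail <alice@example.org>",
    "filename core.py",
    "\t    pass",
    "2222222222222222222222222222222222222222 3 3 1",
    "author bob",
    "author-mail <bob@example.org>",
    "filename core.py",
    "\treturn 0",
])

if __name__ == "__main__":
    counts = lines_per_author(SAMPLE)
    print(counts)
    print(external_share(counts, team={"alice"}))
```

In a real run the text would come from something like subprocess.run(["git", "blame", "--line-porcelain", path], capture_output=True, text=True).stdout. This only covers the current state of one file; tracking when a contribution stopped being used would mean repeating it across revisions.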

Does such a tool already exist or would we need to write one ourselves? What SCMs are supported? If possible, the tool should be open source or at least free or inexpensive.

+2  A: 

This problem was one of the major considerations in the early design of git. The practice of signing off commits and using strong authentication for commits in git makes git the ideal VCS for your situation. As for keeping registration details secret, that is beyond the scope of the VCS; you might allow the nickname of the developer to remain public in the VCS, and maintain a private database of information tying the nickname to the developer.
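A minimal sketch of that split between public and private data: sign-off trailers (the "Signed-off-by: Name <email>" line that git commit -s appends) are read from a commit message and looked up in a private registry. The registry layout, the lookup by email and all sample data are assumptions made for illustration only.

```python
import re

# Matches one "Signed-off-by: Name <email>" trailer per line.
SIGNOFF_RE = re.compile(
    r"^Signed-off-by:[ \t]*(.+?)[ \t]*<([^>]+)>[ \t]*$", re.MULTILINE
)


def sign_offs(message):
    """Return (name, email) pairs for every sign-off trailer in a message."""
    return SIGNOFF_RE.findall(message)


def private_contacts(message, registry):
    """Pair each sign-off with its private registry record, or None.

    `registry` maps a public email address to private registration details.
    Keying by email is an assumption of this sketch; a nickname key would
    work the same way.
    """
    return [(name, email, registry.get(email)) for name, email in sign_offs(message)]


# Hypothetical private database, kept outside the VCS.
REGISTRY = {
    "alice@example.org": {"full_name": "Alice Example", "company": "Example Corp"},
}

# Hand-written sample commit message.
MESSAGE = (
    "Add exporter\n"
    "\n"
    "Signed-off-by: alice <alice@example.org>\n"
    "Signed-off-by: bob <bob@example.org>\n"
)

if __name__ == "__main__":
    for name, email, record in private_contacts(MESSAGE, REGISTRY):
        print(name, email, record)
```

A sign-off with no registry record (bob in the sample) is the case the project lead would want flagged before merging.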

William Pursell
And git is, compared to svn, lightning fast.
Dykam
Sorry, but git only tracks that data; the question is about a tool that visualizes that data, or am I wrong?
gsnerf
@gsnerf: hmmm...you may be right, but I think the first 4 bullets are easily addressed by standard git tools
William Pursell
That's true, but I would read the first part as "that's what we want to do" and the second part as "that's what we want in addition". Git and Mercurial both track all the needed data, and Mercurial also has a nice graph feature for the branches. I don't know whether that is already sufficient. Worth a try, I'd say (on both systems).
gsnerf
+1. git alongside gitk, qgit, etc. looks like the best candidate here.
Andrew Y
Energiequant