I would like to know, on a line-by-line basis, what percentage of the source code in a Subversion repository has been modified between two commits.

For example, say revision 2100 has 150,000 lines of code, but revision 2600 has 165,000 lines of code, and 8,000 of the original 150,000 lines were modified. I would report this as 142,000 / 165,000 = 86% the same, 14% "new". I don't care to separate Javadoc, XML, comments, or unit tests; just lump them all together as "source".

Any idea how to do this?

+2  A: 

I don't know of a simple way to do it, but you could parse diffs generated from the source files. The -r option to svn diff lets you specify the revision range you want, e.g. svn diff -r 2100:2600. In a unified diff, the number of changed (deleted or edited) lines is roughly the number of lines starting with '-', not counting the '---' file header lines. Comparing that count to the total number of lines should get you close to your answer. This is of course only an approximation. If you have worked with patches for a while, you'll know that simply reordering some lines generates a diff showing more changed lines than you might expect, so take the number with a grain of salt.

Wikipedia has a short description of the formats. Default for svn should be a unified diff.
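The counting step described above could be sketched in Python as follows. This is only a rough illustration, not a tested tool: the function name count_changed_lines and the sample diff are made up for the example, and it assumes the input is unified diff text such as svn diff prints. It reads the line counts from each "@@ -a,b +c,d @@" hunk header, so the '---' and '+++' file headers (and any svn property-change sections) are not mistaken for changed lines.

```python
import re

# Matches unified-diff hunk headers such as "@@ -10,7 +10,8 @@".
# A missing ",count" part means the count is 1.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def count_changed_lines(diff_text):
    """Return (removed, added) content-line counts for a unified diff."""
    removed = added = 0
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            if line.startswith("-"):
                removed += 1
                old_left -= 1
            elif line.startswith("+"):
                added += 1
                new_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:
                # Context line: present in both old and new file.
                old_left -= 1
                new_left -= 1
            continue
        match = HUNK_HEADER.match(line)
        if match:
            old_left = int(match.group(1) or 1)
            new_left = int(match.group(2) or 1)
    return removed, added


# Hypothetical sample in the shape svn diff produces.
SAMPLE_DIFF = "\n".join([
    "Index: Foo.java",
    "===================================================================",
    "--- Foo.java    (revision 2100)",
    "+++ Foo.java    (revision 2600)",
    "@@ -1,3 +1,4 @@",
    " class Foo {",
    "-    int x = 1;",
    "+    int x = 2;",
    "+    int y = 3;",
    " }",
])

removed, added = count_changed_lines(SAMPLE_DIFF)
print("removed:", removed, "added:", added)
```

The removed count is the figure to compare against the total line count of the older revision. An edited line shows up as one removal plus one addition, so the two counts should not simply be summed.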

wds
+1  A: 

You could use diff as wds says, and do some bash / awk / sed scripting to obtain different reports. I don't know of a specific tool for this, but coding it would be a simple and fun task.

Fernando
+1  A: 

SVN has no built-in tools to do this. There are commercial tools like Fisheye which can generate various reports of repository activity, as well as provide browsing/searching capabilities. You could look at this and see if it meets your needs.

The other option would be to use "svn log" and "svn diff", combined with some scripting to tell you what you are looking for.
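As a rough idea of what such scripting might look like, here is a small Python sketch. The helper names (svn_diff_command, run_svn_diff, percent_unchanged) and the repository URL in the usage note are hypothetical. It assumes the svn command-line client is on the PATH, and that the total line counts for each revision are obtained separately, for example by exporting each revision and counting lines.

```python
import subprocess


def svn_diff_command(url, old_rev, new_rev):
    """Build the argument list for: svn diff -r OLD:NEW URL."""
    return ["svn", "diff", "-r", f"{old_rev}:{new_rev}", url]


def run_svn_diff(url, old_rev, new_rev):
    """Run svn diff and return its output as text (requires the svn client)."""
    result = subprocess.run(
        svn_diff_command(url, old_rev, new_rev),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def percent_unchanged(old_total, modified, new_total):
    """Percentage of the new revision made up of untouched original lines.

    Follows the question's formula: (old_total - modified) / new_total.
    """
    if new_total <= 0:
        raise ValueError("new_total must be positive")
    return 100.0 * (old_total - modified) / new_total


# The numbers from the question: 150,000 lines at r2100, 8,000 of them
# modified, 165,000 lines at r2600.
same = percent_unchanged(150_000, 8_000, 165_000)
print(f"{same:.0f}% the same, {100 - same:.0f}% new")
```

A call such as run_svn_diff("http://svn.example.com/repo/trunk", 2100, 2600) would fetch the diff text, from which the number of modified lines can be counted before being passed to percent_unchanged.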

msemack
+2  A: 

The metric you are looking for (I believe) is code churn.

There is a previous SO question on that!

Paul Nathan
Indeed, what I am looking for is a specific kind of code churn. The previous SO question references StatSVN, which reports code churn on a daily basis in a graph but doesn't seem to show aggregate code churn from one revision to another, unless it has an API I can't find. Very helpful answer, however, and StatSVN looks very cool.
HDave
The dev of StatSVN is pretty helpful. I've interacted with him a few times. Also, I bet you can derive the algorithm from the StatSVN code and write some Perl to make it happen for you.
Paul Nathan