So I have a bit of an issue. I work for a small startup (about 8 developers) and my boss recently decided that we need to put the owner of each file in the documentation. So I have been trying to write something using svn blame and file to loop through every php file and see which files have my username on more than 15 lines, but I haven't been able to get it quite right.

What I would really like is a one-liner (or simple bash script) that will list every file in a subversion repository and the username that last edited the majority of the lines. Any ideas?

+2  A: 

Alright, this is what I came up with:

#!/bin/bash

set -e

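# Read paths line by line so names containing spaces stay intact
svn ls -R | while IFS= read -r file; do
  # svn ls -R also lists directories; only blame regular files
  if [ -f "$file" ]; then
    # stdin is redirected so svn cannot swallow the remaining file list
    owner=$(svn blame "$file" < /dev/null | tr -s " " " " | cut -d" " -f3 | sort | uniq -c | sort -nr | head -1 | tr -s " " " " | cut -d" " -f3)
    if [ -n "$owner" ]; then
      echo "$file" "$owner"
    fi
  fi
done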

It uses svn ls to determine each file in the repository, then for each file, svn blame output is examined:

  • tr -s " " " " squeezes multiple spaces into one space
  • cut -d" " -f3 gets the third space-delimited field, which is the username (the first field is empty because each line starts with a space, and the second is the revision)
  • sort sorts the output so all lines last edited by one user are together
  • uniq -c collapses each run of identical lines into one and prefixes it with the number of times it appeared
  • sort -nr sorts numerically, in reverse order (so that the username that appeared most is sorted first)
  • head -1 returns the first line
  • tr -s " " " " | cut -d" " -f3 same as before, squeezes spaces and returns the third field, which is now the username (the second field is the count).
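
To see the steps above in action without a repository, here is a minimal sketch that runs the same pipeline over a made-up sample shaped like svn blame output. The sample text and the top_author function name are mine, for illustration only.

```shell
#!/bin/bash

# Made-up sample in the shape of `svn blame` output: revision, author, line text.
sample_blame='    12      alice <?php
    12      alice function foo() {
    40        bob   return 1;
    12      alice }'

# The same pipeline as in the script, reading blame text from stdin.
top_author() {
  tr -s " " " " | cut -d" " -f3 | sort | uniq -c | sort -nr | head -1 | tr -s " " " " | cut -d" " -f3
}

owner=$(printf '%s\n' "$sample_blame" | top_author)
echo "$owner"   # prints "alice" (3 lines against 1 for bob)
```

Note that the field positions depend on every blame line starting with at least one space. As far as I know the revision column is padded to six characters, so this should only become a problem once revision numbers reach six digits.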

It'll take a while to run, but at the end you'll have a list of <filename> <most prevalent author> pairs.

Caveats:

  • Error checking is not done to make sure the script is called from within an SVN working copy
  • If called from deeper than the root of a WC, only files at that level and deeper will be considered
  • As mentioned in the comments, you might want to take revision date into account (if the majority of checkins happened 10 years ago, you might want to discount them when determining the owner)
  • Any working copy changes that aren't checked in won't be taken into account
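
The first two caveats could be handled with a small guard at the top of the script. This is only a sketch: the in_working_copy function name is mine, and the --show-item wc-root option assumes a Subversion client of version 1.9 or newer.

```shell
#!/bin/bash

# Succeeds only when the current directory is inside an SVN working copy.
# `svn info` exits non-zero otherwise (and the call also fails if svn is not installed).
in_working_copy() {
  svn info >/dev/null 2>&1
}

if in_working_copy; then
  # Assumption: client is 1.9+, so `--show-item wc-root` is available.
  # On older clients this step fails quietly and the scan starts from the current directory.
  wc_root=$(svn info --show-item wc-root 2>/dev/null) && cd "$wc_root"
  echo "scanning from $(pwd)"
else
  # A real script would probably `exit 1` here.
  echo "not inside an SVN working copy" >&2
fi
```

Moving to the working copy root first means the listing covers the whole checkout no matter where the script is started.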
Daniel Vandersluis
I was about to post a similar solution, but I think yours is better :) The only other limitation I see is that it doesn't take into account who edited the file most recently, which is part of what the OP is looking for. (e.g. if alice created the file in 1998 and touched most of the lines, but bob edited 1/3 of the file yesterday).
John Ledbetter
I didn't want to overly complicate things, but that's a good point. To get around it, you could determine the earliest revision of each file within a given span (say 5 years) and then use `-r <rev>:HEAD` on the `svn blame` call, but that would slow everything down further.
Daniel Vandersluis
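
A rough sketch of the `-r <rev>:HEAD` idea from the comment above. It assumes that, when given a revision range, svn blame prints `-` in the revision and author columns for lines that did not change inside the range, which is how I understand the command-line client to behave. The sample text, the recent_owner function name, revision 30 and the file path are all made up for illustration.

```shell
#!/bin/bash

# Made-up sample in the shape of `svn blame -r 30:HEAD` output.
# Assumption: lines not changed inside the range carry "-" instead of a revision and author.
sample_recent='     -          - <?php
     -          - function foo() {
    40        bob   return 1;
    41        bob   // note
    35      alice }'

# Count authors by column 2, ignoring the "-" placeholder, and print the most frequent one.
recent_owner() {
  awk '$2 != "-" { print $2 }' | sort | uniq -c | sort -nr | head -n 1 | awk '{ print $2 }'
}

printf '%s\n' "$sample_recent" | recent_owner   # prints "bob"

# Hypothetical real use, with 30 standing in for the oldest revision you care about:
#   svn blame -r 30:HEAD some/file.php | recent_owner
```

Files with no changes inside the range produce no owner at all, so the caller would have to decide whether to fall back to a full blame for those.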
+2  A: 
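# -print keeps the pruned .svn directories themselves out of the output
find . -name .svn -prune -o -type f -print | while IFS= read -r f; do
   # first awk picks the author column of svn blame, the last one drops the count added by uniq -c
   echo "$f" "$(svn blame "$f" < /dev/null | awk '{ print $2 }' | sort | uniq -c | sort -nr | head -n 1 | awk '{ print $2 }')"
done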
unhillbilly
Pretty much the same as the core of my script, but the questioner asked for a way to see the owner of all files in the repo.
Daniel Vandersluis
You're right. I like your use of 'svn ls'.
unhillbilly