I'm looking for an efficient way to get the list of unique commit authors for an SVN repository as a whole, or for a given resource path. I haven't been able to find an SVN command specifically for this (and don't expect one), but I'm hoping there may be a better way than what I've tried so far in Terminal (on OS X):

svn log --quiet | grep "^r" | awk '{print $3}'

svn log --quiet --xml | grep author | sed -E "s:</?author>::g"

Either of these will give me one author name per line, but they both require filtering out a fair amount of extra information. They also don't handle duplicates of the same author name, so for lots of commits by few authors, there's tons of redundancy flowing over the wire. More often than not I just want to see the unique author usernames. (It actually might be handy to infer the commit count for each author on occasion, but even in these cases it would be better if the aggregated data were sent across instead.)

I'm generally working with client-only access, so svnadmin commands are less useful, but I might be able to ask a special favor of the repository admin if that is strictly necessary or much more efficient. The repositories I'm working with have tens of thousands of commits and many active users, and I don't want to inconvenience anyone.

+3  A: 

To filter out duplicates, take your output and pipe through: sort | uniq. Thus:

svn log --quiet | grep "^r" | awk '{print $3}' | sort | uniq

I would not be surprised if this is the way to do what you ask. Unix tools often expect the user to do fancy processing and analysis with other tools.

P.S. Come to think of it, you can merge the grep and awk...

svn log --quiet | awk '/^r/ {print $3}' | sort | uniq

P.P.S. Per Kevin Reid...

svn log --quiet | awk '/^r/ {print $3}' | sort -u

For something more efficient, you could do a Perl one-liner. I don't know Perl that well, so I'd wind up doing it in Python:

#!/usr/bin/env python3
import sys

authors = set()
for line in sys.stdin:
    # Revision header lines look like: r123 | author | date
    if line.startswith('r'):
        authors.add(line.split()[2])
for author in sorted(authors):
    print(author)

Or, if you wanted counts:

#!/usr/bin/env python3
import sys

authors = {}  # author name -> number of commits
for line in sys.stdin:
    # Skip separator lines; only revision headers start with 'r'
    if not line.startswith('r'):
        continue
    author = line.split()[2]
    if author not in authors:
        authors[author] = 0
    authors[author] += 1
for author in sorted(authors):
    print(author, authors[author])

Then you'd run:

svn log --quiet | ./authorfilter.py
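
One caveat with `split()[2]` (and with `awk '{print $3}'`): it only returns the first word of the author field, so it would truncate any author name that contains a space. Here is a rough sketch of a variant that splits on the `|` separators instead and lists authors by commit count. It assumes the usual `r123 | author | date` header layout of `svn log --quiet`. The function name `count_authors` and the sample lines are made up for illustration.

```python
from collections import Counter

# Made-up sample in the layout printed by `svn log --quiet` (hypothetical authors).
SAMPLE = """\
------------------------------------------------------------------------
r3 | jane doe | 2010-03-01 12:00:00 -0500 (Mon, 01 Mar 2010)
------------------------------------------------------------------------
r2 | bob | 2010-02-27 09:30:00 -0500 (Sat, 27 Feb 2010)
------------------------------------------------------------------------
r1 | jane doe | 2010-02-26 18:15:00 -0500 (Fri, 26 Feb 2010)
------------------------------------------------------------------------
"""

def count_authors(lines):
    """Count commits per author from `svn log --quiet` header lines."""
    counts = Counter()
    for line in lines:
        # Separator lines start with '-', revision headers with 'r'.
        if not line.startswith("r"):
            continue
        fields = [field.strip() for field in line.split("|")]
        # Expect at least: revision, author, date.
        if len(fields) >= 3:
            counts[fields[1]] += 1
    return counts

if __name__ == "__main__":
    # Most frequent committers first.
    for author, n in count_authors(SAMPLE.splitlines()).most_common():
        print(n, author)
```

To use it as a filter like the scripts above, you would pass `sys.stdin` to `count_authors` instead of the sample. Note that it still relies on `--quiet`: without it, a log message line that starts with "r" and contains two `|` characters would be miscounted.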
Mike DeSimone
+1 for the useful suggestion. I was aware of `sort` but not `uniq`, and it seems the latter takes a `-c` parameter that prepends the number of occurrences for each line. I'm still hoping for a more efficient (and scalable) way, but this does the trick in a pinch.
Quinn Taylor
`sort -u` does `sort | uniq` in one command.
Kevin Reid
By the way, if you have XPath handy, then the query `//author/text()` will get just the author names out of `svn log --xml` robustly. (Mac OS X has an `xpath` command which *almost* does this job, but produces extraneous text and can't be configured not to. Maybe there's something else.)
Kevin Reid
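
For what it's worth, the `//author/text()` idea can be approximated without an `xpath` tool by using Python's standard library. This is only a sketch: it assumes `svn` is on the PATH and that `svn log --quiet --xml` emits `<logentry>` elements that each contain an `<author>` child. The helper names `svn_log_xml` and `authors_from_xml` and the sample data are made up for illustration.

```python
import subprocess
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample shaped like `svn log --quiet --xml` output (hypothetical authors).
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<log>
<logentry revision="3"><author>jane doe</author><date>2010-03-01T17:00:00.000000Z</date></logentry>
<logentry revision="2"><author>bob</author><date>2010-02-27T14:30:00.000000Z</date></logentry>
<logentry revision="1"><author>jane doe</author><date>2010-02-26T23:15:00.000000Z</date></logentry>
</log>
"""

def svn_log_xml(target="."):
    """Run `svn log --quiet --xml` on a working copy path or URL; return raw bytes."""
    result = subprocess.run(
        ["svn", "log", "--quiet", "--xml", target],
        check=True,
        capture_output=True,
    )
    return result.stdout

def authors_from_xml(xml_bytes):
    """Count commits per author; roughly what //author/text() would select."""
    root = ET.fromstring(xml_bytes)
    # Entries without an author have no text here and are skipped.
    return Counter(el.text for el in root.iter("author") if el.text)

if __name__ == "__main__":
    # Swap SAMPLE_XML for svn_log_xml("some/path") to query a real repository.
    for author, n in sorted(authors_from_xml(SAMPLE_XML).items()):
        print(author, n)
```

This does not reduce what goes over the wire, since the full log is still fetched and aggregated on the client, but it avoids the grep/sed cleanup and copes with author names that contain spaces.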
@Kevin, you should add your own answer so people can vote for you. I like all your comments, particularly the sort/uniq tip.
Quinn Taylor