Basically I want to get the number of lines-of-code in the repository after each commit.

The only (really crappy) ways I have found are to use git filter-branch to run "wc -l *", or a script that runs git reset --hard on each commit and then runs wc -l.

To make it a bit clearer: when the tool is run, it would output the lines of code of the very first commit, then the second, and so on. This is what I want the tool to output (as an example):

me@something:~/$ gitsloc --branch master
10
48
153
450
1734
1542

I've played around with the Ruby 'git' library, but the closest I found was the .lines() method on a diff, which seems like it should give the number of added lines (but does not: it returns 0 when you delete lines, for example).

require 'rubygems'
require 'git'

total = 0
g = Git.open(working_dir = '/Users/dbr/Desktop/code_projects/tvdb_api')


last = nil
g.log.each do |cur|
  diff = g.diff(last, cur)
  total = total + diff.lines
  puts total
  last = cur
end
+2  A: 

The first thing that jumps to mind is that your git history may be nonlinear. You might have difficulty determining a sensible sequence of commits.

Having said that, it seems like you could keep a log of commit ids and the corresponding lines of code in that commit. In a post-commit hook, starting from the HEAD revision, work backwards (branching to multiple parents if necessary) until all paths reach a commit that you've already seen before. That should give you the total lines of code for each commit id.

Does that help any? I have a feeling that I've misunderstood something about your question.
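
A minimal sketch of the cached approach described above, in Python. It assumes git is on the PATH and that the script runs inside the repository. The cache file name loc_cache.json and all function names are hypothetical. Rather than walking parents by hand, it asks git rev-list for every commit reachable from HEAD and skips the ones already cached. Each commit is counted by diffing it against git's empty tree (the id below is the empty tree of a SHA-1 repository), so the insertion count reported by --shortstat is the total number of lines in text files; binary files add no lines.

```python
#!/usr/bin/env python3
"""Record the total line count of every commit reachable from HEAD."""

import json
import os
import re
import subprocess

# Id of git's empty tree in SHA-1 repositories (assumption: not a SHA-256 repo).
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
CACHE_FILE = "loc_cache.json"  # hypothetical cache location


def parse_insertions(shortstat):
    """Return the insertion count from a `git diff --shortstat` line (0 if absent)."""
    match = re.search(r"(\d+) insertions?\(\+\)", shortstat)
    return int(match.group(1)) if match else 0


def git(*args):
    """Run a git command and return its stdout as text."""
    return subprocess.check_output(("git",) + args, text=True, errors="replace")


def lines_in_commit(commit):
    """Diffing the empty tree against a commit reports every line as an insertion."""
    return parse_insertions(git("diff", "--shortstat", EMPTY_TREE, commit))


def update_cache(cache, commits, count=lines_in_commit):
    """Fill in counts for commits not seen before; return the ones added."""
    added = [c for c in commits if c not in cache]
    for commit in added:
        cache[commit] = count(commit)
    return added


def main():
    cache = {}
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE) as f:
            cache = json.load(f)
    # rev-list follows every parent, so merged branches are covered too.
    commits = git("rev-list", "--reverse", "HEAD").split()
    update_cache(cache, commits)
    with open(CACHE_FILE, "w") as f:
        json.dump(cache, f, indent=2)
    for commit in commits:
        print(commit, cache[commit])


if __name__ == "__main__":
    main()
```

Run from a post-commit hook, only the new commits cost a diff; everything else comes from the cache. The printed order is whatever git rev-list --reverse chooses, which for a nonlinear history is only one of several sensible sequences.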

Greg Hewgill
+9  A: 

You may get both added and removed lines with git log, like:

git log --shortstat --reverse --pretty=oneline

From this, you can write a script similar to the one you did, using this info. In Python:

#!/usr/bin/env python3

"""
Display the per-commit size of the current git branch.
"""

import subprocess
import re
import sys

# Git prints "1 file changed" in the singular and omits an insertion or
# deletion count that is zero, so both counts are optional here.
STAT_RE = re.compile(
  r' (\d+) files? changed(?:, (\d+) insertions?\(\+\))?'
  r'(?:, (\d+) deletions?\(-\))?')

def main(argv):
  git = subprocess.Popen(["git", "log", "--shortstat", "--reverse",
                          "--pretty=oneline"], stdout=subprocess.PIPE,
                         text=True, errors="replace")
  out, err = git.communicate()
  total_files, total_insertions, total_deletions = 0, 0, 0
  commit = None
  for line in out.split('\n'):
    if not line: continue
    if line[0] != ' ':
      # This is a description line: "<hash> <subject>"
      commit = line.split(" ", 1)[0]
    else:
      # This is a stat line
      match = STAT_RE.match(line)
      if not match: continue
      # A count that git left out comes back as None and is treated as 0
      files, insertions, deletions = ( int(x or 0) for x in match.groups() )
      total_files += files
      total_insertions += insertions
      total_deletions += deletions
      print("%s: %d files, %d lines" % (commit, total_files,
                                        total_insertions - total_deletions))


if __name__ == '__main__':
  sys.exit(main(sys.argv))
fserb
`err` will always be `None` in your code.
J.F. Sebastian
`if not line.strip(): continue` might be more robust.
J.F. Sebastian
`argv` is not used in `main()`
J.F. Sebastian
(copied from old answer) That's perfect! I was intending to write it in Python, but I happened to have the ruby-git library installed, so I attempted to do it using that. Thanks! With a few small changes to the print statement, I could save the output to a .csv file and shove it into Google Docs/Spreadsheet to generate a graph! It's not completely perfect, since it counts comments and docstrings as code, and I've no idea how it'll handle binary files, but as a script I can run on any repository, without complicated post-commit hooks and such, it's great!
dbr
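
On the CSV and binary-file points in the comment above, here is a hedged sketch that uses git log --numstat instead of --shortstat. The --numstat output is tab-separated (added, deleted, path) and reports binary files with "-" in place of the counts, so they can be skipped explicitly. The function names and the "@" marker placed before each hash in the format string are hypothetical choices for this example, and the running total is only meaningful for a mostly linear history.

```python
#!/usr/bin/env python3
"""Print "commit,lines" CSV rows with a running line total per commit."""

import csv
import subprocess
import sys


def running_totals(log_text):
    """Yield (commit, total_lines) from `git log --numstat --format=@%H` output."""
    total = 0
    commit = None
    for line in log_text.splitlines():
        if line.startswith("@"):
            # A new commit starts: emit the previous one first.
            if commit is not None:
                yield commit, total
            commit = line[1:]
        elif line.strip():
            added, deleted, _path = line.split("\t", 2)
            # Binary files are reported as "-\t-\tpath"; skip them.
            if added != "-":
                total += int(added) - int(deleted)
    if commit is not None:
        yield commit, total


def main():
    log_text = subprocess.check_output(
        ["git", "log", "--numstat", "--reverse", "--format=@%H"],
        text=True, errors="replace")
    writer = csv.writer(sys.stdout)
    writer.writerow(["commit", "lines"])
    writer.writerows(running_totals(log_text))


if __name__ == "__main__":
    main()
```

Redirecting the output to a file gives something a spreadsheet can import directly. Comments and docstrings are still counted as lines.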
+6  A: 

You might also consider gitstats, which generates this graph as an HTML file.

Carl
+1  A: 

http://github.com/ITikhonov/git-loc worked right out of the box for me.

MattDiPasquale