views:

2274

answers:

2

Given the hash of a blob, is there a way to get a list of commits that have this blob in their tree?

+1  A: 

I thought this would be a generally useful thing to have, so I wrote up a little perl script to do it:

#!/usr/bin/perl -w

use strict;

my @commits;
my %trees;
my $blob;

sub blob_in_tree {
    my $tree = $_[0];
    if (defined $trees{$tree}) {
        return $trees{$tree};
    }
    my $r = 0;
    open(my $f, "git cat-file -p $tree|") or die $!;
    while (<$f>) {
        if (/^\d+ blob (\w+)/ && $1 eq $blob) {
            $r = 1;
        } elsif (/^\d+ tree (\w+)/) {
            $r = blob_in_tree($1);
        }
        last if $r;
    }
    close($f);
    $trees{$tree} = $r;
    return $r;
}

sub handle_commit {
    my $commit = $_[0];
    open(my $f, "git cat-file commit $commit|") or die $!;
    my $tree = <$f>;
    die unless $tree =~ /^tree (\w+)$/;
    if (blob_in_tree($1)) {
        print "$commit\n";
    }
    while (1) {
        my $parent = <$f>;
        last unless $parent =~ /^parent (\w+)$/;
        push @commits, $1;
    }
    close($f);
}

if (!@ARGV) {
    print STDERR "Usage: git-find-blob blob [head ...]\n";
    exit 1;
}

$blob = $ARGV[0];
if (@ARGV > 1) {
    foreach (@ARGV) {
        handle_commit($_);
    }
} else {
    handle_commit("HEAD");
}
while (@commits) {
    handle_commit(pop @commits);
}

I'll put this up on github when I get home this evening.

Update: It looks like somebody already did this. That one uses the same general idea but the details are different and the implementation is much shorter. I don't know which would be faster but performance is probably not a concern here!

Update 2: For what it's worth, my implementation is orders of magnitude faster, especially for a large repository. That git ls-tree -r really hurts.

Update 3: I should note that my performance comments above apply to the implementation I linked above in the first Update. Aristotle's implementation performs comparably to mine. More details in the comments for those who are curious.

Greg Hewgill
Hmm, how can it be *that* much faster? You’re walking the tree anyway, aren’t you? What work does git-ls-tree do that you avoid? (NB.: grep will bail on first match, SIGPIPE’ing the git-ls-tree.) When I tried it, I had to Ctrl-C your script after 30 seconds; mine was done in 4.
Aristotle Pagaltzis
My script caches the results of subtrees in the %trees hash, so it doesn't have to keep searching subtrees that haven't changed.
Greg Hewgill
Actually, I was trying the implementation I found on github that I linked to. Yours is faster in some cases, but it depends highly on whether the file you're looking for is at the beginning or end of the ls-tree list. My repository has 9574 files in it right now.
Greg Hewgill
It also occurs to me that some nonlinear project histories might cause my script to do much more work than it needs to (this can be fixed). This might be why it took a long time to run for you. My respository is a git-svn mirror of a Subversion repository, so it's nicely linear.
Greg Hewgill
+13  A: 

Here it is as a shell script:

#!/bin/sh
obj_name="$1"
shift
git log "$@" --pretty=format:'%T %h %s' \
| while read tree commit subject ; do
    if git ls-tree -r $tree | grep -q "$obj_name" ; then
        echo $commit "$subject"
    fi
done

You pass the blob SHA1 as the first parameter and then any number of arguments to git log.


Update: and here is an optimised version in Perl, still quite short:

#!/usr/bin/perl
use 5.008;
use strict;
use Memoize;

die "usage: git-find-blob <blob> [<git-log arguments ...>]\n"
    if not @ARGV;

my $obj_name = shift;

sub check_tree {
    my ( $tree ) = @_;
    my @subtree;

    {
        open my $ls_tree, '-|', git => 'ls-tree' => $tree
            or die "Couldn't open pipe to git-ls-tree: $!\n";

        while ( <$ls_tree> ) {
            /\A[0-7]{6} (\S+) (\S+)/
                or die "unexpected git-ls-tree output";
            return 1 if $2 eq $obj_name;
            push @subtree, $2 if $1 eq 'tree';
        }
    }

    check_tree( $_ ) && return 1 for @subtree;

    return;
}

memoize 'check_tree';

open my $log, '-|', git => log => @ARGV, '--pretty=format:%T %h %s'
    or die "Couldn't open pipe to git-log: $!\n";

while ( <$log> ) {
    chomp;
    my ( $tree, $commit, $subject ) = split " ", $_, 3;
    print "$commit $subject\n" if check_tree( $tree );
}
Aristotle Pagaltzis
Nice implementation, very Perlic. :) +1
Greg Hewgill