Apparently, when you move a function from one source code file to another, the git revision log (for the new file) can show you where that code fragment was originally coming from (see for example the Viewing History section in this tutorial).

How does this work?

+2  A: 

It's purely a heuristic. Git computes a distance (a similarity measure) between files and tries to find matching blocks. But this heuristic is only applied when the code is copied or moved to a new file (otherwise it would be too costly, since every pair of files would have to be checked).

tonfa
and only within the same commit?
Thilo
Yes, it looks for matching files within the same commit.
tonfa
See documentation for `-M` and `-C` options in **git diff** manpage.
Jakub Narębski
+14  A: 

It doesn't track them. That's the beauty of it.

Git only records snapshots of the entire project tree: here's what all files looked like before the commit and here's what they look like after. How we got from here to there, Git doesn't care.

This allows intelligent tools to be written after a commit has already happened, to extract information from that commit. For example, rename detection in Git is done by comparing all deleted files against all new files and computing a pairwise similarity metric. If the similarity metric is greater than x, the pair is considered a rename; if it is between y and x (y < x), it is considered a rename+edit; and if it is below y, the files are considered independent. The cool thing is that you, as a "commit archaeologist", can specify after the fact what x and y should be. This would not work if the commit simply recorded "this file is a rename of that file".
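To make the idea concrete, here is a minimal sketch of that pairwise comparison. It is not Git's actual code: the function names `similarity` and `detect_renames` are made up for illustration, Python's `difflib.SequenceMatcher` stands in for whatever metric Git really uses, and the default thresholds `x` and `y` are arbitrary.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Line-based similarity in [0, 1]; a stand-in, not Git's own metric."""
    return SequenceMatcher(None, a.splitlines(), b.splitlines()).ratio()


def detect_renames(deleted, added, x=0.9, y=0.5):
    """Compare every deleted file against every added file.

    `deleted` and `added` map path -> file content (hypothetical inputs).
    Returns (old_path, new_path, kind) tuples, where kind is
    "rename", "rename+edit" or "independent".
    """
    results = []
    for old_path, old_text in deleted.items():
        best_path, best_score = None, 0.0
        for new_path, new_text in added.items():
            score = similarity(old_text, new_text)
            if score > best_score:
                best_path, best_score = new_path, score
        if best_path is None or best_score < y:
            results.append((old_path, None, "independent"))
        elif best_score > x:
            results.append((old_path, best_path, "rename"))
        else:
            results.append((old_path, best_path, "rename+edit"))
    return results


if __name__ == "__main__":
    deleted = {"old.py": "a\nb\nc\nd\n"}
    added = {"new.py": "a\nb\nc\ne\n"}
    # The same pair of snapshots, read with two different thresholds.
    print(detect_renames(deleted, added, x=0.9))
    print(detect_renames(deleted, added, x=0.7))
```

Since the thresholds are just arguments, the same two snapshots can be re-read later with a stricter or looser `x`, which is exactly what a stored "this is a rename" flag would not allow.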

Detecting moved content works similarly: you slice every file into pieces, compute similarity metrics between all the slices and can then deduce that this slice which was deleted over here and this very similar slice which was added over there are actually the same slice that was moved from here to there.
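A rough sketch of the slicing idea, under loud assumptions: slices are simply blank-line-separated blocks, `difflib.SequenceMatcher` is the similarity metric, and `blocks`, `find_moves` and the `threshold` value are hypothetical names and numbers chosen for the example. Git's real move detection works differently in the details.

```python
from difflib import SequenceMatcher


def blocks(text: str):
    """Slice a file into blocks separated by blank lines (an assumed slicing rule)."""
    return [b.strip() for b in text.split("\n\n") if b.strip()]


def find_moves(before, after, threshold=0.8):
    """Pair blocks that vanished from one file with similar blocks that
    appeared in another file.

    `before` and `after` are two snapshots mapping path -> content.
    Returns (source_path, destination_path, block) tuples.
    """
    removed, added = [], []
    for path in sorted(set(before) | set(after)):
        old = blocks(before.get(path, ""))
        new = blocks(after.get(path, ""))
        removed += [(path, b) for b in old if b not in new]
        added += [(path, b) for b in new if b not in old]

    moves = []
    # Every removed slice is compared with every added slice,
    # which is why this gets expensive on real repositories.
    for src, old_block in removed:
        for dst, new_block in added:
            if src == dst:
                continue
            if SequenceMatcher(None, old_block, new_block).ratio() >= threshold:
                moves.append((src, dst, new_block))
    return moves


if __name__ == "__main__":
    before = {
        "a.py": "def f():\n    return 1\n\ndef g():\n    return 2\n",
        "b.py": "def h():\n    return 3\n",
    }
    after = {
        "a.py": "def f():\n    return 1\n",
        "b.py": "def h():\n    return 3\n\ndef g():\n    return 2\n",
    }
    for src, dst, block in find_moves(before, after):
        print(f"{src} -> {dst}:\n{block}")
```

Note that the function only ever looks at the two snapshots; nothing about the move had to be recorded at commit time.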

However, as tonfa mentioned in his answer, this is very expensive, so it is not normally done. But it could be done, and that's the point.

BTW: this is pretty much the exact opposite of the Operational Transformation model used by Google Wave, EtherPad, Gobby, SubEthaEdit, ACE and Co.

Jörg W Mittag