I have the following scenario:

* ab82147 (HEAD, topic) changes
* 8993636 changes
* 82f4426 changes
* 18be5a3 (master) first

I'd like to merge (non fast-forward) topic into master. This requires me to:

  • git checkout master
  • git merge --no-ff topic

But checking out master and then merging topic into it makes git touch files in my working directory, even though the final contents are identical to what I had before checking out master. Because of the size of our project, this triggers a rebuild that takes about 30 minutes (with IncrediBuild) although nothing really changed, which is simply unbearable.

So what I would like to get is the following:

*   9075cf4 (HEAD, master) Merge branch 'topic'
|\  
| * ab82147 (topic) changes
| * 8993636 changes
| * 82f4426 changes
|/  
* 18be5a3 first

Without really touching the working directory (or at least cheating git somehow).

+5  A: 

Interesting! I don't think there's a built-in way to do this, but you should be able to fudge it using the plumbing:

#!/bin/bash

branch=master
# or take an argument:
# if [ $# -eq 1 ]; then
#      branch="$1"
# fi

# make sure the branch exists
if ! git rev-parse --verify --quiet --heads "$branch" > /dev/null; then
     echo "error: branch $branch does not exist"
     exit 1
fi

# find the branch name associated with HEAD
# (done up front so the error message below can use it too)
currentbranch=$(git symbolic-ref HEAD | sed 's@^refs/heads/@@')

# make sure this could be a fast-forward
if [ "$(git merge-base HEAD "$branch")" == "$(git rev-parse "$branch")" ]; then
    # make the commit (first parent: the merged-into branch, second parent: HEAD)
    newcommit=$(echo "Merge branch '$currentbranch'" | git commit-tree "$(git log -n 1 --pretty=%T HEAD)" -p "$branch" -p HEAD)
    # move the branch to point to the new commit
    git update-ref -m "merge $currentbranch: Merge made by simulated no-ff" "refs/heads/$branch" "$newcommit"
else
    echo "error: merging $currentbranch into $branch would not be a fast-forward"
    exit 1
fi

The interesting bit is that newcommit= line; it uses commit-tree to directly create the merge commit. The first argument is the tree to use; that's the tree of HEAD, the branch whose contents you want to keep. The commit message is supplied on stdin, and the rest of the arguments name the parents the new commit should have, in order. The commit's SHA1 is printed to stdout, so assuming the commit succeeded, you capture that and then move the branch to point at that commit with update-ref (for the branch, that's equivalent to a fast-forward). If you're obsessive, you could make sure that commit-tree succeeded - but that should be pretty much guaranteed.
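To make the argument order concrete, here is a minimal Python sketch of the same commit-tree call. The helper names (commit_tree_args, run_git, simulated_no_ff_merge) and the reflog text are made up for illustration, and it assumes git is on PATH, the current directory is inside the repository, and Python 3.7 or newer. Like the parent order discussed in the comments below, the merged-into branch is passed as the first -p.

```python
import subprocess


def commit_tree_args(tree, into, merged):
    """Build the argv for `git commit-tree`.

    Parents are recorded in the order given, so `into` (the branch being
    merged into) becomes the first parent and `merged` the second.
    """
    return ["git", "commit-tree", tree, "-p", into, "-p", merged]


def run_git(argv, stdin_text=None):
    """Run a git command and return its stdout with surrounding whitespace removed."""
    proc = subprocess.run(argv, input=stdin_text, check=True,
                          capture_output=True, text=True)
    return proc.stdout.strip()


def simulated_no_ff_merge(branch, message):
    """Create a merge commit of HEAD into `branch` without touching the work tree.

    Assumption: `branch` is an ancestor of HEAD. This sketch does not check that.
    """
    # The new commit reuses HEAD's tree, so no file contents change.
    tree = run_git(["git", "rev-parse", "HEAD^{tree}"])
    # commit-tree reads the commit message from stdin and prints the new SHA1.
    new_commit = run_git(commit_tree_args(tree, branch, "HEAD"),
                         stdin_text=message + "\n")
    # Move the branch ref to the new commit (hypothetical reflog message).
    run_git(["git", "update-ref", "-m", "simulated no-ff merge",
             "refs/heads/" + branch, new_commit])
    return new_commit
```

Calling simulated_no_ff_merge("master", "Merge branch 'topic'") while topic is checked out would, under those assumptions, leave the work tree and HEAD alone and only move master.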

Limitations:

  • This only works on merges that could have been a fast-forward. Otherwise, you'll obviously have to actually check out and merge (possibly in a clone, to save your build system).
  • The reflog message is different. I did this deliberately, because when you use --no-ff, git will actually force itself to use the default (recursive) strategy, but to write that in the reflog would be a lie.
  • If you're in detached HEAD mode, things will go badly. That would have to be treated specially.
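One possible way to guard against the detached HEAD case is to ask git whether HEAD is a symbolic ref before doing anything else. The sketch below is only an illustration: the function names are hypothetical, and it assumes git is on PATH and Python 3.7 or newer. It relies on `git symbolic-ref -q HEAD` exiting non-zero, without printing an error, when HEAD is detached.

```python
import subprocess

HEADS_PREFIX = "refs/heads/"


def short_branch_name(ref):
    """Strip the refs/heads/ prefix, keeping any slashes inside the branch name."""
    if not ref.startswith(HEADS_PREFIX):
        raise ValueError("not a local branch ref: %r" % ref)
    return ref[len(HEADS_PREFIX):]


def current_branch():
    """Return the checked-out branch name, or None if HEAD is detached.

    Note: any other failure (for example, not being inside a repository)
    also yields a non-zero exit status and therefore None in this sketch.
    """
    proc = subprocess.run(["git", "symbolic-ref", "-q", "HEAD"],
                          capture_output=True, text=True)
    if proc.returncode != 0:
        return None
    return short_branch_name(proc.stdout.strip())
```

A wrapper script could call current_branch() first and bail out with an error when it returns None, instead of creating a merge commit with an empty branch name in its message.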

And yes, I tested this on a toy repo, and it appears to work properly! (Though I didn't try hard to break it.)

Jefromi
I'm impressed by this git voodoo! I just tried it on the repo I had in the question and the results were quite strange ;-) This is what the graph looked like afterwards: http://paste.lisp.org/+2FCI
Idan K
@Idan K: That looks like exactly what you wanted, except you have to check out master still, yes?
Jefromi
@Jefromi: look carefully at the `*` that should belong to topic. They're on the left line instead of the right one. A normal merge produces this graph: http://paste.lisp.org/+2FCS
Idan K
@Idan K: Oh, I see. Those are actually nearly identical - the history is the same, the drawing's just been rearranged, which means that the first and second parents of the merge commit were swapped. (The first parent should be the merged-into branch, not the merged branch.) The only thing this affects is actually referencing the parents of the merge commit (e.g. master^, git log --first-parent, ...). I've edited the answer to fix that - all you have to do is swap the two parent commits given to commit-tree.
Jefromi
@Jefromi: I was sure I tried that, damn ;-) I knew it was something silly as that. Many thanks, I'll give this a go for my next merges.
Idan K
@Idan K: There's also the possibility that the history viewer is choosing the order to show parents not from the actual parent order but based on what makes things fit nicely in the display, but I didn't think gitk did that...
Jefromi
Actually this is the output I got from `git log --graph`; switching the parents' order fixed it. One small thing that is missing is the default commit message git uses when merging. It's not a big deal, but if I could somehow get the original message so it's consistent with 'regular' merges, it would be ideal.
Idan K
@Idan K: I thought I duplicated the commit message - that's the `echo "Merge branch '$currentbranch'"` piped into commit-tree. Your pasted graph does say "Merge branch 'topic'" just like your requested result. What were you looking for?
Jefromi
From what I've seen so far there are 3 different messages: (1) the one you wrote, (2) "Merge branch 'topic' into merged_to_branch", or (3) the same as (2) except that it says "remote branch" instead of "branch" if 'topic' is a remote branch.
Idan K
@Idan K: Ah, I see. I don't think there's really a canned way to tell git "give me the message you *would* use if you performed this merge which I can't actually perform right now". If you want to get the 'remote' bit in there I'd just check if branch contains a slash (or if you want to be fancy, see if refs/remotes/$branch exists). I don't actually recall what causes it to tell you what branch you merged into...
Jefromi
Yeah, I guess that's asking a bit too much. What you've given me is good enough; I can live without the standard commit message ;-) Thanks again.
Idan K
+1  A: 

The simplest way I can think of would be to git clone to a separate working copy, do the merge there, then git pull back. The pull will then be a fast forward and should only affect files which really have changed.

Of course, with such a large project making temporary clones isn't ideal, and needs a fair chunk of extra hard disk space. The time cost of the extra clone can be minimised (in the long term) by keeping your merging-copy around, as long as you don't need the disk space.

Disclaimer: I haven't verified that this works. I believe it should, though (git doesn't version file timestamps).

John Bartholomew
Cloning on a local machine can use hardlinks or even a shared object directory. This will save a lot of space.
siride
Does it hard-link the actual working copy files, or just the repository objects? Also, is this true on Windows? (original question mentioned IncrediBuild, so I'm assuming Windows... probably msysGit)
John Bartholomew
@John Bartholomew: It definitely doesn't hard-link work tree files - what would be the point of having a clone if the entire thing were the same?
Jefromi
I don't know much about windows, but wikipedia does mention that hard links can be created in windows, and that NTFS can do symlinks - so the git-new-workdir script could be an option too. http://git.kernel.org/?p=git/git.git;a=blob;f=contrib/workdir/git-new-workdir;h=993cacf324b8595e5be583ff372b25353c7af95c;hb=HEAD
Jefromi
Actually this is what I currently do, the loss of disk space isn't a real issue (compared to the build time). But I was looking for a more 'elegant' solution that can work in-place in the current repository.
Idan K
A: 

Alternatively, you can fix the symptoms directly by saving and restoring file timestamps. This is kinda ugly, but it was interesting to write.

Python Timestamp Save/Restore Script

#!/usr/bin/env python

from optparse import OptionParser
import os
import subprocess
import cPickle as pickle

try:
    check_output = subprocess.check_output
except AttributeError:
    # check_output was added in Python 2.7, so it's not always available
    def check_output(*args, **kwargs):
        kwargs['stdout'] = subprocess.PIPE
        proc = subprocess.Popen(*args, **kwargs)
        output = proc.stdout.read()
        retcode = proc.wait()
        if retcode != 0:
            cmd = kwargs.get('args')
            if cmd is None:
                cmd = args[0]
            err = subprocess.CalledProcessError(retcode, cmd)
            err.output = output
            raise err
        else:
            return output

def git_cmd(*args):
    return check_output(['git'] + list(args), stderr=subprocess.STDOUT)

def walk_git_tree(rev):
    """ Generates (sha1,path) pairs for all blobs (files) listed by git ls-tree. """
    tree = git_cmd('ls-tree', '-r', '-z', rev).rstrip('\0')
    for entry in tree.split('\0'):
        print entry
        # each entry is "<mode> <type> <sha1>\t<path>"; split on the tab first
        # so that paths containing spaces are kept intact
        meta, path = entry.split('\t', 1)
        mode, type, sha1 = meta.split()
        if type == 'blob':
            yield (sha1, path)
        else:
            print 'WARNING: Tree contains a non-blob.'

def collect_timestamps(rev):
    timestamps = {}
    for sha1, path in walk_git_tree(rev):
        s = os.lstat(path)
        timestamps[path] = (sha1, s.st_mtime, s.st_atime)
        print sha1, s.st_mtime, s.st_atime, path
    return timestamps

def restore_timestamps(timestamps):
    for path, v in timestamps.items():
        if os.path.isfile(path):
            sha1, mtime, atime = v
            new_sha1 = git_cmd('hash-object', '--', path).strip()
            if sha1 == new_sha1:
                print 'Restoring', path
                os.utime(path, (atime, mtime))
            else:
                print path, 'has changed (not restoring)'
        elif os.path.exists(path):
            print 'WARNING: File is no longer a file...'

def main():
    oparse = OptionParser()
    oparse.add_option('--save',
        action='store_const', const='save', dest='action',
        help='Save the timestamps of all git tracked files')
    oparse.add_option('--restore',
        action='store_const', const='restore', dest='action',
        help='Restore the timestamps of git tracked files whose sha1 hashes have not changed')
    oparse.add_option('--db',
        action='store', dest='database',
        help='Specify the path to the data file to restore/save from/to')

    opts, args = oparse.parse_args()
    if opts.action is None:
        oparse.error('an action (--save or --restore) must be specified')

    if opts.database is None:
        repo = git_cmd('rev-parse', '--git-dir').strip()
        dbpath = os.path.join(repo, 'TIMESTAMPS')
        print 'Using default database:', dbpath
    else:
        dbpath = opts.database

    rev = git_cmd('rev-parse', 'HEAD').strip()
    print 'Working against rev', rev

    if opts.action == 'save':
        timestamps = collect_timestamps(rev)
        data = (rev, timestamps)
        pickle.dump(data, open(dbpath, 'wb'))
    elif opts.action == 'restore':
        rev, timestamps = pickle.load(open(dbpath, 'rb'))
        restore_timestamps(timestamps)

if __name__ == '__main__':
    main()

Bash Test Script

#!/bin/bash

if [ -d working ]; then
    echo "Cowardly refusing to mangle an existing 'working' dir."
    exit 1
fi

mkdir working
cd working

# create the repository/working copy
git init

# add a couple of files
echo "File added in master:r1." > file-1
echo "File added in master:r1." > file-2
mkdir dir
echo "File added in master:r1." > dir/file-3
git add file-1 file-2 dir/file-3
git commit -m "r1: add-1, add-2, add-3"
git tag r1
# sleep to ensure new or changed files won't have the same timestamp
echo "Listing at r1"
ls --full-time
sleep 5

# make a change
echo "File changed in master:r2." > file-2
echo "File changed in master:r2." > dir/file-3
echo "File added in master:r2." > file-4
git add file-2 dir/file-3 file-4
git commit -m "r2: change-2, change-3, add-4"
git tag r2
# sleep to ensure new or changed files won't have the same timestamp
echo "Listing at r2"
ls --full-time
sleep 5

# create a topic branch from r1 and make some changes
git checkout -b topic r1
echo "File changed in topic:r3." > file-2
echo "File changed in topic:r3." > dir/file-3
echo "File added in topic:r3." > file-5
git add file-2 dir/file-3 file-5
git commit -m "r3: change-2, change-3, add-5"
git tag r3
# sleep to ensure new or changed files won't have the same timestamp
echo "Listing at r3"
ls --full-time
sleep 5

echo "Saving timestamps"
../save-timestamps.py --save

echo "Checking out master and merging"
# merge branch 'topic'
git checkout master
git merge topic
echo "File changed in topic:r3." > file-2 # restore file-2
echo "File merged in master:r4." > dir/file-3
git add file-2 dir/file-3
git commit -m "r4: Merge branch 'topic'"
git tag r4
echo "Listing at r4"
ls --full-time

echo "Restoring timestamps"
../save-timestamps.py --restore
ls --full-time

I'll leave it as an exercise for the reader to clean up the Python script to remove extraneous output and add better error checking.
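For readers on Python 3, the core idea (remember each file's content hash and timestamps, then put the timestamps back only where the content is unchanged) can be sketched without calling git at all. This is a simplified illustration, not a port of the script above: the function names are hypothetical, the caller supplies the list of paths, and blob_sha1 hashes the raw bytes in git's blob format, so it assumes no line-ending or filter conversion is configured.

```python
import hashlib
import os


def blob_sha1(path):
    """Hash a file the way git hashes a blob: sha1("blob <size>\\0" + contents)."""
    with open(path, "rb") as f:
        data = f.read()
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def save_timestamps(paths):
    """Record (sha1, atime_ns, mtime_ns) for each path."""
    saved = {}
    for path in paths:
        st = os.stat(path)
        saved[path] = (blob_sha1(path), st.st_atime_ns, st.st_mtime_ns)
    return saved


def restore_timestamps(saved):
    """Restore timestamps for files whose contents are unchanged.

    Returns the list of paths that were restored.
    """
    restored = []
    for path, (sha1, atime_ns, mtime_ns) in saved.items():
        if os.path.isfile(path) and blob_sha1(path) == sha1:
            os.utime(path, ns=(atime_ns, mtime_ns))
            restored.append(path)
    return restored
```

Used around a checkout and merge, save_timestamps would run before `git checkout master` and restore_timestamps after the merge, so files that ended up byte-identical keep their old modification times and the build system does not consider them changed.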

John Bartholomew
A: 

Here's sort of a cheating version.

  1. git stash
  2. git tag tmptag
  3. git merge --no-ff topic
  4. git checkout tmptag (-b tha_brunch)?
  5. git stash pop
  6. git tag -d tmptag
hlynur