There has been a lot of discussion about the core.autocrlf and core.safecrlf features in the current and upcoming Git releases. My question concerns an environment where developers clone from a bare repository.

During the clone, the autocrlf settings are enabled. But since developers have full control over their clones, they can remove the autocrlf setting and carry on.

  1. We can declare which files are not binary in the .gitattributes file, but is there any other way for Git to determine automatically whether a file is text or binary?

  2. Is there a way, such as an update hook (a commit hook is not an option, as developers can still remove it), to make sure that files with CRLF pushed from a Windows environment to the UNIX machine hosting the bare repo are converted to UNIX EOL format (LF)?

  3. Will an update hook that scans each file for CRLF affect the performance of a push operation?

Thanks

+1  A: 
  • 1/ Git itself has a heuristic to determine whether a file is binary or text (similar to istext).

  • 2/ The gergap weblog recently (May 2010) had the same idea.
    See his update hook here (reproduced at the end of this answer), but the trick is:
    Rather than trying to convert, the hook simply rejects the push if it detects a (supposedly) non-binary file with an improper EOL style.
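
Both points depend on telling text from binary. As far as I know, Git's diff code treats content as binary when a NUL byte appears near the start (roughly the first 8000 bytes), and the EOL conversion code uses a somewhat richer statistic on top of that. A minimal sketch of the NUL-byte part only, with `looks_binary` as a hypothetical helper name (this is an approximation, not Git's actual code):

```shell
#!/bin/sh
# looks_binary FILE: succeed (exit status 0) if FILE contains a NUL byte
# within its first 8000 bytes. Hypothetical helper that only approximates
# Git's heuristic.
looks_binary() {
    # Size of the inspected prefix, with and without NUL bytes.
    total=$(( $(head -c 8000 "$1" | wc -c) ))
    kept=$(( $(head -c 8000 "$1" | tr -d '\000' | wc -c) ))
    # Any difference means at least one NUL byte was removed.
    [ "$total" -ne "$kept" ]
}
```

A hook could call `looks_binary "$tmp"` on the extracted blob instead of relying on a list of file extensions.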

Git converts LF->CRLF when checking out on Windows.
If the file already contains CRLF, Git is clever enough to detect that and does not expand it to CRCRLF, which would be wrong. It keeps the CRLF, which means the file was implicitly changed locally during the checkout, because when it is committed again, the wrong CRLF will be corrected to LF. That’s why Git must mark these files as modified.

It’s good to understand the problem, but we need a solution that prevents wrong line endings from being pushed to the central repo.
The solution is to install an update hook on the central server.

  • 3/ There will be a small cost, but unless you push every 30 seconds, this shouldn't be an issue.
    Plus there is no actual conversion taking place: if the file is not correct, the push gets rejected.
    That places the conversion issue right back where it should belong: on the developer side.
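
On the developer side, the conversion itself is small. A minimal sketch, assuming a POSIX shell and `sed`; `to_lf` is a hypothetical helper name, and it only strips a CR that sits directly before the newline, leaving any CR in the middle of a line alone:

```shell
#!/bin/sh
# A literal carriage return, kept in a variable so the sed script stays portable.
cr=$(printf '\r')

# to_lf: copy stdin to stdout, removing one CR at the end of each line.
# Hypothetical helper for illustration only.
to_lf() {
    sed "s/${cr}\$//"
}
```

For example, `to_lf < file.txt > file.tmp && mv file.tmp file.txt` rewrites one file before it is committed again. In practice, letting Git do the work (`core.autocrlf` or a `text` attribute in .gitattributes) is usually the more convenient route.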

#!/bin/sh
#
# Author: Gerhard Gappmeier, ascolab GmbH
# This script is based on the update.sample in git/contrib/hooks.
# You are free to use this script for whatever you want.
#
# To enable this hook, rename this file to "update".
#

# --- Command line
refname="$1"
oldrev="$2"
newrev="$3"
#echo "COMMANDLINE: $*"

# --- Safety check
if [ -z "$GIT_DIR" ]; then
    echo "Don't run this script from the command line." >&2
    echo " (if you want, you could supply GIT_DIR then run" >&2
    echo "  $0 <ref> <oldrev> <newrev>)" >&2
    exit 1
fi

if [ -z "$refname" -o -z "$oldrev" -o -z "$newrev" ]; then
    echo "Usage: $0 <ref> <oldrev> <newrev>" >&2
    exit 1
fi

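BINARY_EXT="pdb dll exe png gif jpg"

# returns 1 if the given filename has a binary extension, 0 otherwise
# (plain "name()" syntax, since the "function" keyword is not POSIX sh)
IsBinary()
{
    result=0
    for ext in $BINARY_EXT; do
        # ${1##*.} strips everything up to the last dot, leaving the extension
        if [ "$ext" = "${1##*.}" ]; then
            result=1
            break
        fi
    done

    return $result
}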

# make temp paths
tmp=$(mktemp /tmp/git.update.XXXXXX)
tree=$(mktemp /tmp/git.diff-tree.XXXXXX)
# remove the temp files when the hook exits
trap 'rm -f "$tmp" "$tree"' EXIT
ret=0

git diff-tree -r "$oldrev" "$newrev" > $tree
#echo
#echo diff-tree:
#cat $tree

# read $tree using the file descriptors
exec 3<&0
exec 0<$tree
while read old_mode new_mode old_sha1 new_sha1 status name
do
    # debug output
    #echo "old_mode=$old_mode new_mode=$new_mode old_sha1=$old_sha1 new_sha1=$new_sha1 status=$status name=$name"
    # skip lines showing parent commit
    test -z "$new_sha1" && continue
    # skip deletions
    [ "$new_sha1" = "0000000000000000000000000000000000000000" ] && continue

    # don't do a CRLF check for binary files (decided by the file name's extension)
    IsBinary "$name"
    if [ $? -eq 1 ]; then
        continue # skip binary files
    fi

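    # check for CRLF: grep matches one line at a time, so the newline itself
    # may not be visible to the pattern; look for a CR at the end of a line
    git cat-file blob "$new_sha1" > "$tmp"
    RESULT=`grep -Pl '\r$' "$tmp"`
    #echo $RESULT
    if [ "$RESULT" = "$tmp" ]; then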
        echo "###################################################################################################"
        echo "# '$name' contains CRLF! Dear Windows developer, please activate the GIT core.autocrlf feature,"
        echo "# or change the line endings to LF before trying to push."
        echo "# Use 'git config core.autocrlf true' to activate CRLF conversion."
        echo "# OR use 'git reset HEAD~1' to undo your last commit and fix the line endings."
        echo "###################################################################################################"
        ret=1
    fi
done
exec 0<&3
# --- Finished
exit $ret
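
One caveat with the script above: when a new branch is pushed, the update hook receives an all-zero `oldrev`, and `git diff-tree` will most likely fail on it, so nothing gets checked. A possible workaround, sketched under the assumption of a SHA-1 repository, is to diff against Git's well-known empty tree instead; `diff_base` is a hypothetical helper, not part of the original hook:

```shell
#!/bin/sh
# Hypothetical helper, not part of the original hook.
NULL_REV=0000000000000000000000000000000000000000
# Well-known SHA-1 of Git's empty tree object.
EMPTY_TREE=4b825dc642cb6eb9a060e54bf8d69288fbee4904

# diff_base OLDREV: print the tree-ish to compare the new revision against.
diff_base() {
    if [ "$1" = "$NULL_REV" ]; then
        # New ref: compare against the empty tree so every file is listed.
        printf '%s\n' "$EMPTY_TREE"
    else
        printf '%s\n' "$1"
    fi
}
```

The hook would then call `git diff-tree -r "$(diff_base "$oldrev")" "$newrev"`.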
VonC
Thanks Von, this update hook does help a lot. So does this script skip binary files, or does it try to read binary files too? I still have a doubt about how Git distinguishes between a binary and a text file. Do we have to specify which files have LF in the .gitattributes file (like your suggestion in this article: http://stackoverflow.com/questions/2517190/how-do-i-force-git-to-use-lf-instead-of-crlf-under-windows), or does Git have other mechanisms to distinguish files?
Senthil A Kumar
@Senthil: it is best to specify what is binary and what is not; otherwise Git uses a [simple heuristic](http://github.com/qertoip/istext)
VonC
Thanks a ton :)
Senthil A Kumar
Does Git have the ability to find CRLFs (or a git attribute) at commit time? Can this check be used as a pre-commit hook instead of an update hook?
Senthil A Kumar
@Senthil: that would be interesting to try, but with some modifications, since in pre-commit, the `git diff-tree` might not have the same information it has on an update (where everything is already committed)
VonC
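
For the pre-commit idea, one possible starting point is to inspect the index rather than `git diff-tree` output. This is an untested-in-production sketch: `staged_crlf_files` is a hypothetical helper, paths containing newlines or characters that Git quotes are not handled, and `grep -I` is assumed to be available (it is in GNU and BSD grep) to skip content that looks binary.

```shell
#!/bin/sh
# A literal carriage return for the grep pattern.
cr=$(printf '\r')

# staged_crlf_files: print every staged (added, copied or modified) path
# whose index content has a line ending in CR. Hypothetical helper.
staged_crlf_files() {
    git diff --cached --name-only --diff-filter=ACM |
    while IFS= read -r path; do
        # ":path" names the blob staged in the index;
        # -I makes grep treat binary data as non-matching.
        if git show ":$path" | grep -qI "${cr}\$"; then
            printf '%s\n' "$path"
        fi
    done
}
```

A `.git/hooks/pre-commit` script could call `staged_crlf_files`, print the result, and exit with a non-zero status when the list is not empty. Since developers can remove local hooks, this complements the update hook rather than replacing it.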