I have a "fresh" git-svn repo (11.13 GB) that has over a 100,000 objects in it.

I have performed

git fsck
git gc

on the repo after the initial checkout.

I then tried to do a

git status

The time it takes to do a git status is anywhere from 2m25.578s to 2m53.901s.

I tested git status by issuing the command

time git status

five times, and every run fell between the two times listed above.

I am doing this on Mac OS X, locally, not through a VM.

There is no way it should be taking this long.

Any ideas? Help?

Thanks.

Edit

I have a co-worker sitting right next to me with a comparable box, with less RAM, running Debian on a JFS filesystem. His git status runs in 0.3 seconds on the same repo (it is also a git-svn checkout).

Also, I recently changed my file permissions (to 777) on this folder and it brought the time down considerably (why, I have no clue). I can now get it done in anywhere between 3 and 6 seconds. This is manageable, but still a pain.

+3  A: 

You could try passing the --aggressive switch to git gc and see if that helps:

# this will take a while ...
git gc --aggressive

Also, you could use git filter-branch to delete old commits and/or files if you have things which you don't need in your history (e.g., old binary files).
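If you did decide to prune, say, old binaries out of history, the rewrite might look something like this (just a sketch; old-binaries/ is a hypothetical path, and filter-branch rewrites every commit, so work on a backup clone first):

# strip a hypothetical old-binaries/ directory from every commit (destructive; back up first)
git filter-branch --index-filter \
    'git rm -r --cached --ignore-unmatch old-binaries/' \
    --prune-empty -- --all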

David Underhill
Trying git gc --aggressive. You're right, this is going to take a while.
mediaslave
git gc --aggressive cut the time down to between 58s and 1m. Still very long...
mediaslave
git filter-branch will not work for me. There is no history that I can lose.
mediaslave
Going to try this myself +1
masonk
A: 

Are you using a virus scanner? I've tested some big projects here on both Windows and Linux - it was damn fast!

I don't think that you need to do a git gc in a cloned repo (it should be clean).

Is your hard drive OK? What are its IOPS and reads/writes per second? Maybe it is damaged?

Andreas Rehm
Check the S.M.A.R.T. status using Disk Utility.
gdelfino
S.M.A.R.T Status: Verified
mediaslave
git-svn leaves you with loose objects, so it is necessary to gc/repack.
jleedev
I did gc (not aggressive) and repack. Same effect. Trying gc aggressive.
mediaslave
+1  A: 

You also might try git repack
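For a repo this size a full, forced repack may do more than the default invocation (a sketch; the window and depth values are only examples, and the run can take a lot of time and memory):

# repack everything into a single pack and drop redundant loose objects
git repack -a -d -f --depth=250 --window=250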

docgnome
git repack returned "Nothing new to pack." Thanks
mediaslave
A: 

Maybe Spotlight is trying to index the files. Perhaps disable Spotlight for your code directory. Check Activity Monitor to see what processes are running.

neoneye
Well, that is a good idea, but my hard drive has no activity when I am not running git status. I will try this, but I don't think it is relevant. Thanks.
mediaslave
Turned off indexing for that directory. This has made no difference. Thanks.
mediaslave
A: 

I'd create a partition using a different file system. HFS+ has always been sluggish for me compared to doing similar operations on other file systems.

srparish
I am transferring it to an ext2 partition. I will let you know if it fixes it.
mediaslave
This does not seem to make that big of a difference: about a 10-second gain, still around 45 seconds or so.
mediaslave
+2  A: 

git status has to look at every file in the repository every time. You can tell it to stop looking at trees that you aren't working on with

git update-index --assume-unchanged <trees to skip>

source

From the manpage:

When these flags are specified, the object names recorded for the paths are not updated. Instead, these options set and unset the "assume unchanged" bit for the paths. When the "assume unchanged" bit is on, git stops checking the working tree files for possible modifications, so you need to manually unset the bit to tell git when you change the working tree file. This is sometimes helpful when working with a big project on a filesystem that has very slow lstat(2) system call (e.g. cifs).

This option can be also used as a coarse file-level mechanism to ignore uncommitted changes in tracked files (akin to what .gitignore does for untracked files). Git will fail (gracefully) in case it needs to modify this file in the index e.g. when merging in a commit; thus, in case the assumed-untracked file is changed upstream, you will need to handle the situation manually.

Many operations in git depend on your filesystem to have an efficient lstat(2) implementation, so that st_mtime information for working tree files can be cheaply checked to see if the file contents have changed from the version recorded in the index file. Unfortunately, some filesystems have inefficient lstat(2). If your filesystem is one of them, you can set "assume unchanged" bit to paths you have not changed to cause git not to do this check. Note that setting this bit on a path does not mean git will check the contents of the file to see if it has changed — it makes git to omit any checking and assume it has not changed. When you make changes to working tree files, you have to explicitly tell git about it by dropping "assume unchanged" bit, either before or after you modify them.

...

In order to set "assume unchanged" bit, use --assume-unchanged option. To unset, use --no-assume-unchanged.

The command looks at core.ignorestat configuration variable. When this is true, paths updated with git update-index paths… and paths updated with other git commands that update both index and working tree (e.g. git apply --index, git checkout-index -u, and git read-tree -u) are automatically marked as "assume unchanged". Note that "assume unchanged" bit is not set if git update-index --refresh finds the working tree file matches the index (use git update-index --really-refresh if you want to mark them as "assume unchanged").


Now, clearly, this solution is only going to work if there are parts of the repo that you can conveniently ignore. I work on a project of similar size, and there are definitely large trees that I don't need to check on a regular basis. The semantics of git-status make it a generally O(n) problem (n being the number of files). You need domain-specific optimizations to do better than that.

Note that if you work in a stitching pattern, that is, if you integrate changes from upstream by merge instead of rebase, then this solution becomes less convenient, because a change to an --assume-unchanged object merging in from upstream becomes a merge conflict. You can avoid this problem with a rebasing workflow.
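Since git update-index takes file paths rather than directory names, one way to flag a whole tree is to feed it every tracked file under that tree (a sketch; vendor/ is a hypothetical path):

# mark all tracked files under a hypothetical vendor/ tree as assume-unchanged
git ls-files -z vendor/ | xargs -0 git update-index --assume-unchanged

# and to undo it later
git ls-files -z vendor/ | xargs -0 git update-index --no-assume-unchanged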

masonk
It does not appear that you can do this for whole folders. You have to add files individually. This would not work if that is the case.
mediaslave
I fail to see why not
masonk
A: 

It came down to a couple of items that I can see right now.

  1. git gc --aggressive
  2. Opening up file permissions to 777 (see the sketch below)

There has to be something else going on, but these were the things that clearly made the biggest impact.
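Item 1 is the literal command shown earlier; item 2 in command form was roughly this (a sketch, assuming a recursive chmod on the repo folder; 777 is as permissive as it gets, so use with care):

# recursively open up permissions on the repo folder (very permissive)
chmod -R 777 .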

mediaslave