My project is six months old, and git is very slow. We track around 30 binary files ranging from 5 MB to 50 MB, and we keep them in git. I believe those files are making git slow.

Is there a way to remove all files larger than 5 MB from the repository? I know I would lose all of those files, and that is okay with me.

Ideally I would like a command that lists all the big files (> 5 MB). I could review the list and then say, okay, go ahead and delete those files and make git faster.

I should mention that git is slow not only on my machine; deploying the app to the staging environment now takes around 3 hours.

So the fix should be something that affects the server, not just the users of the repository.

+1  A: 

Just set the files up to be ignored. See the link below:

http://help.github.com/git-ignore/

joshlrogers
They're already added. Ignoring them won't do anything.
Jefromi
@Jefromi Actually, if you look at the link I posted, you'll see that there are instructions in the second paragraph telling him exactly what to do in that case.
joshlrogers
@joshlrogers: True. But the direct content of your answer is "ignore the files", not "remove the files from tracking then ignore them". It's generally better to write it here than to link to another site.
Jefromi
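
A minimal sketch of that remove-then-ignore workflow, assuming the large files share an extension (`*.bin` is just a placeholder for whatever your files are named):

git rm --cached '*.bin'            # untrack the files but leave them on disk
echo '*.bin' >> .gitignore         # keep them from being re-added
git commit -m "Stop tracking large binaries"

Note this only untracks the files going forward; every old version is still in history, so it won't shrink the repository by itself.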
+1  A: 

Have you told git those files are binary?

e.g. have you added `*.ext binary` to your repository's .gitattributes?
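
For instance, with `*.ext` as a stand-in for your real extension:

echo '*.ext binary' >> .gitattributes
git add .gitattributes
git commit -m "Mark large binaries as binary"

The `binary` attribute tells git to skip text conversion and textual diffs for matching paths.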

scomar
I assume that telling git the files are binary speeds things up.
Nadal
It might, if git's heuristics can't tell a file is binary automatically.
scomar
+5  A: 

Do you garbage collect?

git gc

This makes a significant difference in speed, even for small repos.

kubi
This is done automatically when there gets to be too much clutter. I doubt it'll really help the OP.
Jefromi
@Jefromi, is that new? I just upgraded to 1.7.1 yesterday, but the version I was using before that definitely did not automatically run `gc`.
kubi
@kubi: Well, it hasn't been around forever, but it's not exactly new - it's been invoked from commit, merge, am, and rebase since caf9de2 (Sep 14 2007), or in stable version v1.5.4 (Feb 1 2008).
Jefromi
If it's called during commit, then what's the point in ever running it manually? Have I been imagining the speed increases I think I've seen, or is the auto-invoke somehow different from the manually invoked call?
kubi
On second thought, `git gc` can't possibly be called on `commit` and `merge`, otherwise `git fsck --unreachable` would never return anything.
kubi
Found it. The default number of loose objects before the auto `gc` runs is 6700, which explains why I've never seen it run.
kubi
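
For reference, you can check how close a repository is to that threshold:

git count-objects -v       # "count" is the number of loose objects
git config gc.auto         # no output means the default (6700) is in effect
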
+4  A: 

Explanation

Git is really good at huge histories of small text files because it can store them and their changes efficiently. At the same time, git is very bad at binary files, and will naïvely store separate copies of the file (by default, at least). The repository gets huge, and then it gets slow, as you've observed.

This is a common problem among DVCS's, exacerbated by the fact that you download every version of every file ("the whole repository") every time you clone. The guys at Kiln are working on a plugin to treat these large files more like Subversion, which only downloads historical versions on-demand.

Solution

This command will list all files under the current directory that are larger than 5 MB (strictly speaking, larger than 5,000,000 bytes).

find . -size +5000000c 2>/dev/null -exec ls -l {} \;

If you want to remove the files from the entire history of the repository, you can use this idea with git filter-branch to walk the history and get rid of all traces of large files. After doing this, all new clones of the repository will be leaner. If you want to slim down an existing repository without re-cloning it, you'll find directions on the man page (see "Checklist for Shrinking a Repository").

# --tree-filter checks each commit out on disk, so find can actually see the files
git filter-branch --tree-filter \
    'find . -type f -size +5000000c -exec rm -f {} \;' HEAD

A word of warning: this will make your repository incompatible with existing clones, because the rewritten commits have different trees; you won't be able to push to or pull from them anymore.
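
If you are shrinking the existing repository in place rather than making a fresh clone, that checklist boils down to deleting the backup refs filter-branch leaves behind, expiring the reflog, and repacking; roughly (double-check against the man page for your git version):

git for-each-ref --format='%(refname)' refs/original | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc --prune=now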

Andres Jaan Tack
Note: that's the Unix/Linux version of find, not the Windows find.exe.
Craig Trader
+1. Might want to send the output of `find` to a file first, check the list, then use `git rm`, just in case there are any false hits. Alternatively, check `git status` after removing large files, and use `git checkout HEAD <file>` to get back any mistakenly removed files.
Jefromi
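
A sketch of that review-first variant; the `.git` directory is pruned so its pack files don't show up as false hits:

find . -path ./.git -prune -o -type f -size +5000000c -print > big-files.txt
less big-files.txt    # review the list, then run the filter-branch command above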
A: 

That's because git isn't scalable.

This is a serious limitation in git that is drowned out by git advocacy. Search the git mailing lists and you'll find hundreds of users wondering why just a meager 100 MB of images (say, for a web site or application) brings git to its knees. The problem appears to be that nearly all of git relies on an optimization they refer to as "packing". Unfortunately, packing is inefficient for all but the smallest text files (i.e., source code). Worse, it grows less and less efficient as the history increases.

It's really an embarrassing flaw in git, which is touted as "fast" (despite lack of evidence), and the git developers are well aware of it. Why haven't they fixed it? You'll find responses on the git mailing list from git developers who won't recognize the problem because Photoshop documents (*.psd) are a proprietary format. Yes, it's really that bad.

Here's the upshot:

Use git for tiny, source-code only projects for which you don't feel like setting up a separate repo. Or for small source-code only projects where you want to take advantage of git's copy-the-entire-repo model of decentralized development. Or when you simply want to learn a new tool. All of these are good reasons to use git, and it's always fun to learn new tools.

Don't use git if you have a large code base, binaries, huge history, etc. Just one of our repos is a TB. Git can't handle it. VSS, CVS, and SVN handle it just fine. (SVN bloats up, though.)

Also, give git time to mature. It's still immature, yet it has a lot of momentum. In time, I think the practical nature of Linus will overcome the OSS purists, and git will eventually be usable in the larger field.

John
This answer is really overly negative and inflammatory. Yes, git has scalability problems *with binary files*. It's quite scalable and fast for code. There's plenty of evidence of the speed (despite your assertion to the contrary), even disregarding the fact that CVS/SVN require network access instead of disk access for many operations. There are many large projects with huge histories quite happily using git.
Jefromi
And... your harping on the Photoshop thing? I'm not going to waste my time writing a detailed response, but from reading the entire thread http://thread.gmane.org/gmane.comp.version-control.git/146957/focus=147598 (maybe you're annoyed because the John in the thread is you?), I see a lot of reasonable responses about how best to handle this with current git, how it might be addressed in the future, and why it's not their first priority.
Jefromi
Yeah, I don't think you're right, here. Git works _way_ too well for the Linux kernel to deserve a dismissive, "isn't scalable."
Andres Jaan Tack
This comment would be more believable if it had links or data to back it up. BTW, what do you think of Mercurial?
vy32
A: 

Here is a censored revision intended to be less negative and inflammatory:

Git has a well-known weakness when it comes to files that are not line-by-line text files. There is currently no solution, and no plans announced by the core git team to address this. There are workarounds if your project is small, say, 100 MB or so. There exist branches of the git project to address this scalability issue, but these branches are not mature at this time. Some other revision control systems do not have this specific issue. You should consider this issue as just one of many factors when deciding whether to select git as your revision control system.

John