Does anyone know what the git limits are for the number of files and the size of files?

+3  A: 

There is no real limit -- everything is named with a 160-bit (SHA-1) name. The size of a file must be representable in a 64-bit number, so there is no real limit there either.
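
You can see those 160-bit names directly: every object git stores (blob, tree, commit, tag) is addressed by a 40-hex-digit SHA-1. A quick illustration (the hash below is the well-known result for the string "test content", shown purely as an example):

%echo 'test content' | git hash-object --stdin
d670460b4b4aece5915caf5c68d12f560a9fe3e4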

There is a practical limit, though. I have a repository that's ~8GB with >880,000 files, and git gc takes a while. The working tree is rather large, so operations that inspect the entire working directory take quite a while. This repo is only used for data storage, though, so it's just a bunch of automated tools that handle it. Pulling changes from the repo is much, much faster than rsyncing the same data.

%find . -type f | wc -l
791887
%time git add .
git add .  6.48s user 13.53s system 55% cpu 36.121 total
%time git status
# On branch master
nothing to commit (working directory clean)
git status  0.00s user 0.01s system 0% cpu 47.169 total
%du -sh .
29G     .
%cd .git
%du -sh .
7.9G    .
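
For a git-native view of the same thing, git count-objects -v reports the loose-object count and the packed size (size-pack, in KiB); the figures below are only illustrative:

%git count-objects -v
count: 12
size: 48
in-pack: 1850000
packs: 1
size-pack: 8283750
prune-packable: 0
garbage: 0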
Talljoe
+1  A: 

It depends on what you mean. There are practical size limits (if you have a lot of big files, it can get tediously slow). If you have a lot of files, scans can also get slow.

There aren't really inherent limits to the model, though. You can certainly use it poorly and be miserable.

Dustin
+17  A: 

This message from Linus himself can help you with some other limits:

[...] CVS, ie it really ends up being pretty much oriented to a "one file at a time" model.

Which is nice in that you can have a million files, and then only check out a few of them - you'll never even see the impact of the other 999,995 files.

Git fundamentally never really looks at less than the whole repo. Even if you limit things a bit (ie check out just a portion, or have the history go back just a bit), git ends up still always caring about the whole thing, and carrying the knowledge around.

So git scales really badly if you force it to look at everything as one huge repository. I don't think that part is really fixable, although we can probably improve on it.

And yes, then there's the "big file" issues. I really don't know what to do about huge files. We suck at them, I know.
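
(If it is mainly the depth of history that hurts, a shallow clone at least keeps git from hauling all of it around; the URL below is just a placeholder, and note this trims history only, not the breadth of the working tree:)

%git clone --depth 1 git://example.com/huge-repo.git
%cd huge-repo
%git log --oneline | wc -l
1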

See more in my other answer: the limit with Git is that each repository must represent a "coherent set of files", the "whole system" in itself (you cannot tag just "part of a repository").
If your system is made of autonomous (but interdependent) parts, you must use submodules.
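
A minimal sketch of that setup (the URL and path are placeholders):

%git submodule add git://example.com/part-a.git part-a
%git commit -m "Add part-a as a submodule"

and, in a fresh clone of the superproject:

%git submodule update --init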

As illustrated by Talljoe's answer, the limit can be a system one (a large number of files), but if you understand the nature of Git (data coherency represented by its SHA-1 keys), you will realize the true "limit" is a usage one: you should not try to store everything in a Git repository unless you are prepared to always get or tag everything back. For some large projects, that would make no sense.
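
A quick way to see that: a tag resolves to a single commit, and a commit snapshots the whole tree, so listing the tag's tree enumerates every tracked file in the repository, never a subset (the tag name is arbitrary):

%git tag v1.0
%git ls-tree -r v1.0 | wc -l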

VonC
Documentation for git submodules: http://book.git-scm.com/5_submodules.html
Thr4wn
@Thr4wn: see also http://stackoverflow.com/questions/1979167/git-submodule-update/1979194#1979194 for more on the GitPro submodule page. For a shorter version: http://stackoverflow.com/questions/2065559/using-two-git-repos-in-one-folder/2065749#2065749
VonC
+1  A: 

I think it's good to avoid committing large files as part of the repository (e.g. a database dump might be better off elsewhere), but if you consider the size of the kernel in its repository, you can probably expect to work comfortably with anything smaller and less complex than that.

Tchalvak
+4  A: 

If you add files that are too large (GBs in my case, Cygwin, XP, 3 GB RAM), expect this.

fatal: Out of memory, malloc failed

More details here
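
A cheap way to spot such files before they ever reach the index is an ordinary find (the 500M threshold here is arbitrary):

%find . -type f -size +500M -exec ls -lh {} \;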

Brian Carlton