tags:

views: 159

answers: 2

I'm currently starting to use git as my version control system, but I do a fair bit of web/game development, which of course requires images (binary data) to be stored. So, if my understanding is correct, if I commit an image and it changes 100 times, then when I fetch a fresh copy of that repo I'd basically be checking out all 100 revisions of that binary file?

Isn't this an issue with large repos where images change regularly? Wouldn't the initial fetch of the repo end up becoming quite large? Has anybody experienced any issues with this in the real world? I've seen a few alternatives, for instance using submodules and keeping images in a separate repo, but that only keeps the codebase smaller; the image repo would still be huge. Basically, I'm just wondering if there's a nice solution to this.
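For a rough sense of scale, one way to compare the size of the full history against the working copy in an existing clone (the myrepo path below is just a placeholder) is:

    # inspect an existing clone; "myrepo" is a placeholder path
    cd myrepo
    du -sh .git            # the object database: every revision of every file
    du -sh .               # the whole checkout, including .git
    git count-objects -v   # counts and sizes of loose and packed objects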

+2  A: 

I wouldn't call that "checkout", but yes, the first time you fetch the repository, provided the binary data is huge and incompressible, it's going to be what it is: huge. And yes, since the conservation law is still in effect, breaking it into modules won't save you space or time on the initial pull of the repository.

One possible solution is still to use a separate repository and the --depth option when pulling it. Shallow repositories have some limitations, but I don't remember exactly what, since I've never used them. Check the docs; the keyword is "shallow".

Edit: From git-clone(1):

A shallow repository has a number of limitations (you cannot clone or fetch from it, nor push from nor into it), but is adequate if you are only interested in the recent history of a large project with a long history, and would want to send in fixes as patches.
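For illustration, a shallow clone of a hypothetical assets repository (the URL is a placeholder) would look like this:

    # fetch only the most recent commit instead of the full history
    git clone --depth 1 git://example.com/assets.git

    # the history can be deepened later if needed
    cd assets
    git fetch --depth 50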

Michael Krelin - hacker
Interesting. If you bear in mind the doc quote above, it almost seems that a non-distributed VCS might be better for binary data, as you're missing a lot of the advantages of using git when dealing with binary data anyway.
Jamie
Yes, but you may still take the pain of fetching the huge repository once. Also, you can use a separate non-git repository for the binary data. But since I really love git (though I was sceptical about it at first; everything Linus writes will be praised), I'd suggest separating the binary data and... well, dealing with it separately ;-)
Michael Krelin - hacker
+1  A: 

Unfortunately, git is not really made for storing binary data. Because it is distributed, you pull all versions of all files whenever you clone it. It also becomes ridiculously difficult to prune those large binary files out of your code repository later. More about that here: http://www.somethingorothersoft.com/2009/09/08/the-definitive-step-by-step-guide-on-how-to-delete-a-directory-permanently-from-git-on-widnows-for-dumbasses-like-myself/
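To give a flavour of what that pruning involves, the rewrite described in the linked post is roughly the following, assuming the images live under a hypothetical images/ directory (this rewrites history, so every collaborator has to re-clone afterwards):

    # drop images/ from every commit on the current branch
    git filter-branch --index-filter \
        'git rm -r --cached --ignore-unmatch images/' \
        --prune-empty HEAD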

I would recommend trialling it, but keep the binary files separate from the code (e.g. using submodules), as sketched below. That way, if it doesn't work out for you, you can switch to another solution without rewriting the whole history of your main repository.
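A rough sketch of that layout (the URL and path are placeholders):

    # from the main code repository, track the image assets as a submodule
    git submodule add git://example.com/game-images.git assets/images
    git commit -m "Add image assets as a submodule"

    # collaborators fetch the assets explicitly after cloning the main repo
    git submodule update --init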

Igor Zevaka