tags:

views:

78

answers:

3

I'm considering using Git for our family photos. The scenario: my girlfriend and I both use digital cameras and upload the photos to our own computers. We still want to organize the photos into folders for different events, and have that organization replicated between our two computers.

By keeping the master Git repository on our central server, we can also view the images from our TV, or access them over FTP if we need them while away from home.

This implies a structure where the images won't change very often. Rarely will they be moved around. The most common action will be adding a folder with new images to the repository and committing it to the master branch so the images are available to everyone.

Now the question: how will the images be handled in Git? Will the repository be bloated by keeping a copy of each image for every version of the repository? Or will it only store a new copy of an image when its content actually changes?

The difference in disk space usage between the two scenarios should be quite large.

+1  A: 

Git is smart enough to only track changes, not create a full replica of the image repository.

Zaid Zawaideh
Well, not really. Git doesn't track changes; it tracks states. It generates diffs on the fly rather than storing an initial state plus a series of diffs, as Subversion does. If you edit a file, Git stores both versions in full, regardless of how small the edit was.
meagar
Well, not really. When Git puts objects in packs, it does use deltas, which can result in significant savings of disk usage. So just make sure to repack your repo frequently and you'll be okay.
siride
@siride Not for images that aren't changing :p
meagar
@meagar: The delta-compression algorithm uses a window to check whether given pairs of blobs (file versions) are good candidates for delta compression... so in theory, if you have images that are similar in their on-disk representation, they could be stored as deltas in a Git packfile.
Jakub Narębski
@Jakub Interesting, hadn't realized git would be so clever about packing files.
meagar
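The repacking siride and Jakub mention can be triggered by hand. A minimal sketch (Git also runs `git gc --auto` on its own from time to time, so this is an optimization, not a requirement):

```shell
# Show object counts and sizes before repacking.
git count-objects -v

# Repack loose objects into a delta-compressed packfile and
# drop unreachable objects immediately.
git gc --prune=now

# Afterwards, loose objects should be gone and "size-pack"
# reflects the delta-compressed storage.
git count-objects -v
```

For a repo full of already-compressed JPEGs, don't expect dramatic savings between *different* photos; the big win described in this thread is that identical content is only ever stored once.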
+2  A: 

I think you should consider using rsync. From what I understand, you just want to synchronize folders over the network, right? Is there a real need for versioning?

pma
I would actually recommend unison http://www.cis.upenn.edu/~bcpierce/unison (and specifically unison-gtk) over rsync for this scenario.
gotgenes
While this is true, it doesn't answer the question of how Git handles binary images behind the scenes. As for rsync, you are assuming only Linux/Unix machines. At least one machine runs Windows, and support for Git is better on Windows. But no, versioning isn't required.
Morten
Unison seems interesting for the actual image scenario.
Morten
Unison also works on Windows machines: http://alan.petitepomme.net/unison/index.html. Although it might be a little harder to use (due to the lack of a GUI).
pma
+3  A: 

Every object in your repository will be stored once and referenced by its SHA-1 sum. This means that placing three copies of a 100KB image in two directories of your repo will use 100KB, plus an inconsequential amount of overhead.
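You can see this content addressing directly with `git hash-object`: byte-identical files hash to the same object ID, so Git stores the blob only once. The filenames below are just for illustration:

```shell
# Make an exact copy of an image somewhere else in the tree.
cp holiday.jpg backup/holiday.jpg

# Both copies produce the same object ID, so committing both
# adds only one blob to the object database.
git hash-object holiday.jpg
git hash-object backup/holiday.jpg   # same ID as above
```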

The same applies to pushing, pulling, and branching: as long as the SHA-1 sum of the image doesn't change, Git will never store a second copy or move more than one copy across the network.

You'll be using twice the disk space on each machine, though: Git maintains a copy of all its data in the hidden .git directory at the root of your repo.

meagar
Great answer! It describes exactly what I was interested in knowing in Git-internals.
Morten