Is there anyone using git in such a fashion?

I would like to distribute some multimedia content from a server to some remote Android devices. I would like them to send back a log file with device usage statistics (provided by an Android app I will write).

The server could be anything, but I would prefer a Linux box.

I thought that since git handles and syncs only the differences between files, it would be a nice tool for this purpose, and I would have content revision history as a bonus.

I need some advice on how the repository architecture could be organized: does it have to be a star topology or something different?

The remote end of the system doesn't need any interactivity; in other words, the remote git repository could pull and push whatever it needs to, autonomously and automatically.

UPDATE: I've found here on SO the author of Git Internals (I'm downloading it right now), Scott Chacon, talking about the architecture I would like to implement.

UPDATE 2: OK I read the chapter about "Non-SCM uses of Git" and here is what the author says about a Peer to Peer CDN:

You have to get new content [...] consist of any combination of xml files, images, animations, text and sound. You need to build a content distribution framework that will easily and efficiently transfer all the necessary content to the machines on your network. You need to constantly determine what content each machine has and what it needs to have and transfer the difference as efficiently as possible.[...] It turns out that Git is an excellent solution to this problem.

I couldn't find anything in the book about quoting small portions of it, so I hope I'm not violating any copyright. In any case, I will delete it if someone complains.

+2  A: 

I would advise against using git for such purposes. For starters, Git will use extra phone storage for the revision history, and it will send entire files (not deltas) anyway because multimedia content is binary and diffing does not work on it. Just implement a method to list server-side multimedia with last-modification dates and another method to download updated files (I would suggest HTTP as it is the simplest). On the server side, you can of course use git internally for versioning the multimedia files, but I'd rather not expose the git interface.
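As a rough illustration of the "list with last-modification dates, then download what changed" idea, here is a minimal Python sketch. The manifest format (a mapping from file name to modification time), the function name and the sample file names are all assumptions made up for this example; how the manifest is served over HTTP and how files are actually downloaded is left out.

```python
from datetime import datetime, timezone


def files_to_download(server_manifest, local_manifest):
    """Return the paths that are missing locally or newer on the server.

    Both manifests are hypothetical dicts mapping path -> modification time.
    """
    stale = []
    for path, server_mtime in server_manifest.items():
        local_mtime = local_manifest.get(path)
        if local_mtime is None or server_mtime > local_mtime:
            stale.append(path)
    return sorted(stale)


def ts(year, month, day):
    # Helper for the sample data: a timezone-aware UTC timestamp.
    return datetime(year, month, day, tzinfo=timezone.utc)


# Sample data (invented): what the server lists vs. what the device has.
server = {
    "intro.mp4": ts(2010, 3, 1),
    "logo.png": ts(2010, 1, 15),
    "jingle.ogg": ts(2010, 2, 20),
}
device = {
    "intro.mp4": ts(2010, 2, 1),   # older than the server copy
    "logo.png": ts(2010, 1, 15),   # up to date
}

print(files_to_download(server, device))  # ['intro.mp4', 'jingle.ogg']
```

Comparing content hashes instead of timestamps would be more robust against clock differences between the server and the devices, at the cost of hashing each file.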

Gintautas Miliauskas
I see, thanks a lot for pointing out that media does not diff. Do you think that Scott Chacon's git-media (http://github.com/schacon/git-media) could solve this problem?
microspino
git uses libxdiff for deltification, and it does support binary diffs. Note however that most binary files do not delta well.
Jakub Narębski
I stand corrected, git does do binary diffs. Thanks, Jakub.
Gintautas Miliauskas
+1  A: 

The git protocol tries to send patches instead of whole files, but the git storage engine always stores whole files, and always keeps old versions of the files. git is probably not the tool for the job if you aren't trying to keep file history.

rsync is a mature file distribution tool that, like git, can work over ssh or its own protocol. It can make binary patches and doesn't necessarily keep change history. I would start by looking there to see whether you can get that to work.

masonk
+1  A: 

So in a previous job, we used Git for exactly this. The reason was that our media assets did not change often, so no matter what we used, it was likely we would have to send the whole file anyway. Thus the issues with binary deltification, which also affect other content distribution tools, were not important.

The main advantage over rsync (and presumably unison, though I've never used it) is that you can build the content trees in the index and store the trees in Git under a branch per client, rather than having to have everything on disk to run rsync on. If you have several variations on content, it's pretty cool to be able to record unique trees of content needed by each client - of which you could have thousands of combinations - and have a simple pull command fetch only what's needed and update it on the client. That was the reason we chose Git instead of rsync to do that. If every client needs exactly the same set of data, perhaps rsync would be easier; however, the other nice thing about Git is that you get a history of the content on each client - when and how it changed for every single client.
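To make the "one tree per client, fetch only what's needed" idea concrete, here is a small Python model of it. This is only a sketch, not how we actually implemented it and not Git's own code: the `ContentStore` class, the client names and the file contents are invented for illustration. The one detail borrowed from Git is the blob id, which is the SHA-1 of a `blob <size>\0` header followed by the content.

```python
import hashlib


def blob_id(data: bytes) -> str:
    # Same scheme Git uses for SHA-1 blob ids: header "blob <size>\0" + content.
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()


class ContentStore:
    """Toy content-addressed store: each distinct file is kept once, by id."""

    def __init__(self):
        self.blobs = {}

    def add(self, data: bytes) -> str:
        oid = blob_id(data)
        self.blobs[oid] = data
        return oid


def missing_objects(tree, have):
    """Ids referenced by a client's tree that the client does not hold yet.

    tree: dict mapping path -> object id (stands in for a per-client branch)
    have: set of object ids already present on the client
    """
    return {oid for oid in tree.values() if oid not in have}


store = ContentStore()
video = store.add(b"video bytes")
logo = store.add(b"logo bytes")
promo = store.add(b"promo bytes")

# One tree per client; shared files are stored only once on the server.
client_trees = {
    "client-a": {"media/video.mp4": video, "media/logo.png": logo},
    "client-b": {"media/logo.png": logo, "media/promo.mp4": promo},
}

# client-a already has the logo, so only the video needs to be transferred.
to_fetch = missing_objects(client_trees["client-a"], have={logo})
print(len(to_fetch))  # 1
```

In real Git, the per-client trees would be commits on per-client branches, and `git pull` on the device would do the equivalent of `missing_objects` during fetch negotiation.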

We also used it to record log files - since they are generally pretty uniform and text based, they delta excellently and transfer very efficiently - we were very happy with using that to record and transfer back upstream our log data.
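A quick way to see why append-mostly log files are cheap to transfer is to compare two versions of one: the difference is just the new lines. The Python sketch below uses the standard `difflib` module for that. It is not Git's delta algorithm, and the log lines are invented sample data.

```python
import difflib


def added_lines(old_log: str, new_log: str):
    """Return the lines present in new_log but not in old_log, in order."""
    old = old_log.splitlines()
    new = new_log.splitlines()
    # ndiff prefixes lines that exist only in the second sequence with "+ ".
    return [line[2:] for line in difflib.ndiff(old, new) if line.startswith("+ ")]


# Invented sample data: yesterday's log and today's log with two new entries.
old_log = (
    "2010-03-01 10:00 play intro.mp4\n"
    "2010-03-01 10:05 stop intro.mp4\n"
    "2010-03-01 11:30 play jingle.ogg\n"
)
new_log = old_log + (
    "2010-03-02 09:00 play intro.mp4\n"
    "2010-03-02 09:04 stop intro.mp4\n"
)

delta = added_lines(old_log, new_log)
print(len(delta))  # 2
```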

Scott Chacon