Hi - I'm trying to dig up resources on how version control algorithms operate on data, and I'm especially interested in the way git's mechanism operates. I realize git does many different things, but in particular I'm interested in how history is saved and restored. I'd appreciate any links or article references anyone can point me to. thanks :)
If you're interested in Mercurial, the Mercurial book is a great resource. The original paper from Matt Mackall at OLS is good too.
If you know how to use git and what it does, but you're curious about how it does it, then dig into gitcore-tutorial for a start. It shows which objects are stored inside a git repository, how git stores successive revisions, what a revision is and how to create one manually, how revisions are connected, and so on.
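To make the "what objects are stored" part concrete, here is a minimal sketch of how a loose object gets its ID, assuming a repository that uses the default SHA-1 object format. The function names (`git_object_id`, `loose_object_bytes`, `loose_object_path`) are made up for illustration and are not part of git; the tutorial itself does this with real git commands.

```python
import hashlib
import zlib


def _with_header(obj_type: str, payload: bytes) -> bytes:
    # Git prefixes the content with "<type> <size>\0" before hashing and storing.
    return f"{obj_type} {len(payload)}".encode("ascii") + b"\x00" + payload


def git_object_id(obj_type: str, payload: bytes) -> str:
    """Hex object ID, assuming the SHA-1 object format (hypothetical helper)."""
    return hashlib.sha1(_with_header(obj_type, payload)).hexdigest()


def loose_object_bytes(obj_type: str, payload: bytes) -> bytes:
    """Bytes as they would be written for a loose object: zlib-deflated header + content."""
    return zlib.compress(_with_header(obj_type, payload))


def loose_object_path(object_id: str) -> str:
    """Loose objects live under .git/objects/<first 2 hex chars>/<remaining chars>."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


if __name__ == "__main__":
    oid = git_object_id("blob", b"hello world\n")
    print(oid, loose_object_path(oid))
```

You can compare the result against `git hash-object` on a file with the same content. Because the ID depends only on the type and the content, identical files in different revisions are stored once.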
This presentation is also helpful in showing how it all works. It was created by the maintainer of the git-scm page, who is also one of the GitHub staff, so he knows what he's talking about.
The Pro Git book has a chapter on internals that might be helpful.
http://progit.org/book/ch9-0.html
It doesn't actually go into details on packfile structure, but it pretty comprehensively covers everything else. If you want to know about packfile and pack index structures, I covered it here in some detail.
The only thing that page doesn't cover is the actual delta algorithms, but afaik that isn't actually covered anywhere. If you're curious I can explain it, though.
History (of a project) in Git is quite simple. At the conceptual level Git is snapshot based, which means that in the simplest case of linear history, the history of a project is a sequence of successive versions of the project.
A single version of a project is represented by a commit object, which contains information about the state (snapshot) of the whole project at the given version (revision), version metadata such as the date the commit was created and author info, and pointers to the zero or more previous versions that the given one is based on. The versions that a given commit is based on are called its parent commits. So for linear history it would be a list of commits (representing versions / revisions), each of them except the oldest (sometimes called a root commit) pointing to the previous / parent commit. There is also a branch tip pointer which references the latest commit (the latest version in the given branch), and HEAD, which says which branch is the current branch.
In more complicated situations history is a DAG (Directed Acyclic Graph) of versions, where each version is represented by a commit object with zero or more parents pointing to other commit objects (other versions).
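The model above can be sketched in a few lines of Python. This is only a toy illustration under simplifying assumptions: the snapshot is a plain dict instead of a tree object, commits are referenced directly instead of by hash, and the names `Commit`, `first_parent_log` and `reachable` are hypothetical, not git's.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Commit:
    message: str                      # stands in for the version metadata
    snapshot: Dict[str, str]          # path -> content for the whole project
    parents: List["Commit"] = field(default_factory=list)  # empty for a root commit


def first_parent_log(tip: Commit) -> List[str]:
    """Follow first parents from the tip back to a root commit (linear view)."""
    messages = []
    current = tip
    while True:
        messages.append(current.message)
        if not current.parents:
            return messages
        current = current.parents[0]


def reachable(tip: Commit) -> List[Commit]:
    """Every commit reachable from the tip in the DAG, each listed once."""
    seen, order, stack = set(), [], [tip]
    while stack:
        commit = stack.pop()
        if id(commit) in seen:
            continue
        seen.add(id(commit))
        order.append(commit)
        stack.extend(commit.parents)
    return order


# A root commit, two commits based on it, and a merge with two parents.
root = Commit("initial", {"README": "v1"})
edit = Commit("edit", {"README": "v2"}, [root])
side = Commit("side", {"README": "v1", "notes": "x"}, [root])
merge = Commit("merge", {"README": "v2", "notes": "x"}, [edit, side])

branches = {"master": merge}   # branch tip pointer
head = "master"                # HEAD names the current branch

# "Restoring" a version is just reading the snapshot of the commit you want.
current_files = branches[head].snapshot
```

Here `first_parent_log(merge)` gives `["merge", "edit", "initial"]`, while `reachable(merge)` also includes the `side` commit. Note that nothing is replayed or patched to get `current_files`: each commit already carries the full state.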
Besides already recommended articles I'd like to point to two more:
- The Git Parable blog post by Tom Preston-Werner, which describes how Git could have been developed and explains the design of Git quite well.
- Git from the bottom up by John Wiegley.