There is what seems to be a plethora of version control systems. Therefore, to draw a bad conclusion, it must be easy to write one.

What are some issues that must be considered in order to write a simple file versioning system? (What are the minimum necessary functions?)

Is it a feasible task for one person?

A: 

A good delta algorithm, good compression and network efficiency.

Alan Haggai Alavi
+2  A: 

If you're Linus Torvalds, you can write something like Git in a month.

But "a version control system" is such a vague and stretchable concept, that your question is really unanswerable.

I'd consider asking yourself what you want to achieve (learn about VCS, learn a language, ...) and then define some clear goal. It's good to have a project, but it's also good to have a reachable goal in a small amount of time. Small successes are good for your morale.

Kurt Schelfthout
I would say "if you're Linus Torvalds you can attract other programmers so they can write that program you want in a month"
fortran
Note that Linus Torvalds a.) was using BitKeeper extensively, so he knew what he wanted from a distributed version control system, and b.) took ideas from other SCMs, like SHA-1 content addressing from Monotone (which was too slow for the Linux kernel, at least at that time). Also, the month is what it took to create the bare-bones git-core plumbing, a basis for an SCM, not something a mere mortal can use. It has been 4 years since the first released version, and Git is continuously being improved.
Jakub Narębski
@fortran: not true. AFAIK he alone wrote a basically usable Git core in a month. I assume that's what the question is asking - a basically usable VCS. @Jakub: Torvalds did not invent DVCS, but the question doesn't imply inventing a new kind of VCS either. So, I stand by my original statement. :)
Kurt Schelfthout
A: 

A simple one is doable by one person for a learning opportunity. One issue you might consider is how to efficiently store plain text deltas. A very popular delta format is the one from RCS (used by many version control programs). You might want to study it to get ideas.
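To get a feel for that format, here is a minimal sketch that prints an RCS-style delta between two files. It assumes a diff that supports the -n option (GNU diff does); the function name rcs_delta is made up for illustration.

```shell
# rcs_delta OLD NEW: print the changes from OLD to NEW in RCS format.
# The name rcs_delta is hypothetical; it only wraps "diff -n".
rcs_delta() {
    # -n selects RCS output. diff exits with 1 when the files differ,
    # which is not an error here, so only a status above 1 is a failure.
    diff -n "$1" "$2" || [ $? -eq 1 ]
}
```

For an old file containing the lines one, two, three and a new file containing one, 2, three, this prints "d2 1" (delete 1 line starting at line 2), then "a2 1" (add 1 line after line 2), then the added line "2". Line numbers always refer to the old file.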

Jeff Moser
+6  A: 

A good place to learn about version control is Eric Sink's Weblog. His most recent article is Time and Space Tradeoffs in Version Control Storage, for one example.

Another good example is his series of articles Source Control HOWTO. Yes, it's all about how to use source control, but it has a lot of information about the decisions and tradeoffs developers have to make when designing the system. The best example of this is probably his article on Repositories, where he explains different methods of storing versions. I really learned a lot from this series.

Bill the Lizard
But see also the comments on some of Eric Sink's articles in this thread on the Git mailing list: http://thread.gmane.org/gmane.comp.version-control.git/117659
Jakub Narębski
Most of Sink's articles are not about distributed source control, which is what that thread is discussing.
Bill the Lizard
+3  A: 

That IS really a bad conclusion. My personal opinion here is that the problem domain is so wide and generally hard that nobody has gotten it "right" yet, thus people try to solve it over and over again, from different angles and under different assumptions.

That of course doesn't mean you shouldn't try. Just be warned that many smart people were there before you, so you should do your homework.

Nikolai N Fetissov
+1. There are a lot of compilers too, and a lot of operating systems.
ChrisW
A: 

To write a proof of concept, you probably could pull it off, implementing or borrowing the tools Alan mentions.

IMHO, the most important aspect of a VCS is ease-of-use. This sounds like an odd statement, but when you think about it, hard drive space is one of the easiest IT commodities to scale horizontally, so bad compression or even real sloppy deltas are going to be tolerated. The main reason people demand improvement in versioning systems is to do common tasks more intuitively or to support more features that droves of people eventually demand but that weren't obvious before release. And since versioning tools tend to be monolithic and thoroughly integrated at a company, the cost to switch is high, and it may not be possible to support a new feature without breaking an existing repo.

David Berger
A: 

The very minimal necessary prerequisite is an exhaustive and accurate test suite. Nobody (including you) will want to use your new system unless you can demonstrate that it works, reliably and completely error free.

anon
Self-hosting a VCS is an important landmark in its development. It's when the developers have enough confidence in their work to commit to using it - the ultimate in "dog fooding".
Jonathan Leffler
I'm not sure what your comment has to do with my answer.
anon
+1  A: 

Have a look at the question about the "core concepts" of (D)VCS.
In short, writing a VCS would involve making a decision about each of these core concepts (central vs. distributed, linear vs. DAG, file-centric vs. repository-centric, ...)

Not a "quick" project, I believe ;)

VonC
+1  A: 

How simple?

You could arguably write a version control system with a single-line shell script, upversion.sh:

cp -R "$WORKING_COPY" "$REPO/$(date +%s)"

For large binary assets, that is basically all you need! It could be improved quite easily, say by making the version folders read-only, perhaps recording metadata with each version (you could have a text file at $REPO/$(date...).meta for example)
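Those two improvements (read-only version folders and a .meta file per version) could look roughly like this. It is only a sketch: the function name upversion, the message argument, the AUTHOR variable and the field layout of the .meta file are all assumptions, not part of the one-liner above.

```shell
# upversion WORKING_COPY REPO MESSAGE
# Snapshot a working copy into a timestamped, read-only folder and write a
# small metadata file next to it. Prints the new version id (epoch seconds).
# AUTHOR and the .meta field names are assumptions made for this sketch.
upversion() {
    working_copy=$1
    repo=$2
    message=$3
    version=$(date +%s)

    mkdir -p "$repo" || return 1
    # Full copy of the tree; nothing is shared between versions.
    cp -R "$working_copy" "$repo/$version" || return 1
    # Metadata lives beside the snapshot, at $REPO/<version>.meta.
    {
        printf 'author: %s\n' "${AUTHOR:-unknown}"
        printf 'timestamp: %s\n' "$version"
        printf 'message: %s\n' "$message"
    } > "$repo/$version.meta" || return 1
    # Drop all write bits so a stored version cannot be edited by accident.
    chmod -R a-w "$repo/$version" || return 1
    printf '%s\n' "$version"
}
```

Called as upversion ./shot_010 /mnt/assets "regraded plate" (paths invented for the example), it prints the version id it created. Because the id has one-second resolution, two snapshots taken within the same second would collide; a real tool would need a better id.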

That sounds like a huge simplification, but it's not far off the asset-management systems many film post-production facilities use (for example).

You really need to know what you wish to version, and why.

With large binary assets (video, say), you need to focus on tools to visually compare versions. You also probably need to deal with dependencies ("I need image123.jpg and video321.avi to generate this image").

With code, you need to focus on things like making diffs between any two versions really easy. Also, since edits to source code are usually small (a few characters in a project with many thousands of lines), it would be horribly inefficient to copy the entire project for each version - so you only store the differences between each version (delta encoding).
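As a rough illustration of delta encoding, the sketch below keeps one full base file and stores each later version only as a diff against it, using the standard diff and patch tools (assumed to be installed). The function names store_delta and restore_version are invented for this example, and real systems use more elaborate schemes.

```shell
# store_delta BASE NEW DELTA
# Save only the lines that changed between BASE and NEW into DELTA.
store_delta() {
    # diff exits with 1 when the files differ; only a status above 1
    # signals a real failure.
    diff -u "$1" "$2" > "$3" || [ $? -eq 1 ]
}

# restore_version BASE DELTA OUT
# Rebuild the newer version by applying DELTA to a copy of BASE.
restore_version() {
    cp "$1" "$3" || return 1
    # With an explicit file operand, patch ignores the names in the diff
    # header and patches OUT directly.
    patch -i "$2" "$3" > /dev/null
}
```

For a one-line change in a file with thousands of lines, the stored delta is a handful of lines instead of a second full copy. The price is that getting an old version back means applying patches rather than just reading a file.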

To version a database, you probably want to store information on the schema, tracking new tables, or columns, or adjustments to existing ones (rather than calculating deltas of the database files, or making copies like the previous two systems).

There's no perfect way to version everything; you have to focus on doing one thing well. Git is great for text, but not for binary files. Adobe Version Cue is great with binary files (images), but useless for text.

I suppose the things to consider can be summarised as:

  • What do you want to version?
  • Why can I not use (or extend/modify) an existing system?
  • How will I track differences between versions? (entire files? deltas?)
  • What other data do I need to attach to versions? (Author? Time-stamp? Dependencies?)
  • What tasks would a user commonly need to do (diff'ing? reverting specific files?)
dbr
+1  A: 

What could give you a good overview in a less technical manner is The Git Parable. It is a nice abstraction of the principles of Git, and it gives a very good understanding of what a VCS should be able to do. All things beyond this are rather "low-level" decisions.

pmr