The web application is a custom-built CMS with several sub-applications, each of which keeps its code and content in the same directory structure. Because of the application framework's architecture, code and content are intertwined (content depends on the code for its display and other functionality) and are therefore inseparable. The content is not stored as BLOBs; it is stored as files, and the underlying DB is used to link them. Sub-applications range in size from 20GB to 250GB and more (this is the killer).

The web application will receive code enhancements (new sub-applications, bug fixes, etc.) while users add and update content through the already-live system. Hence a deployment/release process is required and, most importantly, a version control system needs to be chosen for both code and content.

Git comes into the picture for several reasons: it is open source and free, branching and merging are easy, and it is not centralized, so there is no single point of failure.

BUT after some initial research on the web, I found some disappointing facts that apply to our application: using Git for large systems like ours is painful (checkout, clone, merge, push, pull), and the commands are complicated ("geeky" would be more appropriate) for a developer base that is DVCS-ignorant and mostly on Windows.

I am not fixed on Git, but if I have to go for a centralized approach (in the really WORST case), what should it be (CVS and SVN apart)? I have read that Perforce is a stable option and is also used at Google (I expect some bashing here!!).

Please share your views, guidance and comments. I really need them.

+1  A: 

I have used git only once, for a school project (a PHP site with Zend Framework).

We used git, but the teacher needed the final release in an SVN repo.

Comparing the checkout sizes:

The git checkout was half the size (in MB) of the svn checkout.

My two cents.

Macarse
Of course, and it always will be, because SVN keeps a BASE copy inside your working copy (in the .svn folder). This means diffs, reverts, etc. don't need a network connection. SVN was built to handle low-bandwidth comms (think dialup).
Si
git keeps diffs as well. It is a distributed version control system, so you don't need a network to be able to work.
stefanB
Stefan is right. SVN will /not/ allow you to do arbitrary diffs, only a diff against your most recent update. If you want to be able to work heavily offline, you need a real DVCS, which SVN is not.
Matthew Flaschen
+10  A: 

First, I don't agree that Git is inappropriate for non-technical users. Yes, there are certain features that newbies won't use (e.g. git-send-email). But there are also GUIs like TortoiseGit to make simple things simple.

However, I think you're approaching things the wrong way. Basically, you have content that will change frequently and needs to be editable very easily by Joe Bloggs, and code that will be modified less frequently by coders. The traditional solution is to use a real CMS (e.g. Alfresco, SugarCRM, Drupal, etc.) or a wiki (MediaWiki, MoinMoin, etc.), with optional plug-ins. Keep in mind that wikis (and most CMSes) allow versioning of content in a "user-friendly" way.

Even if you must keep your in-house code, I think you should still try to extricate the content so that code and content can be treated separately. Once you have the code and content separate, your repository will be a more reasonable size. Then you can use whatever VCS you want (though I'm not really sure you're right that Git is inherently bad for large repos).
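To get a rough feel for how much smaller the repository would be once content is pulled out, something like the following Python sketch could be run over one sub-application. It is only an illustration: the function name `split_sizes` and the `CODE_EXTENSIONS` set are assumptions made for this example, and the extensions would need to be adjusted to whatever the real application uses.

```python
import os
from typing import Dict

# Hypothetical set of extensions treated as "code"; everything else is
# counted as "content". Adjust this to the real application.
CODE_EXTENSIONS = {".php", ".js", ".css", ".html", ".sql"}


def split_sizes(root: str) -> Dict[str, int]:
    """Return total bytes of code files and content files under root."""
    sizes = {"code": 0, "content": 0}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken symlinks and other non-regular entries
            ext = os.path.splitext(name)[1].lower()
            kind = "code" if ext in CODE_EXTENSIONS else "content"
            sizes[kind] += os.path.getsize(path)
    return sizes
```

If the "code" total turns out to be a small fraction of the tree, that supports keeping only the code in the VCS and handling the content some other way.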

Matthew Flaschen
Matthew, have you used TortoiseGit yourself? I haven't, but the impression I've got is that it is still very beta (if not alpha). And I have tried using MSYS Git on Windows and find it clunky and idiosyncratic. And without a usable GUI like TortoiseGit, it really is not suitable for non-techies or the faint of heart.
Evan
Evan, I haven't had the opportunity to use it yet either. However, it's based on the popular TortoiseSVN, and it's actively maintained. Thus, I definitely think it is usable.
Matthew Flaschen
I experimented with TortoiseGit very briefly at my workplace as we were evaluating alternative source control systems. My non-technical test users were utterly confounded and confused by it, and within a few hours were actively hostile.
Crashworks
+10  A: 

I just happened to be reading this blog post not one minute ago. It's a bit of a rant about the scalability of git.

pgs
Nice post :) Hmm, are those issues solvable, or is it git's design that's at fault?
the_drow
Solvable? I don't know. Linus designed git to handle the Linux source code tree, a job it does very well. But that's pretty much all text files. The repository, checkout and built objects together total less than 2GB on my computer.
pgs
+1  A: 

Is SVN really such a bad option?

PROS:

  • Can handle large repositories, e.g. many Linux distros use it, as do Apache and SourceForge
  • Has a nice GUI front end in TortoiseSVN to keep your Windows users happy
  • Can be used with Windows integrated authentication to keep admins happy
  • Many different backup strategies can be adopted based on your requirements (svnadmin hotcopy or dump, svnsync, post-commit hooks) to help ease your single point of failure concern.

CONS:

  • Centralised VCS

Disclaimer: I've never used Perforce and have been a happy SVN admin and user for ~6 years (since v0.29)
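As a rough illustration of the hotcopy and dump strategies mentioned in the PROS list, here is a minimal Python sketch that wraps the two `svnadmin` subcommands. The function names and any paths passed in are hypothetical; only `svnadmin hotcopy REPOS_PATH NEW_REPOS_PATH` and `svnadmin dump REPOS_PATH` (which writes the dump stream to stdout) are the actual Subversion commands. It assumes `svnadmin` is on the PATH of the machine hosting the repository.

```python
import subprocess
from typing import List


def hotcopy_command(repo_path: str, backup_path: str) -> List[str]:
    """Build the command that makes a full, consistent copy of a repository."""
    return ["svnadmin", "hotcopy", repo_path, backup_path]


def dump_command(repo_path: str) -> List[str]:
    """Build the command that writes a portable dump stream to stdout."""
    return ["svnadmin", "dump", repo_path]


def run_hotcopy(repo_path: str, backup_path: str) -> None:
    # check=True raises CalledProcessError if svnadmin reports a failure.
    subprocess.run(hotcopy_command(repo_path, backup_path), check=True)


def run_dump(repo_path: str, dump_file: str) -> None:
    # Redirect the dump stream from stdout into a file.
    with open(dump_file, "wb") as out:
        subprocess.run(dump_command(repo_path), stdout=out, check=True)
```

Either function could be called from a scheduled job. Which strategy fits best depends on repository size and how quickly a restore is needed.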

Si
I think the file sizes we're talking about are going to cause issues with any system - 250GB of files in a single checkout, regardless of the VCS overhead, is going to be flat out painful over a network.
Sean McSomething
I agree, Sean, but if he wants a VCS solution, why choose a system designed for source code rather than one designed for any type of file?
Si
+4  A: 

Git does not scale to large repositories. It's not the space, it's the number of files. Please read the blog article I wrote a while back about this.

In my experience, if you want a scalable, fast, centralized source control system, P4 is the way to go.
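Since the claim above separates file count from total size, it may help to measure both before deciding. The following Python sketch is just one way to do that; the function name `tree_stats` is made up for this example, and skipping `.git` and `.svn` directories is an assumption about what should be counted.

```python
import os
from typing import Tuple


def tree_stats(root: str) -> Tuple[int, int]:
    """Return (file_count, total_bytes) for regular files under root."""
    file_count = 0
    total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune VCS metadata directories so only working files are counted.
        dirnames[:] = [d for d in dirnames if d not in (".git", ".svn")]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                file_count += 1
                total_bytes += os.path.getsize(path)
    return file_count, total_bytes
```

Running it on each sub-application gives two numbers to compare against whatever limits a candidate VCS is reported to have.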

Jared Oberhaus
+1  A: 

There's a utility script called git-split that chops up a git repo to make it more efficient.

Mike Caron