I am trying to come up with a version control process for a web app that I work on. Currently, my major stumbling blocks are two directories that are huge (both over 4GB). Only a few people need to work on things within the huge directories; most people don't even need to see what's in them. Our directory structure looks something like:

/
--file.aspx
--anotherFile.aspx
--/coolThings
----coolThing.aspx
--/bigFolder
----someHugeMovie.mov
----someHugeSound.mp3
--/anotherBigFolder
----...

I'm sure you get the picture.

It's hard to justify a checkout that has to pull down 8GB of data that's likely useless to a developer. I know, it's only once, but even once could be really frustrating for someone (and will make it harder for me to convince everyone to use source control). (Plus, clean checkouts will be painfully slow.) These folders do have to be available in the web application.

What can I do? I've thought about separate repositories for the big folders. That way, you only download if you need it; but then how do I manage checking these out onto our development server? I've also thought about not trying to version control those folders: just update them directly on the web server... but I am not enamored of this idea. Is there some magic way to simply exclude directories from a checkout that I haven't found? (Pretty sure there is not.)

Of course, there's always the option to just give up, bite the bullet, and accept downloading 8 useless GB.

What say you? Have you encountered this problem before? How did you solve it?

A: 

There is nothing saying that the layout in Subversion needs to match what you deploy on production. You hit on the easiest and best solution -- move those large files to another repository (another advantage is that when you branch this structure, you won't be branching the huge files as well). Then you simply need to update your deploy script to pull from two repository locations, rather than one, and put the files in the correct location on production.

If you don't have a deployment script yet, now is the time to write one. Even if it only contains two lines -- the svn commands to pull from the two repositories -- it is still better to have a script that does everything in one command than to have to type this out every time. It's also a good idea to run tests on your content before rebooting your servers, so these tests can also live in the deployment script.
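A minimal sketch of such a script might look like this; the repository URLs, paths, and file names are assumptions for illustration, not the asker's actual layout:

```shell
#!/bin/sh
# Hypothetical deploy script: check out (or update) the app code and the
# large media folders from two separate repositories into the web root.
set -e

WEBROOT=/var/www/myapp
SVN_BASE=https://svn.example.com/repos

# 1. Pull the main application code. Running checkout against an
#    existing working copy of the same URL simply updates it.
svn checkout "$SVN_BASE/webapp/trunk" "$WEBROOT"

# 2. Pull the big folders from the second repository into their
#    expected locations inside the web root.
svn checkout "$SVN_BASE/bigfiles/trunk/bigFolder" "$WEBROOT/bigFolder"
svn checkout "$SVN_BASE/bigfiles/trunk/anotherBigFolder" "$WEBROOT/anotherBigFolder"

# 3. A trivial sanity check on the content before restarting the server.
test -f "$WEBROOT/file.aspx" || { echo "deploy failed: file.aspx missing" >&2; exit 1; }
```

Note that the second checkout lands inside the first working copy; Subversion treats nested working copies as unversioned content of the outer one, which is fine for deployment.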

Ether
what is wrong with branching? mysql does not physically copy the files on branching. PS: oops, did not see the date. why has this thread been bumped?!
zerkms
@zerkms: I assume you mean "svn does not physically copy"... and yes, it doesn't on the server, but the clients will still incur the extra disk cost, which is sometimes a factor (e.g. if I have three different branches checked out on my machine and there are large binaries in those branches, I have three copies of those files on my machine, rather than the single copy I'd have if the binaries were split off into a separate repository).
Ether
oh, then yes, indeed. sorry for that comment.
zerkms
A: 

I feel your pain. I once worked in a project where a checkout was 20GB. Working on the trunk and three branches at the same time would fill my HD. I hated it.

You could split that thing into two folders in your repository, one ("main") containing all of the stuff which is interesting for everyone, the other ("big") containing only the 8GB of stuff most people don't care about. The "big" folder references "main" using externals.

/
-/main
--file.aspx
--anotherFile.aspx
--/coolThings
----coolThing.aspx
----...
-/big
--/bigFolder
----someHugeMovie.mov
----someHugeSound.mp3
--/anotherBigFolder
-/everything

Most people would just check out "main". Some would check out "big" and, through externals, would get "main", too.

Note that using externals requires attention when tagging, since you want to peg tagged versions' externals to a specific revision.
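Concretely, the external with a pinned revision could be set up like this (SVN 1.5+ externals syntax; the repository URL and revision number are made up for illustration):

```shell
# In a working copy of the "big" folder: declare an external that pulls
# "main" in at a pinned revision, so tags built from "big" stay stable.
svn propset svn:externals '-r1234 https://svn.example.com/repo/main main' .
svn commit -m "Reference main from big via a pinned external"

# From now on, "svn update" in "big" also fetches "main" at r1234.
svn update
```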

sbi