Our repositories are getting huge because of the amount of media they contain (hundreds of 1 MB JPEGs, hundreds of PDFs, etc.).

As a result, developers who check out certain repositories have to wait an abnormally long time.

Has anyone else had this dilemma before? Am I going about it the right way by separating code from media? Here are some issues/worries I had:

  • If I migrate these to a media server, I'm afraid it will be a pain for developers to use. Instead of updating one server, they will now have to update two whenever they change both program logic and media.
  • If I migrate these to a media server, I'll still have to keep the media under revision control, no? So the developer would have to commit code updates and media updates separately.
  • How would the developer test locally? I could make my site use absolute URLs, e.g. src="http://media.domain.com/site/blah/image.gif", but this wouldn't work locally. I assume I'd have to change my site templating to detect whether it's running in local/development or production and set the BASE_URL accordingly.
  • Is it worth all the trouble? We deal with about 100-150 sites rather than a dozen or so major ones, so we have around 100-150 repositories. We won't have the time or resources to change existing sites, so we could only implement this on brand-new sites.
  • I would still have to keep the scripts that generate media (PDF generators) and the generated media in the code repository, right? It would be a huge pain to update all those PDF generators to POST files to external media servers, and an extra pain to take caching into account.
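
The BASE_URL switch from the third point could look roughly like the sketch below. It is only an illustration under assumptions: the `SITE_ENV` environment variable, the `MEDIA_BASE_URLS` table, the `media_url` helper and the local `/media` path are all hypothetical names; only the production host is taken from the example URL above.

```python
import os
from typing import Optional

# Hypothetical configuration: the production host comes from the example URL
# in the question; the local "/media" path is an assumption.
MEDIA_BASE_URLS = {
    "production": "http://media.domain.com/site",
    "development": "/media",
}


def media_url(path: str, env: Optional[str] = None) -> str:
    """Build the URL for a media file in the given environment.

    If env is not passed, it is read from the (assumed) SITE_ENV
    environment variable and defaults to "development".
    """
    env = env or os.environ.get("SITE_ENV", "development")
    base = MEDIA_BASE_URLS[env]
    # Join base and path with exactly one slash between them.
    return f"{base.rstrip('/')}/{path.lstrip('/')}"


print(media_url("blah/image.gif", "production"))
print(media_url("blah/image.gif", "development"))
```

Templates would then call the helper instead of hard-coding the host, so the same markup works locally and in production.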

I'd appreciate any insight into the questions I have regarding managing media and code.

A: 

First, yes: keeping media and generated content (like the generated PDFs) out of source control is a good idea, for two reasons:

  • disk space and checkout time (as you describe in your question)
  • this kind of file uses almost none of a VCS's features (no diff, no merge; only labels and branches)

That said, any transition of this kind is costly to put in place.
You need to separate the release management process (generating the right files in the right places) from the development process (retrieving, from one or two repositories, the right material to develop/update your projects).

Binaries generally fall into two categories:

  • non-generated binaries:
    They are best kept in an artifact repository (Nexus, for instance), under a label that matches the label used for the text sources in the VCS.
  • generated binaries (like your PDFs):
    Ideally, they shouldn't be kept in any repository at all, but only generated during the release management phase in order to be deployed.
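
As a rough illustration of the "matching label" idea, a release step could bundle a site's media into one archive named after the VCS label. This is a sketch only: `package_media` and the `<site>-media-<label>.zip` naming scheme are hypothetical, and uploading the archive to the artifact repository is deliberately left out.

```python
import zipfile
from pathlib import Path


def package_media(media_dir: Path, out_dir: Path, site: str, label: str) -> Path:
    """Zip every file under media_dir into <site>-media-<label>.zip.

    `label` is meant to be the same label (tag) used for the sources in
    the VCS. out_dir is assumed to be outside media_dir.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    archive = out_dir / f"{site}-media-{label}.zip"
    # JPEGs and PDFs are already compressed, so store them as-is.
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_STORED) as zf:
        for path in sorted(media_dir.rglob("*")):
            if path.is_file():
                # Keep paths relative to media_dir inside the archive.
                zf.write(path, path.relative_to(media_dir).as_posix())
    return archive
```

A developer or the deployment script can then fetch the one archive whose label matches the checked-out sources, instead of pulling every media revision through the VCS.
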
VonC