I have a simple blog application written in Python, using Django. I use Git to version-control this website. The main content of the site is a blog. The blog entries are stored in a SQLite database (which is not version-controlled, but is backed up regularly); some entries contain images and other media (like PDFs).

I currently store this "blog media" in the repository, alongside other media (such as external JavaScript code, and images used for layout purposes -- all nicely organized, of course). It occurred to me, however, that this isn't really a good strategy, for a few reasons:

  1. Whenever I post a new blog entry that contains an image or a link to a PDF, I have to add the image to the repo and then copy a new version to the production server -- which seems like a lot of work just to add an image. It'd be easier just to upload the image to the server (and make a local backup, of course).
  2. Since this media is content rather than code, it doesn't seem necessary to store it alongside the code (and related style media) itself.
  3. The repo contains a lot of binary files, which increase the overall size of the repo; and more importantly,
  4. I never really edit these images, so why keep them under version-control?

So I'm considering removing these files from the repo, and just copying them to a directory on the server outside of the directory containing the Python code, templates, style sheets, etc., for the website.
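For illustration, the split described above might look roughly like this in a Django settings file. This is only a sketch: the two `/srv/blog/...` paths are hypothetical, chosen just to show the media directory living outside the checked-out code. `STATIC_URL`, `STATICFILES_DIRS`, `MEDIA_ROOT` and `MEDIA_URL` are standard Django settings.

```python
# settings.py (excerpt) -- a sketch; both /srv/blog/... paths are assumptions
from pathlib import Path

# Where the Git working copy lives on the server (hypothetical path).
BASE_DIR = Path("/srv/blog/app")

# Layout media (style sheets, JavaScript, layout images) stays in the repo.
STATIC_URL = "/static/"
STATICFILES_DIRS = [str(BASE_DIR / "static")]

# Blog content (post images, PDFs) lives outside the repo (hypothetical path)
# and is backed up alongside the SQLite database instead of being committed.
MEDIA_DIR = Path("/srv/blog/media")
MEDIA_ROOT = str(MEDIA_DIR)
MEDIA_URL = "/media/"
```

With a layout like this, adding an image to a post only means uploading a file into `MEDIA_ROOT`; no commit or redeploy is involved.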

However, I wondered: Is there a "best practice" for dealing with content images and other media in a website's repo, as opposed to images, etc., that are actually used as part of the site's layout and functionality?


Edit

To elaborate, I see a difference between keeping the code for the website in the repo, and also keeping the content of the site in the repo -- I feel that perhaps the content should be stored separately from the code that actually provides the functionality of the site (especially since the content may change more frequently, and I don't see a need to create new commits for "stuff" that isn't necessary for the functioning of the site itself).

A: 

Most version control systems don't work well with binary files. That said, if the files aren't changing, it makes little or no difference.

You just have to decide which is easier: backing them up in the repository, with the multistep process to add an image/PDF/whatever, or maintaining a separate set of actions for them (including backup). Personally, I'd keep them under version control. If you're not changing them, it's not harming anything. Why worry about something that isn't causing harm?

Malfist
+1 Agreed. They've gotta be stored 'somewhere'. They're not taking up any more space in the repo than on any particular drive (mostly). If you're using Subversion, you can also mark them as binary resources, and it will treat them as such.
SnOrfus
+2  A: 

Initially, I would say don't put them in the repo because they'll never change. But then consider the situation of moving your website to a different server or hosting provider. You'd need an easy way to deploy it, and unless it's under version control, that's a lot of copy/paste that could go wrong. At least it's all in one place if/when something happens.

This isn't really an answer so much as something to consider.

SnOrfus
+1, another point to consider besides what I raised in my answer
Malfist
+3  A: 

Keep them in version control. If they never change, you don't pay a penalty for it. If they do change, well, then it turns out you needed the version control after all.

flodin
+1  A: 

Version them. Why not? I version the PSDs and everything. But if that makes you wince, I can understand. You should version the JavaScript and stylesheets though; that stuff is code (of sorts).

Now, if by content you mean "the image I uploaded for a blog post" or "a PDF file I'm using in a comment", then I'd say no, don't version it. That kind of content is accounted for in the database or somewhere else. But the logo image, the sprites, and the stuff that makes up the look and feel of the site should absolutely be versioned.

I'll give you one more touchy-feely reason if you aren't convinced. Some day you'll wish you could go into your history and see what your site looked like 5 years ago. If you versioned your look & feel stuff, you'll be able to do it.

Cory R. King
Your response gets to the root of my question. I do plan on keeping layout-related media (including images) in the repo, but "images I upload for a blog post" don't seem appropriate to store in the repo, since they're not integral to the functioning of the website.
mipadi
A: 

I think you need to ask yourself why you are using version control and why you are making backups. Probably because you want to safeguard yourself against loss or damage of your files, so that if something terrible happens you can fall back on your backups.

If you use version control and a separate backup system, you run into a distribution problem, because the latest version of your site lives in different places. If something does go wrong, how much effort is it going to take you to restore things? To me, having a distributed system with version control and backups seems like a lot of manual work that isn't easily scriptable. What's more, when something does go wrong you're probably already stressed out anyway. Making the restoration process harder will not help you much.

The way I see it, putting your static files in version control doesn't do any harm. You have to put them somewhere anyway, be it in a version control repository or a normal file system. Since your static files never change, they're not taking up more space over time, so what's the problem? I recommend you just place all of it under version control and make it easy on yourself. Personally, I would make a backup of my database at regular intervals and commit this backup to version control as well. This way you have everything in one place, and in case of disaster you can easily do a new checkout/export to restore your site.

I've built this website. It has over a gig of PDF files and everything is stored under version control. If the server dies, all I have to do is a clean export and a re-import of the database, and the site is up and running again.

Luke
A: 

If you are working on a web project, I would recommend creating a virtual directory for your media. For example, we set up virtual directories in our local working copy's IIS for /images/, /assets/, etc., which point to the development/staging server that the customer has access to.

This increases the speed of the source control (especially using something clunky like Visual Source Safe), and if the customer changes something during testing, this is automatically reflected in our local working copy.

Kyle B.
+1  A: 

You are completely correct on two points.

  1. You are using Version Control for your code.
  2. You are backing up your live content database.

You have come to the correct conclusion that the "content images" are just that and have no business in your code's Version Control.

Backup your content images along with your database. You do not want to blur the lines between the two unless you want your "code" to be just your own blog site.
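As a rough sketch of what "back up the images along with the database" could look like for a SQLite-backed site: the function name `backup_content`, the archive naming, and the `db/` and `media/` layout inside the archive are all made up for illustration. It relies only on the standard library; `Connection.backup` is SQLite's online backup API as exposed by Python's `sqlite3` module (Python 3.7+).

```python
import sqlite3
import tarfile
import tempfile
from datetime import datetime
from pathlib import Path


def backup_content(db_path, media_dir, dest_dir):
    """Write one .tar.gz holding a snapshot of the SQLite file plus the media directory.

    Hypothetical helper: the name, arguments and archive layout are assumptions.
    """
    db_path, media_dir, dest_dir = Path(db_path), Path(media_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    archive_path = dest_dir / f"blog-content-{stamp}.tar.gz"

    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp) / db_path.name
        # Copy through SQLite's backup API rather than copying the file,
        # so the snapshot is consistent even if the site is writing to it.
        src = sqlite3.connect(str(db_path))
        dst = sqlite3.connect(str(snapshot))
        try:
            src.backup(dst)
        finally:
            dst.close()
            src.close()

        # Database snapshot and content media travel together in one archive.
        with tarfile.open(str(archive_path), "w:gz") as tar:
            tar.add(str(snapshot), arcname=f"db/{db_path.name}")
            tar.add(str(media_dir), arcname="media")

    return archive_path
```

Called from a scheduled job, something like `backup_content("blog.db", "/srv/blog/media", "/backups")` (paths hypothetical) would keep the posts and the files they reference in a single restorable unit, separate from the code repository.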

What if you wanted to start a completely different blog? Or your friends all wanted one? You wouldn't be giving them a copy of your database with all your content. Nor would it be any use for them to have a copy with all your content images.

Mufaka
I think you are mistaken about the difference between 'source control' and 'version control'. Anything you can write to a disk can be brought under 'version control'.
Luke
Not really mistaken. It's possible that I didn't convey my point as well as I should. The point is that dynamic content shouldn't be included in the code base repository. It's useless to have it there when the additional content that is using it is not (the text of the blog in the database).
Mufaka
Dynamic content? We're mostly talking about static content here, like images, PDFs or anything else that can be considered a resource that doesn't change frequently. The thing is, even though it might be an image, it can still change, and you might want to control that change in a VC system.
Luke
OK, it's static content that is dynamically added to his blogging application. That's dynamic content in my book. He's versioning the code, not the text or resources for individual blog entries. Sure, he can version control his images, but they have no business in his code repository.
Mufaka
Lot of great answers to this question! Stack Overflow insists that I mark an answer as accepted, though :), and I think this answer gets to the heart of my question the best.
mipadi