views:

146

answers:

4

Is there something I can do or consider when working with Word files in source control/Subversion in order to minimize the size of the changes in the repository?

Background

For a project we have a Word document with our functional specifications with screen shots from a prototype in source control (Subversion). The Word file itself is about 2.5 MB.

Recently I changed the titles of around ten sections and updated the table of contents. Then I checked this into Subversion (svn) - only the described changes.

After check-in, I checked the size of the diff file in the svn-repository, and was surprised to see it was slightly larger than 1 MB. I had hoped it would be 'small', say smaller than 100 kB.


Edit: Currently the file is in Word 2003 format (doc), but I use Word 2007, so I could change to Word 2007 format (docx) if that would decrease the size of the repository deltas?

A: 

As a first step, try disabling Fast Saves.

Anton Gogolev
+2  A: 

This is one of the reasons to write documentation in some kind of text-based markup (HTML, TeX, wiki syntax) and have it converted to other formats (Word, HTML for the web, Windows help files, man pages, PDF).

Stijn Sanders
Nice observation, however, it does not answer my question...
Ole Lynge
A: 

As someone already pointed out, storing binaries with some sort of XDelta doesn't guarantee that the "patches" will be much smaller than the file itself... Sometimes a patch will be almost as big as the file.

Try changing an RGB value in a Photoshop picture and running XDelta... the patch will be almost as big as the file itself.

But, IMHO, you shouldn't worry about that. Most modern SCMs out there (Git, Plastic SCM, ...) compress your files, so storage won't be a huge concern... Although I guess we will never buy the argument "don't worry about disk space since it's cheap now" :-P

pablo
Yeah. It's also about speed... And what if I change that file, say, twice a week for a year? At roughly 1 MB per change, that will be about 100 MB for that single file alone...
Ole Lynge
Well, is 100 MB the compressed size? I mean, if you use Word then deltifying will work, but I'm not so sure it will with other formats.
pablo
+1  A: 

See also http://stackoverflow.com/questions/90202/can-i-merge-two-microsoft-word-documents-reliably-with-subversion/688327#688327

You can save docx documents in the "Flat OPC" XML format from Word (Save As... XML document), but you might need to pretty-print the XML first, since it is all on one line.
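As a rough illustration of the pretty-printing step, here is a minimal Python sketch using only the standard library. The function name `pretty_print_xml` and the tiny sample string are made up for the example (a real Flat OPC file is far larger), and I have not verified that Word reopens a re-indented file unchanged, so treat this as something to try on a copy.

```python
import xml.dom.minidom


def pretty_print_xml(xml_text: str) -> str:
    """Return xml_text re-indented so that each element starts on its own line."""
    dom = xml.dom.minidom.parseString(xml_text)
    # Elements that contain only text stay on a single line;
    # mixed content may gain extra whitespace.
    return dom.toprettyxml(indent="  ")


# Hypothetical single-line input standing in for a Flat OPC document.
sample = '<?xml version="1.0"?><pkg><part name="/word/document.xml"><p>Hello</p></part></pkg>'
print(pretty_print_xml(sample))
```

With one element per line, a changed heading shows up as a changed line in `svn diff` instead of a change to one enormous line.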

plutext
Internally, Subversion uses a binary diff; the pretty-printing is only necessary to get a clean textual diff (which is obviously nice to have, but not necessary to keep your repository small).
Bert Huijben