There has been some discussion on SO (here and here) before on how office documents can be versioned, however I think my question is still a little different.

My programming projects start with a project folder that is empty except for a subfolder named "Design Documents", which contains a draft of the project's functional spec to begin with and is later expanded to contain API specs or whatever else is needed.

Naturally, I check these files into SVN as well. What I'm looking for is good file formats and strategies to play well with the whole versioning, diff'ing and merging processes. For example, I reckon it would be best to store word processor files as XML, or would the diffs be ugly and useless for human readers? Here's what I'd like to do:

  • diff and merge text documents (must-have OOo, nice-to-have MS Word)
  • diff (and maybe merge, though that might turn out to be conceptually difficult) schema drawings, like UML diagrams. I was thinking that these could be independent XML/SVG files and be linked into the text documents, but I don't know enough about how these documents work to tell if that's actually possible.
  • show automatically updated revision numbers inside the text documents (maybe with svn:keywords)

Has anybody done this kind of thing already? There are probably a number of documents and tutorials on OOo files etc. that I could look up, but while I also appreciate pointers to those, I'm mainly looking for first-hand accounts of things that did or didn't work in practice.


Edit: Just to be sure, there's no "non-technical users" involved here. It's just programmers and the documents are for use on programming projects only. There may be PDFs to be published but that's going to be just another build artefact, nothing that has to be versioned.

Still, we don't really want to use TeX or the like. I know it's great and all, but I just can't be bothered for a simple text document. We'd have to learn it, get all the extra packages right, add TeX-to-PDF rendering to our build process etc. It would be like a little programming project just for the two or three docs. If anything, I'd rather use HTML, but a word processor still seems like a good option to me, except that I want good versioning.

There's another thought: Is there something like an SVN plugin for OOo or the other way around? Or even, what would it take to add SVN support to OOo? Like a "Synchronize" option in the File menu and a "Revision number" text field. I mean, that would not really be part of our business, but it would be cool, and after all, I'm the boss.

A: 

The problem with OOo is that the file format is really a zip-compressed group of files (try changing the file extension to .zip or .7z). This makes the documents difficult to diff.
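To see what is inside such a file without renaming it, a few lines of Python are enough. This is only a sketch: it assumes the document is a standard ODF package in which the body text lives in a member called content.xml, and the function names are made up for illustration.

```python
import zipfile
import xml.dom.minidom


def odf_members(source):
    """List the files packed inside an ODF document (.odt, .ods, ...)."""
    with zipfile.ZipFile(source) as archive:
        return archive.namelist()


def odf_content_xml(source):
    """Return content.xml from the package, re-indented for readability.

    Assumption: the package follows the usual ODF layout and actually
    contains a member named content.xml.
    """
    with zipfile.ZipFile(source) as archive:
        raw = archive.read("content.xml")
    # The XML is normally stored with little or no whitespace, so
    # pretty-print it before feeding it to a line-based diff tool.
    return xml.dom.minidom.parseString(raw).toprettyxml(indent="  ")
```

Calling odf_content_xml("spec.odt") on two revisions and diffing the results gives a line-based diff, although the markup is verbose enough that it may still be hard to read.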

I've been looking at LaTeX and DocBook, but establishing a style template to use is quite difficult. This is the hard part all the tutorials gloss over.

So basically I've given up on getting meaningful diffs from documentation. It gets checked in as binary blobs.

+3  A: 

When possible, I keep all "raw" files in text format (e.g. XML/DocBook, plain text, LaTeX), in addition to rendered PDFs, under version control. Also, I try to use the Subversion repository revision number as the version number for the document.
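For plain-text sources, Subversion can maintain the number itself if the svn:keywords property is set on the file: a placeholder such as $Revision$ is then expanded to something like $Revision: 411 $ on checkout. Note that this is the last revision in which that file changed, not the repository head. A build step can pick the number up again; the following is a rough sketch, and the function name is my own invention.

```python
import re

# Matches the expanded forms of the revision keyword and its aliases,
# for example "$Revision: 411 $" or "$Rev: 411 $".
_REVISION_RE = re.compile(r"\$(?:LastChangedRevision|Revision|Rev):\s*(\d+)\s*\$")


def extract_svn_revision(text):
    """Return the revision number found in an expanded keyword, or None.

    None is returned when the keyword is missing or still unexpanded
    (a bare "$Revision$"), e.g. because svn:keywords was never set.
    """
    match = _REVISION_RE.search(text)
    return int(match.group(1)) if match else None
```

The result can be passed on to whatever renders the PDF or HTML, so the published document carries the same number as the source.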

If you can't use the automatic revision number (e.g. in a Word document), I recommend using the date of last modification. Make sure to use the date of the last edit and not the current date macro, because the current date macro will fill in whatever date it is when you print the document, which is probably not what you want. Another way to go is of course to use a traditional incremental version number, but in my experience people often have a hard time knowing which version is the latest when only seeing a number. People are usually better at recognizing an old date.

Having the date and/or revision number as a part of the header/footer for every page is also a big plus, that way you avoid people mixing up different versions lying around on their desks...

Anders Sandvig
+4  A: 

WYSIWYG editors (Word, OpenOffice) generally don't see a reason why anybody else should mess with their files, so finding an editor which a non-technical user can use and which is friendly to a version control system is impossible. Exception: git has a filter which can look into OpenOffice files. I'm not sure if you can use keyword expansion, though.
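One way such a filter can be wired up is git's textconv mechanism: a line like "*.odt diff=odt" in .gitattributes, plus a diff.odt.textconv setting that names a program which prints the document as plain text. Whether this is the exact filter meant above is an assumption on my part. The converter below is a minimal sketch; the script name odt2text.py is hypothetical, and the extraction is deliberately naive (text inside frames may show up twice, and lists and tables lose their structure).

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the ODF "text" elements (text:p, text:h, text:span, ...).
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
_BLOCK_TAGS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}


def odt_to_text(source):
    """Return the headings and paragraphs of an .odt file, one per line."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    lines = []
    for element in root.iter():
        if element.tag in _BLOCK_TAGS:
            # itertext() also collects text from nested spans.
            lines.append("".join(element.itertext()))
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    # git calls the textconv program with the file path as its argument.
    print(odt_to_text(sys.argv[1]))
```

With something like "git config diff.odt.textconv 'python odt2text.py'" in place, git diff shows changes to the extracted text instead of reporting a binary difference. It only affects how diffs are displayed; merging still works on the zipped file.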

I suggest using a wiki and giving your users a week of training on how to use it. It solves all your problems (some wikis can even be checked into a version control system). As I said in a different post, the sole obstacle is that users will take a few days to get accustomed to the idea. After that, they will love it.

Aaron Digulla
A: 

We have used a wiki for the documentation on some of our projects. It has great version tracking per entry but not at the document level, so we exported the documents and checked those in as well, as baselines.

Gerhard
+2  A: 

As Aaron Digulla noted, a wiki is a great idea. You have to realize that the only way you are really going to achieve what you want is to rethink your entire document strategy.

No software is going to fix the issue of incompatible file formats and not being able to compare different versions. The tools you are using now are the wrong tools for the job. It's going to be tough, uncomfortable, and possibly expensive to change your toolset and your way of thinking, but that's the cost of being able to achieve what you want, and it will end up paying for itself tenfold.

Just think back to the difficulties you had initially with schema docs, UML diagrams, and even something as simple as planning out a Word document. Those worked up until this point and they got you this far, but the only real way forward is to move on to tools that were designed for what you need now.

Note: the ultimate solution might not be a wiki, but it's definitely not what you are using now, either. You need to explore a bit and try a few things out.

Arthur Chaparyan
+2  A: 

A wiki seems like a good idea here, but if that isn't a good fit, you might like to consider using Sphinx (or similar) with reStructuredText.

Although this, too, would be a "little programming project", it would provide you with HTML as output while keeping the diffs pretty and readable. Setting it up would, I believe, be less of an effort than for LaTeX; the learning curve would be much gentler; and if you need LaTeX and PDFs in the future, it's there waiting for you.
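To give an idea of the size of that "little programming project": Sphinx is configured through a conf.py next to the .rst sources, and a bare-bones one can be very short. The sketch below uses placeholder values for the project name and author; only the setting names are Sphinx's own.

```python
# conf.py: minimal Sphinx configuration (all values are placeholders)

project = "Example Project Design Documents"  # hypothetical project name
author = "The Team"                           # hypothetical author

# No extensions are needed for plain reStructuredText pages.
extensions = []

# Keep generated output out of the source scan.
exclude_patterns = ["_build"]

# "alabaster" ships with Sphinx as its default HTML theme.
html_theme = "alabaster"
```

With an index.rst beside it, running "sphinx-build -b html . _build/html" should produce the HTML, and everything that goes into version control is plain text.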

Brent.Longborough
+2  A: 

The TortoiseSVN client (Windows) for SVN knows enough about OpenOffice that when you ask it to diff two documents, once it's got hold of the necessary versions, it opens OpenOffice with both documents open, showing the differences. I use this a lot. I imagine there are even fancier solutions out there, and similar things for Mac/Linux/OS/360 or whatever, but for this very technical user who often feels very untechnical because he wants his lunch, TortoiseSVN (http://tortoisesvn.tigris.org/) delivers.

Toby Champion