I've found some similar questions (here, here, and here) asking about storing documents in version control. I have a more specific requirement and a more general question. The specific requirement is that I want to use Git. The general question is: how should a project's documents (design, test, general practices, tips, etc.) be stored in Git? More broadly, what documents should be stored?

I can think of a few ways:

  1. Word / OpenOffice documents. Newer versions of Word use the docx format, which is a zip archive, but there is also an unzipped XML format that Git could store diffs of efficiently. Diffing is still broken though, since the XML is squished onto a single line. That makes it no better than storing a binary file in Git.
  2. Wiki. What distributed wikis exist out there? It'd be something like LaTeX, where documents are written and then compiled / viewed as a wiki.
  3. LaTeX - but from using it for papers I find it pretty unsuitable for documents. Is there a documentation equivalent? (How are man pages written?)
  4. Plain text formats, but these are rather limited because they lack diagrams, which brings up another point.
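
For option 1, one possible workaround for the single-line XML is to re-indent it before diffing. Below is a minimal Python sketch using only the standard library. The sample string is made up and merely stands in for an unzipped document part, and `pretty_print_xml` is a hypothetical helper name. The re-indented output is meant for reading diffs, not for loading back into Word, since the added whitespace may change the document.

```python
import xml.dom.minidom


def pretty_print_xml(xml_text: str) -> str:
    """Return the XML re-indented so that elements are spread over separate lines."""
    dom = xml.dom.minidom.parseString(xml_text)
    return dom.toprettyxml(indent="  ")


# Hypothetical one-line XML, standing in for an unzipped Office document part.
single_line = "<doc><p>First paragraph</p><p>Second paragraph</p></doc>"
pretty = pretty_print_xml(single_line)
print(pretty)
```

With one element per line, a line-based diff can point at the paragraph that changed instead of flagging the whole file.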

How should visuals be stored? What should they be composed in to begin with? I'm developing in a Linux environment, but some other participants in the project are on Windows. What cross-platform solution is there that resembles Visio? And of course, it should not create binary files to be stored in Git. How would this then tie in with the documentation? (E.g. similar to how LaTeX can reference diagrams when compiled.)

Thanks

+1  A: 

Git can handle binary files just as well as text files. Instead of explicitly storing diffs, Git stores a complete copy of every revision of a file in the repository. The repository objects are then compressed to save space. Diffs are reconstructed on the fly whenever you ask for them.

So considering only disk space, there is little difference between storing an XML Office document uncompressed in Git, and storing a zipped version of that same document. The only difference would be the relative performance of Zip vs whatever compression Git chooses to use.
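
To make the storage model concrete, here is a rough Python sketch of how a file becomes a loose blob object: a small header plus the full content, hashed with SHA-1 and compressed with zlib. It assumes the default SHA-1 object format and ignores pack files entirely. `git_blob_object` is a name made up for this illustration, not a Git API.

```python
import hashlib
import zlib
from typing import Tuple


def git_blob_object(content: bytes) -> Tuple[str, bytes]:
    """Return (object id, compressed bytes) for content stored as a loose Git blob."""
    # A blob is the header "blob <size>\0" followed by the complete file content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    store = header + content
    object_id = hashlib.sha1(store).hexdigest()
    # The whole object is zlib-compressed; no diff against an earlier revision is involved.
    return object_id, zlib.compress(store)


object_id, compressed = git_blob_object(b"hello\n")
print(object_id)  # ce013625030ba8dba906f756967f9e9ca394464a, as `git hash-object` reports for this content
```

The same function works for any bytes, which is why a zipped docx and a text file are treated alike at this level.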

Greg Hewgill
Actually, I think Git will do a binary diff (in the creation of pack files) if the savings are sufficiently large...
Pat Notz
Ah, you're quite right, I hadn't considered the structure of pack files
Greg Hewgill
A: 

For Word documents, try using RTF (rich text format), which is basically text. Another possibility would be HTML. They're text, so you should be able to do diffs on them.

Most wikis are distributed in the sense that they're designed for collaboration. I think what you're really asking is whether there are hosted solutions or whether you have to manage them yourself. Take a look at http://www.atlassian.com/.

Tommy Hui
A: 

Most document formats don't play terribly well with source control. Almost everything you list is either effectively a binary format or convoluted markup that won't diff well. As long as you just want versions of documents and don't care about the diff, use whatever format you like. I prefer Microsoft Word documents because you can use the built-in change tracking and comment system to track deltas between documents.

As for what documents to store, I would recommend storing anything you'll have a use for later. What documents could be used by someone to continue the project should you leave? What documents would be helpful to bring a new person up to speed? This means specifications, but not documents like burndown charts.

To answer the wiki part of your question, check out DokuWiki. It stores everything in text files so they would be very easy to add into a source control system.

Steve Rowe
A: 

I've just lived with the fact that I can't track changes to binary file formats through a version control system, but I use it anyway since it is useful. Note that typically most of these types of files are work products that will be released (user guides, docs, etc.)

For early project artifacts like requirements and initial designs, I tend to use text documents - not because I can track changes, but because I like to use my IDE for it.

I have never really been "bitten" by the fact that a change can't be "diffed" in version control. The commit comments and other documentation guidelines around changing an important binary document usually make up for that lack of visibility - in that there is another trail if you look for it.

I agree this is not ideal, but I don't think it is really worth fretting over.

Perhaps I've just gotten used to the idea that there is a set of files I can't track as closely as I would like.

I put a lot in version control, but I also use defect tracking for some short-lived items.

Tim
+2  A: 

My company stores Word documents in SVN, and accesses them via TortoiseSVN.

Tortoise uses Word's built-in change tracking function to show you a "diff" of two revisions.

This works really well, but requires Windows and Word.

Edit:

You could probably get this working with Git too. If you install TortoiseSVN and look in %PROGRAMFILES%\TortoiseSVN\Diff-Scripts\, you'll see what Tortoise is doing.

If you're using git, I assume you're 1337 enough to hack it to work for you :)
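
If Windows and Word aren't available, a different and much cruder route (not what the Tortoise scripts do) is to turn the docx into plain text and let Git diff that. Here is a minimal Python sketch, assuming the usual docx layout where the body lives in `word/document.xml`. `docx_to_text` is a made-up helper name, and formatting, tables, and images are all ignored.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path: str) -> str:
    """Return the text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        # A paragraph's text is split across runs; join the <w:t> pieces back together.
        pieces = [node.text or "" for node in paragraph.iter(W_NS + "t")]
        paragraphs.append("".join(pieces))
    return "\n".join(paragraphs)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Print the text of the file named on the command line.
    print(docx_to_text(sys.argv[1]))
```

Saved as, say, docx_to_text.py (a hypothetical file name), this could be hooked into Git's textconv mechanism: put `*.docx diff=docx` in .gitattributes and set `diff.docx.textconv` to a command that runs the script. After that, `git diff` shows paragraph-level text changes for Word files.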

Blorgbeard
The diff-scripts hint is good. Will keep Tortoise in mind
hillu
+2  A: 

When deciding on a document format, make sure that team members (or are you working alone?) are comfortable working with the format itself.

  1. Storage is not so much the problem as being able to see diffs between versions and to merge them. In my experience, nothing beats text formats that can be edited freely in any text editor. This excludes HTML and just about any XML-based format, DocBook being a barely usable exception.

  2. A good wiki that can use any of the popular version control systems and be set up in a distributed fashion is IkiWiki. With IkiWiki, markup parsing is done in plug-ins, so you can choose the input format on a per-document basis. The default, Markdown, gets pretty close to plain-text formats.

  3. If you're unhappy with using LaTeX, don't use it. I think it's unsuitable for taking quick notes. Man pages are written in nroff, but many people use other formats such as POD.

Some projects that strive to be alternatives to Visio are Kivio (KDE) and Dia (Gtk/Gnome). I haven't used Visio itself, so I can't comment on how their feature sets compare. It probably depends on what sorts of visuals / diagrams you want to create. UML? Flow charts?

hillu