There are formats that are actually zip files in disguise, e.g. docx or odt. If I store them directly in version control, they are handled as binary files. My ideal solution would be

  • have a hook that creates a foo.docx/ directory for each foo.docx file before commit, unzipping all files into it
  • optionally, have a hook that reindents the xml files
  • have a hook that recreates foo.docx from the stored files after update

I don't want the docx files themselves to be version-controlled. (I am aware of a related question where a different approach with a custom diff was suggested.)

Is this doable? Is this doable with Mercurial?

UPDATE:

I know about hooks. I am interested in the specifics. Here is a session to demonstrate the expected behavior.

> hg add foo.docx
> hg status
A foo.docx
> hg commit
> # Change foo.docx with external editor
> hg status
M foo.docx
> hg diff
+++ foo.docx/word/document.xml
- <w:t>An idea</w:t>
+ <w:t>A much better idea</w:t>
+2  A: 

You can use a precommit hook to unzip, and an update hook to zip. See Mercurial: The Definitive Guide for how to use hooks.

Be careful about renames. If you rename foo.docx to bar.docx, your precommit hook will need to delete foo.docx/ and add bar.docx/.
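
As a rough illustration, the unzip and zip steps that such hooks would call might look like the Python sketch below. The function names `unpack_archive` and `pack_archive` are made up for this example and are not part of Mercurial; wiring them into actual precommit and update hooks is left out.

```python
import os
import shutil
import zipfile


def unpack_archive(archive_path, dest_dir):
    """Extract every member of a zip-based document into dest_dir."""
    # Start from an empty directory so members deleted from the
    # document do not linger from an earlier unpack.
    shutil.rmtree(dest_dir, ignore_errors=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()


def pack_archive(src_dir, archive_path):
    """Rebuild a zip-based document from the files under src_dir."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in sorted(os.walk(src_dir)):
            for name in sorted(files):
                full = os.path.join(root, name)
                # Member names are relative to src_dir and use "/".
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                zf.write(full, arcname)
```

A file foo.docx and a directory foo.docx/ cannot sit side by side in the same folder, so in practice the directory needs another name, for example a hypothetical foo.docx.unpacked/. On a rename, the hook would remove the old directory and call `unpack_archive` for the new name. Note that `pack_archive` writes members in sorted order with deflate compression, which may not match what the original application produced.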


UPDATE (sorry for giving an entry-level answer to a 1k-rep user)

If you want to use the unpacked docx for core hg operations like diff (status can work with the packed file), you'd have to go with an extension. I think you can take an approach similar to the keyword extension and wrap the repo object with your own.

I have written some extensions but not at that hard core level, so I can't provide more details.

If you want to get crazy you could even do merges on the unpacked files. But it's probably safer to treat the document as binary and use an external tool to diff and merge.
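
To give an idea of what such an external diff tool could do, here is a minimal Python sketch that compares two zip-based documents member by member. The name `diff_archives` is invented for this example, and the sketch assumes the members are UTF-8 text; it is not how Mercurial itself diffs files.

```python
import difflib
import zipfile


def diff_archives(old_path, new_path):
    """Return unified-diff lines for members that differ between two zips."""
    lines = []
    with zipfile.ZipFile(old_path) as old, zipfile.ZipFile(new_path) as new:
        old_names, new_names = set(old.namelist()), set(new.namelist())
        for name in sorted(old_names | new_names):
            # A member missing on one side is treated as empty.
            before = old.read(name) if name in old_names else b""
            after = new.read(name) if name in new_names else b""
            if before == after:
                continue
            # Assumption: members are UTF-8 text; bad bytes are replaced.
            a = before.decode("utf-8", "replace").splitlines()
            b = after.decode("utf-8", "replace").splitlines()
            lines.extend(difflib.unified_diff(
                a, b, old_path + "/" + name, new_path + "/" + name,
                lineterm=""))
    return lines
```

The XML inside real documents is often written on very few lines, so the output will probably only be readable if the XML is reindented first.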

Geoffrey Zheng
I found out that at least OpenOffice is very picky about how the files are zipped. A simple unzip->zip cycle can be sufficient to corrupt an .od* file.
Rudi
+1  A: 

If you can get past the hurdle of successfully unzipping and zipping the OpenOffice documents, then you should be able to use the filter system we have in Mercurial. That lets you transform files on every read/write from/to the repository.

You will unfortunately have to do more than just unzip the foo.docx file. The problem is that you need to generate a single file as output, so perhaps you can unzip foo.docx and then tar up the generated files. You'll then be versioning the tarball, which should work since a tarball is just an uncompressed concatenation of all the individual files with some meta information. Come to think of it, a simpler solution would be to zip the unpacked foo.docx file again but specify no compression. That should give similar results to using tar.
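
A minimal Python sketch of the "zip again with no compression" idea follows. The function name `repack_stored` is hypothetical, and hooking it up as a Mercurial filter is not shown. It copies the archive member by member instead of unpacking to disk, keeping the original member order and timestamps so that repeated runs on the same input give the same output.

```python
import zipfile


def repack_stored(src_path, dst_path):
    """Copy a zip archive, storing every member without compression."""
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_STORED) as dst:
        # infolist() keeps the original order, so an ODF "mimetype"
        # entry that came first in the source stays first.
        for info in src.infolist():
            stored = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            stored.compress_type = zipfile.ZIP_STORED
            stored.external_attr = info.external_attr
            dst.writestr(stored, src.read(info))
```

As far as I know, Mercurial's filters either pipe the file through a command or use the `tempfile:` form with `INFILE` and `OUTFILE` placeholders, so a small command-line wrapper around this function would be needed to use it from the `[encode]` section.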

Solving this problem is something I've wanted to do myself, so please report back by sending a mail to [email protected].

Martin Geisler
Zipping with no compression seems to work for both odt and docx files, thanks for the tip.
Adam Schmideg