I'm currently using Gettext on a project, and the .po files are nicely kept under version control.

PO files of course contain translations, but they also contain some metadata: information about the exact files and line numbers where the translatable strings are located.

The problem is that each time you update the PO files, the metadata changes far more than the actual translations. This makes it really hard to see later, from the version control diff, what actually changed: you just see a myriad of changes to file names and line numbers. For example:

- #: somefile.js:43
- #: somefile.js:45
- #: somefile.js:118
+ #: somefile.js:203
+ #: somefile.js:215
  msgid "Translate me please"
  msgstr "Tõlgi mind palun"

- #: somefile.js:23
- #: somefile.js:135
+ #: otherfile.js:23
+ #: otherfile.js:135
  msgid "Note"
  msgstr "Märkus"

- #: andThatFile.js:18
  #: orThisFile.js:131
- msgid "Before I was like this"
- msgstr "Selline olin ma enne"
+ msgid "I happen to be changed"
+ msgstr "Paistab, et mind muudeti"

Of course, a simple fix would be to just disable the generation of file name/line number comments in the xgettext output. But I actually find those file names to be quite useful hints when translating.

I surely cannot be the only one who doesn't like the diffs of his PO files. Suggestions?

+5  A: 

A simple fix would be to apply a grep filter that removes the location comments from the diff you view. You can do this on the output of the version control diff utility:

myVersionControl diff REV1 REV2 filea | grep -v '^[-+ ]#:'
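The grep filter can be wrapped in a small shell function so it is easy to reuse. This is a minimal sketch: the name `po_diff_filter` is hypothetical, and it assumes your version control tool prints unified diffs, where every content line starts with `-`, `+` or a space. Matching that one-character prefix followed by `#:` drops only the location comments and keeps other comments such as `#, fuzzy`.

```shell
# Hypothetical helper name. Reads unified diff output on stdin and drops
# removed (-), added (+) and context (space) lines whose text starts with
# "#:", i.e. gettext location comments. All other lines, including
# "#, fuzzy" flags and the ---/+++ file headers, pass through unchanged.
po_diff_filter() {
    grep -v '^[-+ ]#:'
}
```

For example, `git diff HEAD~1 -- et.po | po_diff_filter`, or the equivalent diff command of your version control system. The hunk headers (`@@ ... @@`) still count the filtered lines, so the result is for reading only, not for applying as a patch.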

Alternatively, you may be able to instruct the version control diff utility to ignore these lines before it makes the comparison, which will likely give more reliable and prettier output.

I don't know which version control system you use, but git, for example, allows you to preprocess the input to diff and remove the comment lines for certain file types (thanks VonC); see man gitattributes and search for "Performing text diffs of binary files". Here's the body of a sample script, to be saved as /usr/local/bin/strippocomments, which will do that:

#!/bin/sh
grep -v '^#:' "$1"

Make the script executable (chmod +x /usr/local/bin/strippocomments). You can then tell git to use it to preprocess po files by adding the following to the file .git/info/attributes in your repository:

*.po diff=podiff

and to the file .git/config in your repository:

[diff "podiff"]
    textconv = /usr/local/bin/strippocomments

With this in place, git diff should no longer show any lines starting with #:.

Note that the diffs generated by git diff with this approach should not be used for patching. However, git format-patch still uses the default diff, so patches generated for emailing will be fine.
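The same setup can also be done from the command line with `git config` instead of editing .git/config by hand. The sketch below wraps the steps of this answer in a hypothetical `setup_podiff` helper that writes the filter script to whatever path you pass (writing to /usr/local/bin usually requires root). It adds `|| true` to the grep line as a precaution, in case git treats a non-zero exit status from the filter as an error; grep exits 1 when it selects no lines.

```shell
# Hypothetical helper: setup_podiff SCRIPT_PATH
# Run it from inside a git work tree.
setup_podiff() {
    script=$1
    # Write the textconv filter: git passes it the path of a file and reads
    # the converted text from stdout. "|| true" hides any non-zero exit
    # status from grep (1 means no lines were selected).
    printf '%s\n' '#!/bin/sh' 'grep -v "^#:" "$1" || true' > "$script" || return 1
    chmod +x "$script" || return 1
    # Same effect as adding [diff "podiff"] textconv = ... to .git/config.
    git config diff.podiff.textconv "$script" || return 1
    # Same effect as adding "*.po diff=podiff" to .git/info/attributes by hand.
    git_dir=$(git rev-parse --git-dir) || return 1
    mkdir -p "$git_dir/info"
    printf '%s\n' '*.po diff=podiff' >> "$git_dir/info/attributes"
}
```

To see the raw diff including the location comments, `git diff --no-textconv` bypasses the filter. The git documentation states that textconv filters are enabled by default only for git diff and git log, not for git format-patch.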

Alex Brown
It seems that my git doesn't support the --ignore-matching-lines option. I'm using version 1.6.5.2.
Rene Saarsoo
But filtering the diff with grep really does produce much cleaner output. How come I didn't think of this myself? Of course, this doesn't help when I view the diff through something other than a console interface, but it does solve most of the problem.
Rene Saarsoo
Okay, I guessed that it would work, based on the fact that lots of things I hoped would work in the past actually did! However, git diff is very flexible and can probably be made to do it.
Alex Brown
Note that the regexp should be '^#:', not just '^#', because there are other comment forms such as '#, fuzzy' that you actually need to see.
bialix
Thanks for the tip of using `textconv` - that's a lot better than using `command = mypodiff`.
Rene Saarsoo
@Alex: good use of `textconv`. +1
VonC
Alright, I think your answers were the most helpful ones, although many thanks to everyone else who bothered to answer too. Here are your 150 points :)
Rene Saarsoo
+2  A: 

The GNU gettext package has numerous useful utilities for various tasks with PO files: msgcmp compares two PO files, msgcomm selects common or unique messages, and msgattrib selects, filters and transforms messages in existing PO files. Depending on what you actually need from a diff of a PO file, I think you need either msgattrib or msgcomm.

If you just need to compare two PO files without the file/line comments, a simple script that greps your old and new PO files into a temporary directory and diffs the results would be sufficient.
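A minimal sketch of such a script, assuming a POSIX shell with `mktemp` and `diff -u` available; the helper name `po_compare` is hypothetical. It writes copies of both files without their `#:` lines into a temporary directory, diffs the copies, and returns diff's exit status.

```shell
# Hypothetical helper: po_compare OLD.po NEW.po
# Copies both files, minus their "#:" location comments, into a temporary
# directory and prints a unified diff of the copies. Returns diff's status:
# 0 = no differences, 1 = differences, 2 = trouble.
po_compare() {
    tmpdir=$(mktemp -d) || return 2
    # grep -v exits 1 if it selects no lines; that case is harmless here,
    # so its status is deliberately not checked.
    grep -v '^#:' "$1" > "$tmpdir/old.po"
    grep -v '^#:' "$2" > "$tmpdir/new.po"
    diff -u "$tmpdir/old.po" "$tmpdir/new.po"
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}
```

If you prefer to stay within gettext, `msgattrib --no-location` should produce a similar stripped copy instead of grep; check `msgattrib --help` for your version.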

bialix
+3  A: 

You could look at the options offered by a custom diff driver in a .gitattributes file, such as specifying a special diff for po files. In .gitattributes:

*.po   diff=mypodiff

and in your git config:

[diff "mypodiff"]
    command = mypodiff

with mypodiff being a script that calls any diff tool able to filter out the lines that you want.
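As a sketch of what mypodiff might contain (the contents are an assumption, not a script from this thread): git runs a `diff.<driver>.command` with seven arguments, `path old-file old-hex old-mode new-file new-hex new-mode`, so the two file versions are the 2nd and 5th arguments. The function strips the `#:` lines from both versions into temporary files and diffs them. It always returns 0, because `diff` exits 1 whenever the files differ and git may abort when an external diff exits non-zero.

```shell
#!/bin/sh
# Hypothetical contents for the "mypodiff" command. git calls an external
# diff command with seven arguments:
#   path old-file old-hex old-mode new-file new-hex new-mode
mypodiff() {
    path=$1 old_file=$2 new_file=$5
    work=$(mktemp -d) || return 0
    # Strip the "#:" location comments from both versions.
    grep -v '^#:' "$old_file" > "$work/old"
    grep -v '^#:' "$new_file" > "$work/new"
    # Label the output with the repository path instead of the temp files.
    diff -u -L "a/$path" -L "b/$path" "$work/old" "$work/new"
    rm -rf "$work"
    # diff exits 1 whenever the files differ; report success to git anyway.
    return 0
}

# When git runs this file it passes the seven arguments; forward them.
if [ "$#" -eq 7 ]; then
    mypodiff "$@"
fi
```

Saved as an executable file named mypodiff somewhere on your PATH, this matches the `command = mypodiff` setting above.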

VonC
Thanks, I managed to get this working. The tricky part was getting the argument order right for the external diff command.
Rene Saarsoo
BTW, is there a way to have the `--ext-diff` option always on when running commands such as `git-show` and `git-log` (and possibly others)? The external diff command is applied to .po files when using `git-diff`, but I rarely use that command; for the others I need to add the `--ext-diff` option.
Rene Saarsoo
I would say git aliases could help you there: by defining aliases, you could add the relevant option to those commands.
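A sketch with hypothetical alias names `logx` and `showx`. git ignores aliases that would hide a built-in command, so `log` and `show` themselves cannot be redefined this way. Any extra arguments, such as `--global`, are passed through to `git config`.

```shell
# Hypothetical helper: defines "logx" and "showx" aliases that always pass
# --ext-diff. Extra arguments (e.g. --global) are forwarded to git config.
define_ext_diff_aliases() {
    for cmd in log show; do
        git config "$@" "alias.${cmd}x" "$cmd --ext-diff" || return 1
    done
}
```

After running `define_ext_diff_aliases --global`, `git logx -p` should behave like `git log --ext-diff -p`.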
VonC
Aliases are probably the way to go...
Rene Saarsoo
Thanks for your hint, helped me fix my submission.
Alex Brown