views:

143

answers:

2

Looking for tips and/or tools on how to efficiently work with gettext PO files when making small edits to large msgid values.

Example: We have lots of multi-sentence/multi-paragraph messages that are stored in our PO message catalog files. If we make a very minor change to a message, perhaps editing a single sentence or even correcting punctuation, we lose our original translation when we run the msgmerge utility.

Rather than re-translate long messages (that have already gone through an editorial approval process) from scratch, our translators return to backup copies of their PO files and manually search for the text of the last msgid/msgstr translation pair which they then diff against the current msgid values to see what has changed, followed by a copy and paste of the last translation which they then edit to reflect the updated msgid value.

That's a lot of work! Certainly there must be a better way of handling this type of workflow?

Is there a best practice way to archive and find previous translations that are no longer in a PO file? One idea that comes to mind is to store a unique msg id in the text of our messages or in the comments that precede our message and use this id to retrieve previous msgid/msgstr translation pairs for review. Or are there PO editors or online services that make this process more efficient?

Thank you, Malcolm

+1  A: 

Virtaal's translation memory support can probably help with this. If your original units are in the translation memory, it will be shown (with differences) within a certain margin of change (based on Levenshtein distance). It will still contain the original (unmodified) translation, but at least the original text is more easily accessible and the differences highlighted.

I'm not 100% sure, but Pootle might also offer a web based solution. If you need any help, ask in #pootle on FreeNode.

The more general improvement is, of course, to separate/segment the units as far as possible.

Walter
Thank you Walter. Regards, Malcolm
Malcolm
+1  A: 

I've been looking for a way to make minor changes to msgids without disturbing existing translations - for instance, typo fixes in the source text. Here's a recipe I've just worked out that doesn't involve websites:

  1. Use msgen from GNU gettext to generate an English-to-English po file:

    msgen project.pot >corrections.po

  2. Manually edit the msgstrs in "corrections.po" to reflect the typo fixes made in the source text, so we have a mapping from uncorrected to corrected strings. (I haven't thought about how to automate this bit.)

  3. For each "real" translation (for example ca.po): abuse poswap from the Translate Toolkit (translate-toolkit in Ubuntu) to change the msgids:

    poswap -i corrections.po -t ca.po -o ca.new.po

This does seem to lose header comments and obsolete strings from GNU gettext po files, but manually fixing those up is much less work than manually tweaking msgids in each translation (and could probably easily be scripted).

(Obviously, this should only be used in exceptional circumstances, where you're absolutely sure that none of the translators need the opportunity to re-review their translations.)

jtn
Very clever. Thank you JTN. Regards, Malcolm
Malcolm