I have some code that is written in French: the variables, classes, and functions all have French names, and the comments are in French too. I'd like to translate the code to English. This will be quite a challenge, since it's an 18K-line project, so I'd like to know whether any tool could help me, especially with the variable, class, and function names, since renaming them all by hand would be error-prone.

Are there any tools that can help me? Any advice?

Edit: I'm not looking for machine translation. I'm looking for a tool that would help me translate the code. Say there is a class named C with a method named TraverserLaRue, and I rename it CrossTheRoad. I'd like all references to TraverserLaRue in all files to become CrossTheRoad. However, I don't want the TraverserLaRue method of class B to be renamed.

A: 

I did this with German code a while ago, but had mixed results because of abbreviations in names, etc. Using regular expressions, I wrote a parser that removed all of the language-specific keywords and characters, then separated the comments from the rest of the code. That left me with a lot of words that didn't necessarily mean anything to me by themselves, so I wrote a unique-word finder that added them all to an ordered text file. The next stop was Google's language tools, which attempted to translate every word in the list. I ran through the list to check whether each word had really been translated, and if it had, I did a replace-all in the code with the English equivalent. I put the comments back in with the complete translation, if it worked. What I found was that I ended up having to talk with someone who understood "Germish" to translate the abbreviations, slang terms, and mixed-language pieces. So in short: regular expressions with a dictionary, unless someone has a real tool for this, which I would also be interested in.
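For illustration, the "separate the comments, then collect unique words" step might look roughly like the Python sketch below. It is not the original parser: it assumes C-style comments and string literals and ASCII identifiers, and the names TOKEN, KEYWORDS, split_words, and split_source are made up for this example. The keyword set is deliberately tiny and would need to be filled in for the real language.

```python
import re

# Matches, in priority order: comments, string/char literals, identifiers.
# Assumes C-family syntax; anything else (numbers, operators) is skipped.
TOKEN = re.compile(
    r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<ident>[A-Za-z_][A-Za-z0-9_]*)
    """,
    re.VERBOSE | re.DOTALL,
)

# Illustrative subset only; extend with the real language's keywords.
KEYWORDS = {"int", "void", "return", "if", "else", "for", "while", "class"}


def split_words(identifier):
    """Split an identifier such as TraverserLaRue into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [part.lower() for part in parts]


def split_source(source):
    """Return (comments, sorted unique words taken from identifiers)."""
    comments, words = [], set()
    for match in TOKEN.finditer(source):
        if match.lastgroup == "comment":
            comments.append(match.group())
        elif match.lastgroup == "ident" and match.group() not in KEYWORDS:
            words.update(split_words(match.group()))
    return comments, sorted(words)
```

The sorted word list is what you would feed to a dictionary or translator. Abbreviations such as "nb" still come out as-is, which is exactly where a human who knows the jargon is needed.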

Jeff
+2  A: 

Any refactoring tool has a rename feature. Many questions on SO address language specific refactoring tools.

For the comments, you will have to handle them manually.

mouviciel
A: 

You should definitely look into https://launchpad.net/rosetta

Ubuntu uses this to translate thousands of its packages, written in hundreds of programming languages, into hundreds of human languages, with updates for each new version. A truly herculean task.

Edit: to clarify how Rosetta is used at Ubuntu: it modifies all natural-language strings occurring in the source code of the open-source apps, creating language-specific source packages, which produce the corresponding binaries when compiled. Of course, it does not edit the binaries themselves.

First, maintainers create "template files", which are something like a "patch with wildcards": a set of rules describing what needs to be translated and where in the source tree, but not what to translate it to. Rosetta then displays the strings to be translated and lets volunteer translators provide translations into their language for each entry. Each entry can be discussed and modified, and suggestions can be submitted and moderated. Statistics show how much remains to be translated, which translations are uncertain, which are missing, and so on. When the translation is complete, the patch for a given language is applied to the source, creating its version for that language. A distribution is then compiled from the modified sources.

This allows translation both for sources that use external resources for multilingual support, allowing the language to be changed on the fly, and for sources that have literal native-language strings right in the code, mixed with business logic.

When a new version of the package is released, the template must be edited to include all new strings, but there is quite good automation for preserving the existing ones. Of course, only the new strings need translating.

SF.
I don't want to translate the resulting application. I want to translate the source code itself.
Mathieu Pagé
A: 

IMHO automatic tools won't be of any help here. Just translating variable and function names is not enough and will make the code worse, because such tools cannot infer the original programmer's intent when he chose a variable name.

Depending on which programming language this code is written in, there are modern IDEs that might ease the refactoring, but if you want good results, manual code review is a must.

Darin Dimitrov
I'm not looking for a machine translation tool. I'm looking for a refactoring tool that would help me rename symbols in a project.
Mathieu Pagé
A: 

A good IDE will be able to list classes, methods, and variables. There are also documentation generation tools that will do that, such as Javadoc for Java, Doxygen for many languages, etc.

As for the actual translation, no tool will perform well, or even to a satisfactory level. The only way to get something worthwhile is to have a bilingual translator translate the terms. I've been doing freelance translations for many years, and I can tell you that trying to have a machine do the translating is a waste of time. Many examples and word choices will be relevant to your culture and not the other. And that's just the tip of the iceberg.

Unless you find someone who can do the translation, I suggest you abandon the idea. Leave the source code as is. If a non-French speaker reads it and needs to understand something, let them do the Google lookup. If they are native English speakers, they'll probably do a better job of understanding the automatically translated text than you would, being French. When translating, you always want to translate into your native language.

JRL
+2  A: 

I assume the language in question is one of the common ones, such as C, C++, C#, Java, ... (You don't have a language with French keywords? I once encountered an entirely Swedish version of Pascal, and I gave up on working with it.)

So you have two problems:

  • Translating identifiers in the source code
  • Translating comments

Since comments contain arbitrary natural language text, you'll need an arbitrary translation of them. I don't think you can find an automated tool to do that.

Unlike others, however, I think you have a decent chance at translating the identifiers and changing them en masse.

SD makes a line of source code "obfuscator" products. These tools don't process the code as raw text; rather, they process the source code in terms of the targeted language, so they accurately distinguish identifiers from operators, numbers, comments, etc. In particular, they operate reliably, as needed, on just the identifiers.

One of the things these tools do is replace one identifier name with another (usually a nonsense name) to make the code really hard to understand. Think abstractly of a map of identifier names I -> N. (They do other things, but those aren't interesting here.) Because you often want to re-obfuscate a changed file the same way as the original, these tools allow you to reuse a previous cycle's identifier map, which is represented as a list of I -> N pairs.

I think you can abuse this to do what you want.

Step 1: Run such an obfuscator on your original French code. This will produce a text file containing all the identifiers in the code as a map of the form

  I1 -> N1
  I2 -> N2
  ....

You don't care about the Ns, just the I's.

Step 2: Manually translate each French I to an English name E you think fits best. (I have no specific suggestions about how to do this; some of the other answers here have suggestions). Some of the I's are likely to be library calls and are thus already correct. You can modify the text obfuscation map file to be:

  I1 -> E1
  I2 -> E2

Step 3: Run the obfuscation tool, and make it use your modified obfuscation map. It can be told to do that.

Voilà, all the identifiers in your code will be changed the way you specify.

[As a freebie, you may also get your original text reformatted. These tools can also format code nicely. Your name changes are likely to disturb the indentation and spacing of the original text, so this is a nice bonus.]
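To make the map idea concrete, here is a rough Python sketch of steps 2 and 3. It is not the obfuscator and does not reproduce its real map file format: the "Old -> New" line syntax simply mirrors the schematic above, and TOKEN, parse_map, and rename_identifiers are hypothetical names. It assumes C-style comments and string literals, and it is only a lexical approximation, so it cannot tell class C's TraverserLaRue from class B's the way a language-aware tool can.

```python
import re

# Matches, in priority order: comments, string/char literals, identifiers.
# Only identifier matches are ever rewritten; comments and strings are kept.
TOKEN = re.compile(
    r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<ident>[A-Za-z_][A-Za-z0-9_]*)
    """,
    re.VERBOSE | re.DOTALL,
)


def parse_map(text):
    """Parse lines of the assumed form 'OldName -> NewName' into a dict."""
    mapping = {}
    for line in text.splitlines():
        if "->" not in line:
            continue  # skip blank lines and anything that is not a pair
        old, new = (part.strip() for part in line.split("->", 1))
        if old and new:
            mapping[old] = new
    return mapping


def rename_identifiers(source, mapping):
    """Replace whole identifiers found in mapping; leave all else intact."""
    def substitute(match):
        if match.lastgroup == "ident":
            return mapping.get(match.group(), match.group())
        return match.group()

    return TOKEN.sub(substitute, source)
```

With a map containing the single line "TraverserLaRue -> CrossTheRoad", calls and definitions are renamed, while the same text inside comments, string literals, or longer names such as TraverserLaRueVite is left alone. Identifiers that are library calls simply stay out of the map.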

Ira Baxter
That's exactly what I was looking for. Unfortunately, the product costs $750; I'll see whether I can find another obfuscator with this feature.
Mathieu Pagé