Hi,

Has anybody come across a situation where an existing code base written in (say) Java by (say) French programmers had to be converted into code that English-speaking programmers could understand? The problem here is that variable, method, and class names, comments, etc. would all be in that particular language.

Is there any automated solution available already?

(I used the word Translate in the title, but obviously I don't mean porting the code to another programming language, nor do I mean i18n.)

Regards,

A: 

I don't think you can simply run this through some sort of "translation" software that does a dictionary-based replacement of the variable names and comments. I'm afraid you'd need translation software that parses Java well enough to separate out the comments, variable names, class names, and potentially the messages, and only then applies a dictionary-based translation. Even in that case I doubt the result will be very appealing, given that such software most likely lacks the domain knowledge you'd need to translate the terms idiomatically.

I'm afraid the only solution that is going to produce something useful is to engage a programmer who is fluent in both natural languages and familiar with the problem domain to rewrite the software. Everything else is likely to create a big mess.
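To illustrate why a purely dictionary-based pass falls short, here is a minimal sketch that splits a camelCase identifier into words and translates each word on its own. The class name, the tiny word list, and the sample identifiers are all made up for illustration; this is not an existing tool.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

class IdentifierTranslator {
    // Hypothetical French-to-English word list; a real one would need domain terms.
    static final Map<String, String> DICTIONARY = Map.of(
            "calculer", "compute",
            "montant", "amount",
            "facture", "invoice");

    // Splits a camelCase identifier in front of every uppercase letter.
    static List<String> splitCamelCase(String identifier) {
        return List.of(identifier.split("(?=\\p{Lu})"));
    }

    // Translates word by word; words missing from the dictionary are kept unchanged.
    static String translate(String identifier) {
        StringBuilder result = new StringBuilder();
        for (String word : splitCamelCase(identifier)) {
            String translated = DICTIONARY.get(word.toLowerCase(Locale.ROOT));
            if (translated == null) {
                result.append(word);
            } else if (Character.isUpperCase(word.charAt(0))) {
                // Keep the capital letter that marked the word boundary.
                result.append(Character.toUpperCase(translated.charAt(0)))
                      .append(translated.substring(1));
            } else {
                result.append(translated);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("calculerMontantTotal")); // computeAmountTotal
        System.out.println(translate("tvaFacture"));           // tvaInvoice
    }
}
```

The first result keeps the French word order ("computeAmountTotal" rather than "computeTotalAmount"), and the acronym "tva" passes through untranslated because a general dictionary does not know it. Both are the kind of thing a bilingual programmer with domain knowledge would catch.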

Timo Geusch
A: 

And remember that class path names are very sensitive in Java (and some other languages). Doing a global "find and replace", which is what this sounds like, would most probably break important aspects of the software.
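A small sketch of how that can go wrong, assuming a made-up package `fr.exemple.facture` and a made-up class `Facture`:

```java
class NaiveReplaceDemo {
    // A global, case-sensitive textual replacement, as a plain editor would do it.
    static String replaceEverywhere(String source, String from, String to) {
        return source.replace(from, to);
    }

    public static void main(String[] args) {
        String source = String.join("\n",
                "import fr.exemple.facture.Facture;",
                "class Client { Facture facture; void facturer() {} }");
        System.out.println(replaceEverywhere(source, "facture", "invoice"));
    }
}
```

After the replacement the import reads `fr.exemple.invoice.Facture`, a package that only exists if the directory on disk is renamed as well. The class name `Facture` is left alone because the match is case-sensitive, and the unrelated method `facturer` has turned into `invoicer`.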

warren
+1  A: 

Well, this certainly is a non-trivial task.

My first idea was

  • Get some tool (parser) which parses your source code into an XML file (or an AST)
  • Do the translations on that intermediate format, since you can e.g. use XPath in the XML file to find the comments, variable names, etc.
  • Then the tool of course must support converting the XML file back to Java source code
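As a rough sketch of the first step only, the JDK's own Compiler Tree API (`com.sun.source`, shipped with the JDK but not with a plain JRE) can parse source text into an AST and list the declared names. This is a different route from the XML/XPath one above, chosen because it needs no extra library. The class `DeclaredNameLister` and the sample source are made up, and the sketch does not translate anything or write source code back out.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.tree.VariableTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class DeclaredNameLister {
    // Wraps source text held in memory so the compiler can read it.
    static class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Parses the source (no type checking) and collects declared names.
    static List<String> declaredNames(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null on a plain JRE
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(new StringSource(className, code)));
        List<String> names = new ArrayList<>();
        TreeScanner<Void, Void> scanner = new TreeScanner<Void, Void>() {
            @Override
            public Void visitClass(ClassTree node, Void unused) {
                names.add("class " + node.getSimpleName());
                return super.visitClass(node, unused);
            }

            @Override
            public Void visitMethod(MethodTree node, Void unused) {
                names.add("method " + node.getName());
                return super.visitMethod(node, unused);
            }

            @Override
            public Void visitVariable(VariableTree node, Void unused) {
                names.add("variable " + node.getName());
                return super.visitVariable(node, unused);
            }
        };
        try {
            for (CompilationUnitTree unit : task.parse()) {
                scanner.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        String code = "class Facture { int montantTotal; "
                + "int calculerTaxe(int taux) { return montantTotal * taux; } }";
        System.out.println(declaredNames("Facture", code));
    }
}
```

Only declarations are collected here. A real renaming tool would also have to find every use of each name and the comments, which is where a refactoring engine does better than a hand-rolled scanner.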

Problems:

  • Bad translations (the translation program has no domain knowledge and almost surely can't correctly translate computer/programming terms, acronyms, mistyped words, camelCase method names, etc.)
  • You can't just translate blindly; ideally you would need to refactor. Otherwise you might end up with source code that is no longer valid, for example because the translation maps several words to a single translation, which could leave classes/variables/methods with the same name, etc.
  • How to determine what not to translate (e.g. Java standard library class names and so on)
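The last two problems can at least be detected before anything is renamed. Below is a minimal sketch, assuming a made-up `RenamePlanner` class, a hand-written skip list, and a rename table that some earlier translation step would have produced:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class RenamePlanner {
    // Hypothetical skip list: names that must survive untouched.
    static final Set<String> DO_NOT_TRANSLATE = Set.of("String", "List", "main", "toString");

    // Returns every translated name that two or more original names map to.
    static Map<String, List<String>> findCollisions(Map<String, String> renames) {
        Map<String, List<String>> byTarget = new TreeMap<>();
        for (Map.Entry<String, String> rename : renames.entrySet()) {
            if (DO_NOT_TRANSLATE.contains(rename.getKey())) {
                continue; // never renamed, so it cannot collide
            }
            byTarget.computeIfAbsent(rename.getValue(), k -> new ArrayList<>())
                    .add(rename.getKey());
        }
        byTarget.values().removeIf(originals -> originals.size() < 2);
        for (List<String> originals : byTarget.values()) {
            Collections.sort(originals); // stable output regardless of input order
        }
        return byTarget;
    }

    public static void main(String[] args) {
        Map<String, String> renames = Map.of(
                "vitesse", "speed",
                "rapidite", "speed",
                "taille", "size",
                "main", "hand"); // French "main" means "hand"; the skip list protects it
        System.out.println(findCollisions(renames)); // {speed=[rapidite, vitesse]}
    }
}
```

A reported collision still needs a human decision about which of the two names gets a different translation, and the skip list itself has to come from somewhere, for instance from the names the parser could not resolve to your own source files.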
jitter