I was asked to write a Perl-to-Python and Perl-to-Ruby (and vice versa) code translator by an evil boss, and therefore I'm curious to know if some automatic code translators between them exist already. Just by googling it, I found Perthon, plus endless discussions and dead projects trying to do this.

In your opinion, is there a standard way for this translation? Apart from having a programmer expert in all languages manually doing the job :-)

Thanks!

+10  A: 

There is no good way to auto-translate between two languages, because it isn't just about changing the syntax, it's about changing the methodology.

For example, Ruby is an ultra o-o language, so it's not going to be a case of transforming some procedural script into Ruby syntax, it's a much bigger task than that.

Sohnee
+10  A: 

Perl is one of the hardest languages to parse. I don't think a real alternative parser for it exists anywhere except in the Perl source itself.

Your best chance is to walk around the problem in some way, because it's very hard.

If your boss is really serious and this is a job that must be done, learn about the Perl source and tack a Python/Ruby code generator behind the Perl interpreter front-end. It isn't simple but at least you won't have to write a Perl parser.

And regarding Perthon - it's hardly relevant. Python is far simpler to parse than Perl, and in fact its standard library includes a ready-made Python-code parser that generates ASTs from Python code.
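To illustrate that last point, here is a minimal sketch using the standard `ast` module, which is presumably the parser meant above. The sample source string is made up for the example.

```python
import ast

# Hypothetical snippet to parse; any valid Python source works here.
source = "total = price * quantity + 1"

# ast.parse returns an ast.Module node holding the whole syntax tree.
tree = ast.parse(source)

# Walk the tree and collect node type names, to show it is a real AST.
node_types = [type(node).__name__ for node in ast.walk(tree)]

print(ast.dump(tree.body[0]))
print(node_types)
```

Nothing comparable ships for Perl, which is why going from Python to Perl (as Perthon does) is a much easier direction than the reverse.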


Edit: seems to be a duplicate of http://stackoverflow.com/questions/1062026/are-there-programmatic-tools-for-perl-to-python-conversion/1062055#1062055

Eli Bendersky
Parser exists - PPI
Alexandr Ciornii
@Alexander: Where?
Ira Baxter
http://search.cpan.org/dist/PPI
daxim
That package is not a real parser, and Perl is impossible to parse: http://www.perlmonks.org/index.pl?node_id=44722 The documentation on this page is clear about that: http://search.cpan.org/dist/PPI/lib/PPI.pm "When parsing Perl as code, you must also execute it" "Even perl itself [...] doesn't "parse" Perl source into anything remotely like a structured document." "The purpose of PPI is not to parse Perl Code..." [but just Perl Documents, i.e. single source files, so it will sometimes get the parsing wrong].
Blaisorblade
+6  A: 

The best translator there is is the human being. No other way comes close. Learn those languages and do the translation manually.

A: 

"Only perl can parse Perl."

Leonardo Herrera
or you can use PPI parser
Alexandr Ciornii
Note the I in PPI. It stands for "isolated". It's not useful for automatic translation to other languages.
tsee
+3  A: 

You're not going to be successful, sorry. Your best bet (that I can think of) is to use PPI (http://search.cpan.org/dist/PPI/) and try to create an abstract syntax tree. Use that tree to translate to desired target.

It's a hard, hard problem and you're not going to get it right. If you don't have a lot of Perl code, then translating by hand is your best option.

Ovid
No, I think your best bet is to compile a perl with -DMAD enabled and then use that to generate something like an AST. I suspect the bitrotting MAD would need fixing first. I'm not even posting this as an answer because I think the original idea is so idiotic it's not even funny.
tsee
A: 

Building translators between languages is hard. Building good translations between languages that have different paradigms is harder, but you can likely do it. (It's easy to see that an OO target can simulate a procedural source without much trouble. Less obvious is that a procedural target can simulate an OO source; after all, the OO language itself is likely implemented in assembler, which is procedural.)

It is easier to do this if you have good translator infrastructure, such as the DMS Software Reengineering Toolkit, which is generalized compiler technology parameterized by language descriptions for both input and output, and by transformation rules that map structures in one language to the other.

DMS works by parsing source code in the original language into ASTs, applying AST-to-AST rewrites that produce ASTs in the target language, applying further AST-to-AST rewrites on the target trees to optimize the answer, and then anti-parsing ("prettyprinting") the final AST back to compilable source text.
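The parse, rewrite, prettyprint shape of that pipeline can be sketched in a few lines. This is only a toy analogy built on Python's standard `ast` module, not DMS or its rule language: source and target are both Python, and the rewrite rule `SquareToMultiply` and the function `translate` are names invented for the example. `ast.unparse` needs Python 3.9 or later.

```python
import ast

class SquareToMultiply(ast.NodeTransformer):
    """Hypothetical rewrite rule: turn `name ** 2` into `name * name`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite child nodes first
        if (isinstance(node.op, ast.Pow)
                and isinstance(node.left, ast.Name)
                and isinstance(node.right, ast.Constant)
                and node.right.value == 2):
            return ast.BinOp(
                left=node.left,
                op=ast.Mult(),
                right=ast.Name(id=node.left.id, ctx=ast.Load()),
            )
        return node

def translate(source):
    tree = ast.parse(source)               # 1. parse source text into an AST
    tree = SquareToMultiply().visit(tree)  # 2. apply an AST-to-AST rewrite
    ast.fix_missing_locations(tree)        # fill in positions for new nodes
    return ast.unparse(tree)               # 3. prettyprint back to source

print(translate("area = side ** 2 + 1"))
```

A real translator would need a front end for the source language, a separate tree vocabulary for the target, and many such rules; the sketch only shows how the three stages fit together.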

It has strong parsers, and complete front ends for a variety of tough languages such as C, C++, Java, C#, COBOL, PHP and (just recently added) Python. The folk theorem that says C++ is almost impossible to parse is proven false by DMS. We've considered a Perl parser, and think the folk theorem that says only Perl can parse Perl is similarly false. (Some might say, "so why don't you have it?" The answer is that the other front ends are not impossible but not easy either, and it takes a while to get around to the less mainstream languages.)

Building a translator requires effort, too, but really spectacular ones have been built using DMS. See the B-2 Stealth Bomber mission software translator we built with DMS.

Even with this background, I'd hesitate before I took on a Perl to Python translator.

Ira Baxter
C++ is difficult to completely parse, and it's not LALR(1) parsable. But Perl is not parsable at all, as PPI docs point out (you can parse maybe 99% of existing Perl code, but you're doomed to either fail or have to execute Perl code, getting thus an undecidable parsing problem).
Blaisorblade
@Blaisorblade: C++ is easy to parse if you use the right parsing technology, and you're right, LALR(1) isn't that technology. We use GLR parsers and they operate just fine using an explicit C++ grammar. Now, there's a folk theorem that says PERL isn't parsable, but if that were true, then PERL itself wouldn't work. My suspicion is that PERL is parsable just fine, but inducing the grammar rules may be awkward; GLR parsers make it far easier to experiment with grammars. That makes building a PERL parser likely inconvenient, but hardly impossible.
Ira Baxter
@Blaisorblade: I've been to see the "impossible to parse Perl" proof. My interpretation of that document is that phrases in Perl can have multiple interpretations, and which one is "right" depends on runtime context. That's awkward but not disabling; it just says that the "parser" has to capture the multiple possible interpretations, and then something (whether you call it part of the parser or not) should attempt to eliminate those interpretations which are not possible for this program (called "static analysis"). (continued...)
Ira Baxter
@Blaisorblade: This problem isn't new. COBOL (a language from 1959) allows the use of unqualified identifiers (e.g., so that "X" arguably could refer to several possible definitions of "X") with the proviso that your program isn't legal if there is more than one interpretation. And COBOL parsers have been around for a long time. I'll grant you that the Perl situation is nastier and the resultant parser may say "well, I have 6 possible interpretations of this phrase". That only stops you from building a parser that produces one interpretation.
Ira Baxter
Your idea is nice and interesting, but with the usual definition of parser (something producing one parse tree, i.e. one interpretation), the theorem "perl is unparsable" still holds. Just make the program search for odd perfect numbers and have different parsing behavior depending on the result. If you have multiple parse trees, fine, it can work, but it would be annoying that the whole rest of the parse tree could in general depend on the interpretation (dunno if this happens in Perl).
Blaisorblade
@Blaisorblade: If you insisted on the "one parse tree" requirement, it would be impossible to parse C++, too, where a variety of phrases have multiple possible parses (which depend on the meaning of symbols defined elsewhere). What we do with DMS for C++ is collect *all* the potential parses during parsing (there's a way to integrate these using subtree sharing that makes this practical), and then use a symbol collection phase to eliminate the ones that can't work. If a Perl phrase doesn't have an infinite number of parses, it should be possible to do this with Perl, too.
Ira Baxter
> multiple possible parses (which depend on the meaning of symbols defined elsewhere) Do the definitions have to come before the dependent code? If yes (as is usual in C++ compared to Java), the usual solution is this: http://en.wikipedia.org/wiki/The_lexer_hack. That page mentions C++ - is it about the same problem? The difference between C++ and Perl is that in Perl you need to execute code before being able to disambiguate, while in C++ you just need to parse everything. The solution you describe would probably allow parsing code fragments without preprocessing, so it's very cool for an IDE.
Blaisorblade
The lexer hack is the classic YACC+lexer solution to parsing C and C++. It's a hack. Parsing and producing multiple-but-ambiguous syntax trees with a later pass to clean out the ambiguities means that the parser has no dependency on symbol order; the result is a really clean parser definition and, remarkably, a really clean symbol table builder for C++. We use the same machinery to parse some 25 other languages in that same clean way.
Ira Baxter
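The "collect every parse, then prune with symbol information" idea from the comments above can be sketched on the classic C/C++ ambiguity `a * b;`, which is a pointer declaration if `a` names a type and a multiplication otherwise. This is a hand-written toy, not DMS's machinery (no GLR parsing, no subtree sharing), and the names `candidate_parses` and `resolve` are invented for the example.

```python
def candidate_parses(tokens):
    """Return every reading of the token list `X * Y ;` as (kind, detail) pairs."""
    if len(tokens) == 4 and tokens[1] == "*" and tokens[3] == ";":
        left, _, right, _ = tokens
        return [
            ("declaration", {"type": left, "pointer_name": right}),
            ("expression", {"op": "multiply", "operands": (left, right)}),
        ]
    raise ValueError("this toy only understands `X * Y ;`")

def resolve(parses, type_names):
    """Later pass: keep only the parses consistent with the known type names."""
    kept = []
    for kind, detail in parses:
        if kind == "declaration" and detail["type"] in type_names:
            kept.append((kind, detail))
        elif kind == "expression" and detail["operands"][0] not in type_names:
            kept.append((kind, detail))
    return kept

# The parser itself never consults the symbol table; it just returns both readings.
parses = candidate_parses(["a", "*", "b", ";"])

as_type = resolve(parses, type_names={"a"})   # `a` was declared as a type
as_value = resolve(parses, type_names=set())  # `a` is an ordinary variable

print(as_type)
print(as_value)
```

Because the pruning happens after parsing, the parser does not care where `a` is declared, which is the contrast with the lexer hack. The Perl objection in the thread is that this pruning step may need to run code rather than just consult a symbol table.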