Sometimes when decompiling Java code, the decompiler doesn't manage to decompile it properly and you end up with little bits of bytecode in the output.

What are the weaknesses of decompilers? Are there any examples of Java source code that compiles into difficult-to-decompile bytecode?

Update:

Note that I'm aware that exploiting this information is not a safe way to hide secrets in code, and that decompilers can be improved in the future.

Nonetheless, I am still interested in finding out what kinds of code fox today's crop of decompilers.

A: 

Java keeps a lot of information in the bytecode (for instance, many names), so it is relatively easy to decompile. Bytecode that is hard to decompile mostly comes from source code that is hard to read, so that's not really an option. If you really want to obfuscate your code, use an obfuscator that renames all methods and variables to something unrecognizable.

Mnementh
+3  A: 

Any Java byte code that's been through an obfuscator will have "ridiculous" output from the decompiler. Also, when you have other languages like Scala that compile to JVM byte code, there's no rule that the byte code must be easily representable in Java, and often it isn't.

Over time, decompilers have to keep up with the new language features and the byte code they produce, so it's plausible that new language features are not easily reversed by the tools you're using.

Edit: As an example in .NET, the following code:

lock (this)
{
    DoSomething();
}

compiles to this:

Monitor.Enter(this);
try
{
    DoSomething();
}
finally
{
    // The lock is released on every exit path, not only when an exception is thrown.
    Monitor.Exit(this);
}

The decompiler has to know that C# (as opposed to any other .NET language) has a special syntax dedicated to exactly those two calls. Otherwise you get unexpected (verbose) results.

280Z28
+1 for suggesting to decompile Scala to Java code
Jorn
A: 

Exception handling is often difficult to decompile. More generally, any code that has been obfuscated or was written in another language is difficult to decompile.
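To illustrate why try/catch/finally is awkward, here is a minimal sketch (the class and method names are made up for this example). As far as I know, current javac does not emit the finally body once; it copies it onto each exit path (normal return, the catch path, and a catch-all handler) and describes the protected ranges in an exception table. A decompiler has to recognize those copies and fold them back into one finally block. You can look at the result with javap -c FinallyDemo.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; the names are not from any real product.
final class FinallyDemo {
    // Records every call so the effect of the finally block is observable.
    static final List<String> LOG = new ArrayList<>();

    static int parseOrDefault(String text, int fallback) {
        try {
            return Integer.parseInt(text);   // exit path 1: normal return
        } catch (NumberFormatException e) {
            return fallback;                 // exit path 2: return from the handler
        } finally {
            // One block in the source; the compiler may duplicate it
            // for each exit path in the bytecode.
            LOG.add("parsed:" + text);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("42", -1));
        System.out.println(parseOrDefault("oops", -1));
        System.out.println(LOG);
    }
}
```

If the decompiler fails to merge the copies, you tend to see the logging statement repeated several times, or raw bytecode where the handler should be.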

BTW: Why would you want to know this?

Peter Lawrey
+1  A: 

The JDBC type-4 drivers for DB2 Connect are classics. Everything has one- or two-letter names, there is irrelevant code that ends up having no effect, and more. I once tried to take a look to debug a particularly annoying problem and basically gave up. I'm hoping (but by no means confident) that this was passed through an obfuscator rather than the code actually looking like that.

Another favorite trick (although I can't remember the product) was to rename all objects to be constructed from the set {'0','O','l','1'}, which made reading it very difficult.
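A rough idea of what that looks like after decompiling, as a made-up sketch rather than output from any real obfuscator. All identifiers here are invented and drawn only from the characters 0, O, l and 1 (a Java identifier cannot start with a digit, so each one begins with O or l).

```java
// Hypothetical obfuscator-style output; none of these names come from a real product.
final class O0l1 {
    // Originally something like add(a, b)
    static int Ol01(int O0, int Ol) {
        return O0 + Ol;
    }

    // Originally something like multiply(a, b)
    static int O1l0(int O0, int Ol) {
        return O0 * Ol;
    }

    // Originally something like (a + b) * a
    static int OlO1(int l0, int l1) {
        return O1l0(Ol01(l0, l1), l0);
    }

    public static void main(String[] args) {
        System.out.println(OlO1(2, 3));
    }
}
```

The code decompiles perfectly well; it is just miserable to read, especially in a typeface where those four characters look alike.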

paxdiablo
I am unfamiliar with your set notation. Can you expand on this?
ojblass
I think he simply means that the alphabet used to name all objects, methods, variables, etc. consisted only of those characters, which are hard to read and look identical in some typefaces.
Coxy
Yes, names like O01ll10, O01l1l0, OO1ll10 and O0l1l1O make it very hard to read (and make the names pretty well useless in terms of figuring out what they do).
paxdiablo
A: 

Java bytecode does not correspond directly to Java constructs, so a decompiler has to know which bytecode sequences correspond to which Java source constructs.
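A small sketch of what that means in practice, using the enhanced for loop (the class and method names are made up). For an Iterable, the language specification defines the loop in terms of an explicit Iterator, so the two methods below should compile to very similar bytecode. A decompiler that doesn't know the pattern will hand you back the second form even if you wrote the first.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical example class for illustration only.
final class ForEachDemo {
    // What the programmer writes.
    static int sumForEach(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Roughly what the compiler generates, written back as Java:
    // an explicit iterator plus an unboxing call.
    static int sumDesugared(List<Integer> values) {
        int total = 0;
        Iterator<Integer> it = values.iterator();
        while (it.hasNext()) {
            int v = it.next().intValue();
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3);
        System.out.println(sumForEach(values));
        System.out.println(sumDesugared(values));
    }
}
```

Both calls give the same sum; the only difference is which source form the decompiler chooses to show you.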

The Soot framework for decompiling java byte code has a lot of information on this, but their webpage is down for me right now.

http://www.sable.mcgill.ca/soot/

Thorbjørn Ravn Andersen
A: 

Assuming you can decompile back to a reasonable style of source code (you can't always do that), what is hard to "reverse engineer" are algorithms that operate in unfamiliar problem domains. If you don't understand Fast Fourier transforms, it doesn't matter much if you can get back the code that implements an FFT Butterfly. (If this phrase is unfamiliar to you, I've already won if I encode one. If it is familiar to you, you are a pretty good engineer and probably don't have any interest in reverse engineering code). [Your mileage with North Koreans may vary.]

Ira Baxter