How do I remove strings from / obfuscate a compiled binary? The goal is to avoid having people read the names of the functions/methods inside.

It is a dynamic library (.so) compiled from C++ code for Android with the NDK tools (which include GCC).

I compile with `-O3` and already run `arm-eabi-strip -g mylib.so` to remove debugging symbols, but when I run `strings mylib.so`, all the names of the functions/methods are still readable.

A: 

They are unavoidable. Those strings are the means by which the loader links shared libraries at runtime.
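To see why the loader needs those names, here is a minimal sketch, assuming a POSIX-like system with `<dlfcn.h>` (glibc or Android's bionic): exported symbols are resolved at runtime by their name string, so an exported function's name has to be stored in the binary. The helper `lookup_symbol` is hypothetical.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // RTLD_DEFAULT is a GNU extension on glibc
#endif
#include <dlfcn.h>

// Hypothetical helper: ask the dynamic linker to resolve a symbol by its
// name, searching every object already loaded into the process.
// Returns nullptr if no loaded object exports that name.
void* lookup_symbol(const char* name) {
    return dlsym(RTLD_DEFAULT, name);
}
```

For example, `lookup_symbol("malloc")` returns a non-null address in a normally (dynamically) linked program, while a name that no loaded library exports yields `nullptr`. On glibc older than 2.34 you may need to link with `-ldl`.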

Marcelo Cantos
I am not talking about the exported functions to access the library, but the internal functions. Is it unavoidable to show them?
Stéphane
Ah, then your `arm-eabi-gcc` command-line is confusing me. The `-g` option usually _adds_ debug symbols.
Marcelo Cantos
sorry for the typo, it is not `gcc` but `strip`... let me correct this.
Stéphane
A:

There are some commercial obfuscators that do this. Basically, they rewrite all of the symbols on the fly. Something like this:

void foo()

becomes

void EEhj_y33() // usually much, much longer and clobbered

Variable names are also given the same treatment, as are members of structures / unions (depending on what level of obfuscation you set).

Most of them work by scanning your code base, building a dictionary, and then substituting garbled messes for symbol names in the output, which can then be compiled as usual.
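The dictionary-and-substitute idea can be sketched as a toy source-to-source pass. This is a minimal illustration, not how any particular commercial tool works: `build_dictionary`, `substitute` and the `Q<n>` naming scheme are hypothetical, and the pass ignores string literals, comments and scoping, which a real obfuscator would have to handle.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Step 1: build the dictionary. Each original symbol name gets a
// meaningless replacement. Toy scheme: "Q" plus an index; a real tool
// would generate longer names and check for collisions with existing ones.
std::map<std::string, std::string> build_dictionary(const std::vector<std::string>& names) {
    std::map<std::string, std::string> dict;
    for (const std::string& name : names) {
        if (dict.count(name) == 0) {
            const std::string replacement = "Q" + std::to_string(dict.size());
            dict.emplace(name, replacement);
        }
    }
    return dict;
}

// Step 2: substitute whole identifiers in the source text.
// Limitation: also rewrites matches inside string literals and comments.
std::string substitute(const std::string& src,
                       const std::map<std::string, std::string>& dict) {
    std::string out;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isalpha(c) || c == '_') {
            // Consume the whole identifier so "foo" never matches inside "foo_count".
            std::size_t j = i + 1;
            while (j < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[j])) || src[j] == '_')) {
                ++j;
            }
            const std::string ident = src.substr(i, j - i);
            const auto it = dict.find(ident);
            out += (it != dict.end()) ? it->second : ident;
            i = j;
        } else {
            out += src[i];
            ++i;
        }
    }
    return out;
}
```

With a dictionary built from `{"foo", "bar"}`, the line `void foo() { bar(foo_count); }` becomes `void Q0() { Q1(foo_count); }`. The output is still ordinary source, so it compiles as usual, and any of these names that reach the binary are the garbled ones.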

I don't recommend using them, but they are available. Simply obfuscating meaningful symbol names is not going to stop someone who is determined to discover how your library / program works. Additionally, you can't do anything about someone who traces system calls. Really, what's the point? Some argue that it keeps the 'casual observer' at bay; I would argue that someone running `ltrace`, `strace` and `strings` is typically anything but casual.

Or do you mean string literals, not symbols? There's nothing you can do about those, unless you store the literals in an encrypted format that your code has to decrypt before using them. That is not just a waste, but an egregious waste that provides no benefit whatsoever.
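For illustration only, here is a minimal sketch of what storing a literal in encoded form looks like, assuming a trivial single-byte XOR rather than real encryption. The key and the decoding loop ship right next to the data, which is why this buys so little. `xor_decode`, `xor_encode`, `kKey` and `kEncodedHi` are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical fixed key; anyone reading the disassembly can recover it.
constexpr unsigned char kKey = 0x5A;

// "hi" XOR-ed with kKey ahead of time: 'h' (0x68) -> 0x32, 'i' (0x69) -> 0x33.
static const unsigned char kEncodedHi[] = {0x32, 0x33};

// Decode a buffer that was XOR-ed with kKey before being stored.
std::string xor_decode(const unsigned char* data, std::size_t len) {
    std::string out(len, '\0');
    for (std::size_t i = 0; i < len; ++i) {
        out[i] = static_cast<char>(data[i] ^ kKey);
    }
    return out;
}

// XOR is its own inverse, so the same routine produces the encoded bytes.
std::string xor_encode(const std::string& text) {
    return xor_decode(reinterpret_cast<const unsigned char*>(text.data()), text.size());
}
```

`xor_decode(kEncodedHi, 2)` yields `"hi"` at runtime. Whether the plaintext also stays out of the final binary depends on the optimizer, which may fold the decoding of a constant array.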

Tim Post
Thanks! Obfuscation is better than nothing: my code consists mostly of complex algorithms, and their implementation and tuning, which the function names partly give away, make up a lot of the software's value. On the other hand, leaving the system calls identifiable is not a problem.
Stéphane
A:

These strings are in the dynamic string table (`.dynstr`), which holds the names used by the dynamic symbol table when the library is loaded at runtime. `readelf -p .dynstr mylib.so` will show these entries.

strip -g will remove debugging symbols, but it can't remove entries from the dynamic symbol table, as these may be needed at runtime. Your problem is that you have entries in the dynamic symbol table for functions which are never going to be called from outside your library. Unless you tell it, the compiler/linker has no way of knowing which functions form part of the external API (and therefore need entries in the dynamic symbol table) and which functions are private to your library (and so don't need entries in the dynamic symbol table), so it just creates dynamic symbol table entries for all non-static functions.
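As an illustration, consider a hypothetical translation unit built into a shared library with default settings (`-shared -fPIC`, no visibility options). The comments mark which names would typically get dynamic symbol table entries; all function names here are made up for the sketch.

```cpp
// Hypothetical mylib.cpp, compiled with default visibility.

// Public API, meant to be called from outside: gets a .dynsym entry.
int mylib_compute(int x);

// Internal helper with external linkage: the linker cannot tell it is
// private, so it also gets a .dynsym entry, and its mangled name
// (containing "tuned_step") is visible to `strings`.
int tuned_step(int x) {
    return x * 3 + 1;
}

// Internal linkage: not visible outside this file, so no .dynsym entry
// (at -O3 it may also be inlined away entirely).
static int clamp_nonneg(int x) {
    return x < 0 ? 0 : x;
}

int mylib_compute(int x) {
    return tuned_step(clamp_nonneg(x));
}
```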

There are two main ways you can inform the compiler which functions are private.

  1. Mark the private functions static. Obviously, this only works for functions that are needed within a single compilation unit, though for some libraries this technique might be sufficient.

  2. Use the gcc "visibility" attribute to mark the functions as visible or hidden. You have two options: either mark all the private functions as hidden, or change the default visibility to hidden using the -fvisibility=hidden compiler option and mark all the public functions as visible. The latter is probably the best option for you, as it means that you don't have to worry about accidentally adding a function and forgetting to mark it as hidden.
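A minimal sketch of the second option, assuming the library is compiled with `-fvisibility=hidden`: only declarations tagged with `visibility("default")` are exported, and everything else stays out of the dynamic symbol table. The macro name `MYLIB_API`, the class and the functions are hypothetical; wrapping the attribute in a macro like this is a common way to write it only once per declaration.

```cpp
// Compile the library with: -fvisibility=hidden
// Hypothetical export macro: expands to the GCC attribute that overrides
// the hidden default for the declaration it is applied to.
#if defined(__GNUC__)
#define MYLIB_API __attribute__((visibility("default")))
#else
#define MYLIB_API
#endif

// Exported: part of the public API.
MYLIB_API int mylib_run(int input);

// Exported class: the attribute goes between `class` and the name, and its
// member functions (private ones included) inherit the default visibility.
class MYLIB_API Solver {
public:
    int solve(int x) const;
};

// Not tagged: hidden under -fvisibility=hidden, so no .dynsym entry
// even though it has external linkage.
int tuning_heuristic(int x) {
    return x * x + 2;
}

int Solver::solve(int x) const {
    return tuning_heuristic(x) - 1;
}

int mylib_run(int input) {
    return Solver().solve(input);
}
```

Because new functions are hidden unless someone explicitly adds `MYLIB_API`, forgetting the annotation fails safe: the worst case is a missing export, which shows up immediately as a link error in client code.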

If you have a function:

int foo(int a, int b);

then the syntax for marking it hidden is:

int foo(int a, int b) __attribute__((visibility("hidden")));

and the syntax for marking it visible is:

int foo(int a, int b) __attribute__((visibility("default")));

For further details, see this document, which is an excellent source of information on this subject.

jchl
Great answer! I need to test that... The how-to is great too, but is quite a long read. Thanks a lot!
Stéphane
A:

Assuming you are correctly specifying hidden visibility to g++ for all of your source files (as other posters have recommended), you might be running into this GCC bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38643

Try dumping the symbols in your binary that are showing up (readelf -Wa mylib.so | c++filt | less); if you see only vtable and VTT symbols after demangling, then the gcc bug might be your problem.
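The same check can be sketched in C++ with the Itanium C++ ABI demangler exposed by the C++ runtime (`abi::__cxa_demangle` from `<cxxabi.h>`, provided by libstdc++ and libc++abi), which produces the same kind of output as `c++filt`. The helpers `demangle` and `is_vtable_or_vtt` are hypothetical, and `_ZTV3Foo` / `_ZTT3Foo` are illustrative mangled names, not symbols from a real library.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangle an Itanium-ABI symbol name; returns the input unchanged if it
// is not a valid mangled name (e.g. plain C symbols like "main").
std::string demangle(const char* mangled) {
    int status = 0;
    char* raw = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return mangled;
    }
    std::string result(raw);
    std::free(raw);  // __cxa_demangle allocates the result with malloc
    return result;
}

// True if the symbol demangles to a vtable or VTT, the kind of symbol
// that the GCC bug above may leave exported despite hidden visibility.
bool is_vtable_or_vtt(const char* mangled) {
    const std::string name = demangle(mangled);
    return name.rfind("vtable for ", 0) == 0 || name.rfind("VTT for ", 0) == 0;
}
```

For instance, `demangle("_ZTV3Foo")` gives `vtable for Foo`, while an ordinary function such as `_Z3fooii` (`foo(int, int)`) is not flagged.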

Edit: if you can, try GCC 4.4.0 or later, as it appears to be fixed there.

Peter Jensen
`-fvisibility=hidden` with GCC 4.4.0, plus `arm-eabi-strip -s`, works. Thanks!
Stéphane