Let's say I have a project that I have released under the GPL, with the source available to anyone. Later I find a very similar product, closed source and distributed binary-only by someone else.

Is there a good way to find out whether they are using my source code in their product?

If the solution is to reverse-engineer the binary, is it possible to automate that?

EDIT: Clarification. Hunting for shared bugs is one option, but it is not definitive, especially if the project is a library and the binary has added its own GUI, for example. The situation I'm interested in is when it's not blatantly obvious that the code has been lifted.

+5  A: 

Bugs.

If the closed-source release shares most of its bugs with your project, the code was probably 'lifted'.

You could also try comparing a decompiled version of your own binary with a decompiled version of the closed-source binary, though this would probably not be reliable.

Alterlife
+1  A: 

You could try to disassemble both programs and compare the assembly, but if they used a different compiler, their program could differ in minor ways. There are a few free disassemblers, and a debugger can also step through the code in assembly.

Other than that there really isn't an easy way to find out that kind of thing.
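A minimal Python sketch of the comparison step, assuming you have already produced a plain-text listing for each program with one instruction per line (mnemonic first, operands after), for example by trimming a disassembler's output. It keeps only the mnemonics, since operands and addresses are the parts most likely to change between compilers or link layouts, and scores the two sequences with difflib. The function names are hypothetical.

```python
import difflib


def mnemonics(listing: str) -> list[str]:
    # Keep only the first token (the mnemonic) of each non-empty line;
    # operands such as registers, addresses and call targets are dropped.
    return [line.split()[0] for line in listing.splitlines() if line.strip()]


def similarity(listing_a: str, listing_b: str) -> float:
    # SequenceMatcher.ratio() returns 2*M/T in [0, 1], where M is the number
    # of matching mnemonics and T the total number in both sequences.
    matcher = difflib.SequenceMatcher(
        None, mnemonics(listing_a), mnemonics(listing_b), autojunk=False
    )
    return matcher.ratio()


if __name__ == "__main__":
    mine = "push rbp\nmov rbp, rsp\ncall foo\npop rbp\nret"
    theirs = "push rbp\nmov rbp, rsp\ncall 0x401020\npop rbp\nret"
    print(similarity(mine, theirs))
```

A score near 1.0 only suggests similarity; different optimization settings can still reorder or replace instructions, so treat it as a hint rather than proof.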

thealliedhacker
+3  A: 

Obviously, if the suspected binary is not stripped, you can just look for any symbols that share the same name as your code's.

unwind
Sadly, the usual binary-only release has symbols and debug information stripped :/
Tuminoid
+2  A: 

There's a large body of work on decompiling and reverse-engineering binary code. The world expert is probably Cristina Cifuentes, who has done a lot of work on decompilation. It would also be interesting to write to Alex Aiken and ask whether his tool MOSS (Measure Of Software Similarity) could be adapted to binary code.
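To give an idea of what "adapting" MOSS-style matching might involve, here is a Python sketch of winnowing, the k-gram fingerprinting technique published by MOSS's authors, applied to token sequences such as opcode streams from a disassembler. This is a simplified illustration, not MOSS itself; the parameters k and w and all function names are assumptions chosen for the example.

```python
import zlib


def kgram_hashes(tokens: list[str], k: int) -> list[int]:
    # Hash every run of k consecutive tokens. zlib.crc32 is used because
    # Python's built-in hash() of str is randomized between runs.
    return [
        zlib.crc32("\x00".join(tokens[i:i + k]).encode())
        for i in range(len(tokens) - k + 1)
    ]


def winnow(tokens: list[str], k: int = 5, w: int = 4) -> set[int]:
    hashes = kgram_hashes(tokens, k)
    if not hashes:
        return set()
    if len(hashes) < w:
        # Too short for a full window: keep the single smallest hash.
        return {min(hashes)}
    # Keep the minimum hash of every window of w consecutive k-gram hashes.
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def resemblance(a: list[str], b: list[str], k: int = 5, w: int = 4) -> float:
    # Jaccard similarity of the two fingerprint sets.
    fa, fb = winnow(a, k, w), winnow(b, k, w)
    if not fa and not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)
```

Because fingerprints are selected locally, a block of lifted code keeps matching even when unrelated code (such as an added GUI) surrounds it, which is why this style of matching seems a reasonable fit for the library case in the question.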

Norman Ramsey
+2  A: 

An obvious method is to search for strings. Run the Unix strings tool and see whether the binary contains any of the literal strings from your code, mainly things like error messages and message-box text.
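A rough Python equivalent of that workflow: extract runs of at least four printable ASCII characters (approximating the default behaviour of strings) and report which of your own literals appear in them. The list of literals is something you would have to collect from your source yourself; the function names are hypothetical.

```python
import re

# Runs of at least 4 printable ASCII characters, roughly what the Unix
# strings tool reports by default.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def extract_strings(data: bytes) -> set[str]:
    return {m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)}


def shared_literals(binary: bytes, my_literals: list[str]) -> list[str]:
    found = extract_strings(binary)
    # A literal counts as present if it occurs inside any extracted run.
    return [lit for lit in my_literals if any(lit in s for s in found)]


if __name__ == "__main__":
    with open("suspect.bin", "rb") as f:  # hypothetical file name
        print(shared_literals(f.read(), ["Error: cannot open config"]))
```

This only finds plain ASCII literals; strings stored as UTF-16 or obfuscated at build time will not show up.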

shoosh
Also a good reason to put many error messages in the code! :)
Aaron Digulla
A: 

The most surefire way I can think of is similar to the word 'esquivalience' in the Oxford dictionary, a fictitious entry planted to detect copying.
Simply add a binary array with unique content somewhere in the code, and make some simple use of it so the compiler or linker won't optimize it away. You should probably obfuscate it somewhat so that it is not obvious to a casual reader that it's redundant.
Then open the compiled binary with a hex editor and look for it.
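The hex-editor step can be automated. Below is a Python sketch, assuming you generate a random marker once, paste it into your C source as an array (the variable name build_tag is hypothetical), and later scan suspect binaries for the exact bytes as they are stored. If you obfuscate the marker at rest, search for the obfuscated bytes, since those are what end up in the binary.

```python
import secrets


def make_marker(n: int = 16) -> bytes:
    # Random bytes are very unlikely to occur by chance in someone else's binary.
    return secrets.token_bytes(n)


def as_c_initializer(marker: bytes, name: str = "build_tag") -> str:
    # Hypothetical variable name; reference the array somewhere in your code
    # so the compiler or linker keeps it.
    body = ", ".join(f"0x{b:02x}" for b in marker)
    return f"static const unsigned char {name}[{len(marker)}] = {{{body}}};"


def find_marker(binary: bytes, marker: bytes) -> list[int]:
    # Return every offset at which the marker occurs.
    offsets, start = [], binary.find(marker)
    while start != -1:
        offsets.append(start)
        start = binary.find(marker, start + 1)
    return offsets
```

For example, as_c_initializer(bytes([1, 2, 3, 4])) yields "static const unsigned char build_tag[4] = {0x01, 0x02, 0x03, 0x04};".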

shoosh
A: 

Why don't you look at the symbol table using nm?

$ nm a.out
...
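If both binaries still have symbols, the two nm listings can be compared mechanically. A Python sketch, assuming the usual nm line shapes "ADDRESS TYPE NAME" for defined symbols and "TYPE NAME" for undefined ones; undefined symbols (type U) are skipped because they are mostly imports such as libc functions, which any program might share. The function names are hypothetical.

```python
def defined_symbols(nm_output: str) -> set[str]:
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Skip blank lines and headers such as "a.out:" when nm lists several files.
        if len(fields) < 2:
            continue
        sym_type, name = fields[-2], fields[-1]
        if sym_type != "U":
            names.add(name)
    return names


def overlap(mine: str, theirs: str) -> set[str]:
    # Symbols defined in both binaries.
    return defined_symbols(mine) & defined_symbols(theirs)
```

Generic names like main will match trivially, so the interesting hits are distinctive function names from your own code.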
yogman
+2  A: 

Look for software birthmarks. This method tries to establish links between programs based on their binary code or dynamic behavior. Christian Collberg is an expert on software watermarks, from which birthmarks were derived. This is all still in research land.
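To make the birthmark idea concrete, here is a simplified Python sketch (not the method of any particular paper): treat the set of k-grams of API or library call names observed in a run as the birthmark, and measure how much of your program's birthmark is contained in the suspect's. It assumes you already have call traces for both programs, e.g. from a tracing tool; the function names and the choice k = 3 are assumptions.

```python
def birthmark(calls: list[str], k: int = 3) -> set[tuple[str, ...]]:
    # Set of k-grams over a trace of API/library call names.
    return {tuple(calls[i:i + k]) for i in range(len(calls) - k + 1)}


def containment(original: list[str], suspect: list[str], k: int = 3) -> float:
    # Fraction of the original's k-grams that also appear in the suspect's trace.
    a, b = birthmark(original, k), birthmark(suspect, k)
    return len(a & b) / len(a) if a else 0.0
```

Containment rather than symmetric similarity is used here because, as in the question's library scenario, the suspect may wrap your code in a lot of extra behavior (such as a GUI) that should not dilute the score.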

Christian Lindig
lindig sent me a private message with a link to their paper; I assume it can be shared here as well: http://www.st.cs.uni-sb.de/birthmarking/schuler-ase-2007.pdf
Tuminoid