For some tasks the obvious language choice is a scripting language: bash, Python, Ruby, Tcl... However, I find it hard to protect a company's IP once the product is delivered, because the application is never compiled. The client has complete access to every single line of code.

What are the options for protecting a product's IP when it is best implemented in a scripting language? (Switching to a compiled language such as C++ is not an option.)

I know that some interpreted languages can be compiled, but in some cases the process can be reversed.

A:

If the code runs on somebody else's machines, they will always be able to deconstruct it and see how it works. It's just a matter of how much effort you force them to put into it. Compiled languages make this harder, but not impossible.

If you are adamant about not revealing what goes on behind the curtain, consider a client-server model in which the IP lives on a server that you control.
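A minimal sketch of that split, using only the Python standard library: the proprietary logic (a hypothetical `proprietary_score` function) exists only in the server process, and the shipped client (a hypothetical `client_score` helper) only knows a URL and a JSON request format. The endpoint, payload shape, and function names are illustrative assumptions, and a real deployment would need authentication, TLS, and error handling.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def proprietary_score(values):
    # Hypothetical stand-in for the algorithm you want to keep private.
    # This function lives only on the server you control.
    return sum(v * v for v in values) / max(len(values), 1)


class ScoreHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body sent by the client.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"score": proprietary_score(payload["values"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def client_score(url, values):
    # The shipped client only knows the URL and the JSON format,
    # not how the score is computed.
    req = urllib.request.Request(
        url,
        data=json.dumps({"values": values}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["score"]


if __name__ == "__main__":
    # Port 0 asks the OS for any free port (demo only, both ends on one machine).
    server = HTTPServer(("127.0.0.1", 0), ScoreHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/"
    print(client_score(url, [1, 2, 3]))
    server.shutdown()
    server.server_close()
```

The client side can be distributed as readable source without exposing anything, because the only thing it reveals is the request format.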

rayners
A: 

Many scripting languages support compiling scripts into "bytecode", an interpreter-specific internal representation of the source code that the interpreter uses to run the program.

I know Perl and Python support it. Shell scripts are another story: most shells execute a script line by line and do not compile it internally. But since shell scripts are mainly used as a "glue language", there is little point in compiling them anyway.
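As a rough illustration of the Python case, the sketch below compiles a small module to a `.pyc` file with the standard `py_compile` module, loads the code object back with `marshal`, and runs it. It assumes CPython 3.7 or later, where a `.pyc` starts with a 16-byte header. The `secret` function and helper names are hypothetical. The final `dis.dis` call also shows why bytecode alone is weak protection: the logic is easy to read back out, which is the "inverted" case the question mentions. Note too that CPython `.pyc` files are tied to the interpreter version, which the magic-number check makes explicit.

```python
import dis
import importlib.util
import marshal
import py_compile
import tempfile
from pathlib import Path

# Hypothetical "proprietary" module source.
SOURCE = "def secret(a, b):\n    return a * 31 + b\n"


def compile_to_pyc_bytes(source: str) -> bytes:
    """Write `source` to a temp file, compile it, and return the .pyc bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "secret.py"
        src.write_text(source)
        pyc = py_compile.compile(
            str(src), cfile=str(Path(tmp) / "secret.pyc"), doraise=True
        )
        return Path(pyc).read_bytes()


def load_pyc_bytes(data: bytes):
    """Return the module code object stored in .pyc bytes (CPython 3.7+)."""
    # Header: 4-byte magic number, 4-byte flags, 8 bytes of mtime/size or hash.
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("bytecode was produced by a different Python version")
    return marshal.loads(data[16:])


if __name__ == "__main__":
    code = load_pyc_bytes(compile_to_pyc_bytes(SOURCE))
    namespace = {}
    exec(code, namespace)  # run the module body, which defines `secret`
    print(namespace["secret"](2, 3))
    # Anyone holding the .pyc can disassemble it just as easily:
    dis.dis(namespace["secret"])
```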

Compiling a script into bytecode has its own issues, too: it can ruin the cross-platform nature of the script by compiling it for a native architecture. For instance, if I compile a Perl script containing an integer variable that only fits in a 64-bit register on an x86_64 machine, chances are there will be either an error or a loss of precision when the script is run on an i486 machine.

amphetamachine
A:

Gimpel Software used to distribute its products in "obfuscated source form." One effective obfuscation is to rename every variable to a 16-character name made up solely of lowercase ells (l) and the numeral one (1), e.g. l111ll11ll1l1l11. In the font that SO uses, a name mixing capital O and zero, such as O0OOO0O00OO000, is even harder to distinguish from similar variables.
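A minimal sketch of that renaming trick for Python source, using the standard `ast` module (`ast.unparse` needs Python 3.9+). It is not Gimpel's tool; the helper names are hypothetical. Every name that is assigned or used as a parameter gets a 16-character l/1 name. The first character is always `l` because identifiers cannot start with a digit. Function names, attributes, names in `global`/`nonlocal` statements, and keyword arguments at call sites are deliberately left alone, so this is a toy rather than a safe general-purpose obfuscator.

```python
import ast


def obfuscated_name(index: int) -> str:
    """Map an integer to a 16-character name made only of 'l' and '1'."""
    # Remaining 15 characters encode `index` in binary: 'l' for 0, '1' for 1.
    bits = format(index, "015b")
    return "l" + bits.replace("0", "l")


class Renamer(ast.NodeTransformer):
    """Rename every variable and parameter bound in the parsed source."""

    def __init__(self, tree: ast.AST) -> None:
        bound = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                bound.add(node.id)
            elif isinstance(node, ast.arg):
                bound.add(node.arg)
        # Sort so the mapping is deterministic across runs.
        self.mapping = {
            name: obfuscated_name(i) for i, name in enumerate(sorted(bound))
        }

    def visit_Name(self, node: ast.Name) -> ast.Name:
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        node.arg = self.mapping.get(node.arg, node.arg)
        self.generic_visit(node)  # also visit any annotation
        return node


def obfuscate(source: str) -> str:
    tree = ast.parse(source)
    tree = Renamer(tree).visit(tree)
    return ast.unparse(tree)


if __name__ == "__main__":
    original = (
        "def area(width, height):\n"
        "    result = width * height\n"
        "    return result\n"
    )
    print(obfuscate(original))
```

The output still runs exactly like the input; only the readability suffers, which is the whole point.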

You can also look into the work of Christian Collberg (University of Arizona) on code obfuscation. Collberg introduces if statements whose condition always evaluates the same way (so-called opaque predicates), but figuring this out, and figuring out which way they go, requires solving an NP-hard problem. The technique has a run-time cost, but if you are really keen to protect your IP, it might be worth it.
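To give a feel for the idea, here is a toy opaque predicate in Python. The hypothetical `opaque_true` is always true because `x * (x + 1)` is a product of two consecutive integers and therefore even. A reader who doesn't notice this has to treat the `decoy` branch as live code. This number-theoretic predicate is easy for a determined analyst or tool to recognize. It is only an illustration, not one of the harder constructions (e.g. based on pointer aliasing) that Collberg's work describes. All function names here are assumptions made for the example.

```python
import random


def opaque_true(x: int) -> bool:
    # x * (x + 1) multiplies two consecutive integers, so one factor is even
    # and the product is always even: this predicate is always True.
    return (x * (x + 1)) % 2 == 0


def secret_formula(a: int, b: int) -> int:
    # Hypothetical "real" computation we want to hide.
    return a * 31 + b


def decoy(a: int, b: int) -> int:
    # Plausible-looking dead code that never actually runs.
    return (a ^ b) * 17


def protected(a: int, b: int) -> int:
    # A fresh random input each call makes the branch look data-dependent.
    x = random.randint(-10**6, 10**6)
    if opaque_true(x):
        return secret_formula(a, b)
    else:
        return decoy(a, b)
```

The run-time cost mentioned above is visible even here: every call pays for a random number and an extra test.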

Norman Ramsey