Many of my company's clients use our data acquisition software on a research basis. Due to the nature of research in general, some of the clients ask that data be encrypted to prevent tampering -- there could be serious ramifications if their data were shown to be falsified.

Some of our binary software encrypts output files with a password that is stored in the source and looks like random characters. At the software level, we are able to open up encrypted files for read-only operations. If someone really wanted to find out the password so that they could alter data, it would be possible, but it would be a lot of work.

I'm looking into using Python for rapid development of another piece of software. To duplicate the functionality of encryption to defeat/discourage data tampering, the best idea I've come up with so far is to just use ctypes with a DLL for file reading/writing operations, so that the method of encryption and decryption is "sufficiently" obfuscated.

We are well aware that an "uncrackable" method is unattainable, but at the same time I'm obviously not comfortable with just having the encryption/decryption approaches sitting there in plain text in the Python source code. A "very strong discouragement of data tampering" would be good enough, I think.

What would be the best approach to attain a happy medium of encryption or other proof of data integrity using Python? I saw another post talking about generating a "tamper proof signature", but if a signature was generated in pure Python then it would be trivial to generate a signature for any arbitrary data. We might be able to phone home to prove data integrity, but that seems like a major inconvenience for everyone involved.

+3  A: 

If you are embedding passwords somewhere, you are already hosed. You can't guarantee anything.

However, you could use public key/private key encryption to make sure the data hasn't been tampered with.

The way it works is this:

  1. You generate a public key / private key pair.
  2. Keep the private key secure, distribute the public key.
  3. Hash the data and then sign the hash with the private key.
  4. Use the public key to verify the signature against a freshly computed hash of the data.

This effectively renders the data read-only outside your company, and provides your program a simple way to verify that the data hasn't been modified without distributing passwords.
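The four steps above can be sketched with nothing but the standard library. This is a toy, not something from the original answer: it uses textbook RSA without padding, and the key is built from two publicly known Mersenne primes so the example is reproducible. The names `sign` and `verify` are made up for illustration. Real code should use a vetted signing library and a properly generated key.

```python
import hashlib

# ASSUMPTION / TOY ONLY: the "private key" is derived from two well-known
# Mersenne primes, so anyone could recompute it. It only illustrates the flow.
P = 2**521 - 1
Q = 2**607 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest_as_int(data: bytes) -> int:
    """Step 3a: hash the data (SHA-256) and view the digest as an integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def sign(data: bytes) -> int:
    """Step 3b: sign the hash with the private exponent D."""
    return pow(digest_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Step 4: check the signature using only the public values (N, E)."""
    return pow(signature, E, N) == digest_as_int(data)


if __name__ == "__main__":
    record = b"sample,1,0.731\nsample,2,0.744\n"
    sig = sign(record)
    print(verify(record, sig))                 # unmodified data
    print(verify(record + b"tampered", sig))   # altered data
```

Only `verify` and the public values would ship with the program. Whoever holds `D` can sign, which is exactly the distribution problem raised in the comments below.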

Christopher
That won't work -- you need the private key to sign the hash value, and you said that wasn't distributed.
Martin Geisler
Or at least I read your recipe like that :-) If you instead propose to send the hash value to a server where the private key is stored, then you solve the distribution problem. But the signature won't mean much then, since everybody can ask the server to issue one.
Martin Geisler
Yes, but "hiding the keys" is not semantically any different. The only way you can make strong claims about the data is to have the user sign a hash of the data with a key specifically allocated to that user. The hash and the signature are stored in a secure area. Later, if there is any doubt about the data, you can check the signature stored in your secure repository against the data provided. Yes, researchers could request a signature for any data they want. But that is no different than the way your system works now. Security by obscurity is no security at all.
Christopher
+11  A: 

As a general principle, you don't want to use encryption to protect against tampering; instead, you want to use a digital signature. Encryption gives you confidentiality, but you are after integrity.

Compute a hash value over your data and either store the hash value in a place where you know it cannot be tampered with or digitally sign it.
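A minimal sketch of the hash-and-compare part, using Python's standard `hashlib`. The function names and the idea of passing in a `trusted_digest` are assumptions for illustration; where that digest is kept (a separate server, a notebook, a signed record) is the part that actually provides the protection.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_unmodified(path: str, trusted_digest: str) -> bool:
    """Compare the file's current digest with one stored somewhere trusted."""
    return hmac.compare_digest(sha256_of_file(path), trusted_digest)
```

A digest stored next to the data file protects nothing, since anyone who edits the file can recompute it.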

In your case, it seems like you want to ensure that only your software can have generated the files? Like you say, there cannot exist a really secure way to do this when your users have access to the software since they can tear it apart and find any secret keys you include. Given that constraint, I think your idea of using a DLL is about as good as you can do it.
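For comparison, here is roughly what a keyed integrity tag looks like when the key ships with the software, using the standard `hmac` module. `EMBEDDED_KEY`, `make_tag` and `check_tag` are hypothetical names, and the key value is a placeholder. The sketch has the weakness described above: anyone who extracts the key can produce valid tags for arbitrary data.

```python
import hashlib
import hmac

# ASSUMPTION: a placeholder for a secret compiled into the program or a DLL.
# Anyone who recovers it can forge tags.
EMBEDDED_KEY = b"example-key-not-a-real-secret"


def make_tag(data: bytes, key: bytes = EMBEDDED_KEY) -> bytes:
    """Compute an HMAC-SHA256 tag over the data."""
    return hmac.new(key, data, hashlib.sha256).digest()


def check_tag(data: bytes, tag: bytes, key: bytes = EMBEDDED_KEY) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(data, key), tag)
```

Moving `make_tag` into a DLL called through ctypes only changes how much effort it takes to find the key, not whether it can be found.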

Martin Geisler
OTOH, it is ultimately more the client's responsibility to not alter the data, than it is our responsibility to prevent tampering. I'm not sure if the bosses/clients would be keen on having per-user private keys for signing data they have acquired, but at least there's a dozen ways to do that.
Mark Rushakoff
It would be foolish to have your company make any claims about software's ability to guarantee that the data has not been falsified. Extracting private keys or passwords, even from a .dll, is not that hard. Any sufficiently motivated person with a debugger and a couple of hours could do it.
Christopher
Also, it doesn't matter if your passwords look like random data. In fact, that can be a problem. .exe and .dll files are *not* random. If someone were to analyze the files for suspiciously random data, they'd probably find your keys straight off.
Christopher
A: 

Here's another issue. Presumably, your data acquisition software is collecting data from some external source (like some sort of measuring device), then doing whatever processing is necessary on the raw data and storing the results. Regardless of what method you use in your program, another possible attack vector would be to feed bad data to the program, and the program itself has no way of knowing that you are feeding in made-up data rather than data that came from the measuring device. But this might not be fixable.

Another possible attack vector (and probably the one you are concerned about) is tampering with the data on the computer after it has been stored. Here's an idea to mitigate that risk: set up a separate server (this could either be something your company would run, or more likely it would be something the client would set up) with a password-protected web service that allows a user to add (but not remove) data records. Then have your program, when it collects data, send it to the server (using the password/connection string which is stored in the program). Have your program only write the data to the local machine if it receives confirmation that the data has been successfully stored on the server.

Now suppose an attacker tries to tamper with the data on the client. If he can reverse engineer the program then he can of course still send it to the server for storage, just as the program did. But the server will still have the original data, so the tampering will be detectable because the server will end up with both the original and modified data - the client won't be able to erase the original records. (The client program of course does not need to know how to erase records on the server.)
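A rough sketch of that idea, with an in-memory class standing in for the server. `AppendOnlyStore`, `acquire` and `find_missing_originals` are hypothetical names, and a Python list stands in for the local data file. A real deployment would put the store behind the password-protected web service described above.

```python
import hashlib


class AppendOnlyStore:
    """Stand-in for the server: records can be added, never edited or removed."""

    def __init__(self):
        self._records = []

    def add(self, record: bytes) -> int:
        """Store a copy of the record and return its index as confirmation."""
        self._records.append(bytes(record))
        return len(self._records) - 1

    def digests(self):
        """Return the SHA-256 hex digest of every record ever stored."""
        return [hashlib.sha256(r).hexdigest() for r in self._records]


def acquire(store: AppendOnlyStore, record: bytes, local_records: list) -> int:
    """Send the record to the server first; keep it locally only on success."""
    record_id = store.add(record)   # an exception here means no local write
    local_records.append(record)
    return record_id


def find_missing_originals(store: AppendOnlyStore, local_records: list) -> list:
    """Digests the server holds that no longer match any local record."""
    local = {hashlib.sha256(r).hexdigest() for r in local_records}
    return [d for d in store.digests() if d not in local]
```

If an attacker rewrites a local record and even uploads the forged version, the server still holds the original, so `find_missing_originals` returns its digest.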

Alex319