I have an application that records data from a manufacturing process on a periodic basis (various sample rates, minimum of 1 sec, the usual max is 10 min or more). The customer would like to know if the data has been altered (changed in place, records added to, or records deleted from).

The data is recorded as a binary record. There can be multiple streams of data, each going to its own file, and each with its own data format. The data is written a record at a time, and if the monitoring PC or process goes down, manufacturing does not necessarily stop, so I can't guarantee the archiving process will stay up. Obviously, I can only authenticate what I actually record, but the recording might start and stop.

What methods can be used to authenticate that data? I'd prefer to use a separate 'logging' file to validate the data to maintain backwards compatibility, but I'm not sure that's possible. Barring direct answers, are there suggestions for search terms to find some suggestions?

Thanks!

A: 

I don't think you necessarily need digital signatures; secure hashes (say, SHA-256) should be sufficient.

As each record is written, compute a secure hash of it, and store the hash value in your logging file. If there's some sort of record ID, store that as well. You need some way to match up the hash with the corresponding record.
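
As a minimal sketch of that idea (in Python, with a hypothetical fixed-size entry format of a 4-byte record ID followed by the 32-byte digest, so entries are easy to match back up with records):

```python
import hashlib
import struct

def log_record_hash(record_id: int, record: bytes, log_path: str) -> None:
    """Append the SHA-256 of one data record to a separate logging file.

    Each log entry is a 4-byte little-endian record ID followed by the
    32-byte SHA-256 digest of the record's raw bytes.
    """
    digest = hashlib.sha256(record).digest()
    with open(log_path, "ab") as log:
        log.write(struct.pack("<I", record_id))
        log.write(digest)
```

Appending one entry per record means the logging file survives an abrupt stop at the same record boundary as the data file, which fits the start/stop recording described in the question.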

Now, as long as no one tampers with the logging file, any alteration of the records will be detectable. To make tampering difficult, periodically hash your log file and send that hash and the number of records in the log file somewhere secure. Ideally, send it multiple places, each under the control of a different person.
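
A checkpoint for that off-site step might look like this sketch, which just hashes the whole logging file and reports its size (the two values you'd ship to the independent parties):

```python
import hashlib
import os

def log_checkpoint(log_path: str) -> tuple[str, int]:
    """Return the SHA-256 hex digest of the logging file and its size
    in bytes, suitable for sending to independent off-site stores."""
    h = hashlib.sha256()
    with open(log_path, "rb") as f:
        # Read in chunks so large log files don't have to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest(), os.path.getsize(log_path)
```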

A slightly more sophisticated approach is to use a Merkle tree, essentially a binary tree of hashes, rather than just a single hash of the log file. Then store the whole tree (which isn't very large) and send the "root" hash to various places. The root hash allows you to verify the integrity of the tree and the tree allows you to verify the integrity of the log file -- and if the integrity check fails, it also enables you to determine which records were modified.
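
A sketch of the Merkle root computation over the per-record hashes (one common convention for odd-sized levels is to promote the last node unchanged; other schemes duplicate it instead):

```python
import hashlib

def merkle_root(leaf_hashes: list[bytes]) -> bytes:
    """Compute the root of a binary Merkle tree over per-record hashes.

    Each internal node is the SHA-256 of its two children concatenated.
    A lone node at the end of an odd-sized level is promoted unchanged.
    """
    if not leaf_hashes:
        return hashlib.sha256(b"").digest()
    level = leaf_hashes
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(hashlib.sha256(level[i] + level[i + 1]).digest())
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # odd node carried up as-is
        level = nxt
    return level[0]
```

Changing any leaf changes the root, and walking back down the stored tree pinpoints which leaf (and hence which record) was altered.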

swillden
This is essentially what I did, thanks! I have a record that specifies the start location and length of the data in the data file, followed by the hash of that data (the start/length are included in the hash). The record also stores its own length and the hash algorithm, plus a flag for whether the file is tainted, i.e. someone legitimately edited a file that had been tampered with. The .digest file is just a list of those records.
A: 

You could look at digital timestamping instead. GuardTime offers massively scalable timestamping with 1-second precision that guarantees information integrity.

martin
Interesting. A bit of overkill for my application, but a clever use of the old dead tree edition.