Hello:

I'm developing a program that needs to load and save data in external files. After looking at the options, I have chosen to save the data in a binary file.

As I don't want anyone to be able to edit the file easily, I thought about writing the file's MD5 sum on its first line. That way, if any data in the file is changed, the sum will no longer match the one on the first line.

The problem is that if I calculate the MD5 and then write it into the file, the file's sum obviously changes. How can I solve this?

If you can suggest a better option than the checksum, that is equally welcome.

Thanks in advance.

A: 

You could store the MD5 sum in a database instead; then, when you want to see whether a file has been changed, you check it against the MD5 sum in the DB. Alternatively, you could store the MD5 sum of a file in another file.

JiminyCricket
Why the downvotes? Please explain why this option won't work or how it doesn't answer the question.
JiminyCricket
This doesn't work because someone can merely change the file and update the database with the new digest.
brian d foy
+3  A: 

While it is theoretically possible to make a self-referencing MD5 file (and I recall some have been found), it's a waste of resources. It is generally necessary to store the hash somewhere outside the hashed file (traditionally named md5sums or sha1sums, respectively).

This said, I'd recommend going for SHA-1 in addition to MD5.

Piskvor
+1 - MD5 must go ... http://stackoverflow.com/questions/2768248/is-md5-really-that-bad
nonnb
even SHA-1 is [broken](http://www.schneier.com/blog/archives/2005/02/cryptanalysis_o.html) (fsvo broken) these days.
Philip Potter
@nonnb: well, it's just about OK if you only need to prevent accidental file corruption; but yeah, I consider it deprecated.
Piskvor
@Philip Potter: As you said, it depends on the threat model - if you're worried about intentional tampering, then the checksum might be modified also.
Piskvor
@Piskvor: quite so. if intentional tampering might change the checksum, then pretty much *any* checksum is of equal security value, because finding a preimage is no longer the path of least resistance. Might as well use CRC.
Philip Potter
+7  A: 

What is your threat model?

If you just want to protect against casual fiddling, md5 the main data of the file, then write the md5 sum to the end. To validate, strip off the md5 sum, then md5 only the original file.
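A minimal Java sketch of that append-and-strip scheme, assuming the whole file fits in memory as a byte array. The class and method names (`TrailerDigest`, `seal`, `open`) are made up for illustration, and this only guards against casual edits:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class TrailerDigest {
    private static final int MD5_LENGTH = 16; // an MD5 digest is always 16 bytes

    private static byte[] md5(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the payload followed by the 16-byte MD5 of the payload. */
    static byte[] seal(byte[] payload) {
        byte[] out = Arrays.copyOf(payload, payload.length + MD5_LENGTH);
        System.arraycopy(md5(payload), 0, out, payload.length, MD5_LENGTH);
        return out;
    }

    /** Strips the trailing digest, re-hashes the rest, and returns the payload if they match. */
    static byte[] open(byte[] fileBytes) {
        if (fileBytes.length < MD5_LENGTH) {
            throw new IllegalArgumentException("file too short to contain a digest");
        }
        int payloadLength = fileBytes.length - MD5_LENGTH;
        byte[] payload = Arrays.copyOfRange(fileBytes, 0, payloadLength);
        byte[] stored = Arrays.copyOfRange(fileBytes, payloadLength, fileBytes.length);
        if (!MessageDigest.isEqual(stored, md5(payload))) {
            throw new IllegalStateException("checksum mismatch");
        }
        return payload;
    }
}
```

Write the result of `seal(data)` to disk when saving, and pass the bytes read back through `open(...)` when loading.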

If you want to protect against malicious and skilled cracking, you're out of luck; any validation algorithm you use can be replicated, particularly if they have access to the program itself. Even a cryptographic signature could fail if the attacker extracts the key from the program binary.

If it's a big deal, a unix solution is to run as setuid or setgid to a different user and write to a directory which users cannot modify. I'm not sure what a good general Java solution is, but the point remains: users shouldn't be able to modify your data because they were prevented from doing so, not because they were detected trying to.

Philip Potter
I like this answer best. +1
Randolpho
+1 for "you're out of luck against crackers"
snemarch
A: 

Bill: Ted, while I agree that, in time, our band will be most triumphant. The truth is, Wyld Stallyns will never be a super band until we have Eddie Van Halen on guitar.

Ted: Yes, Bill. But, I do not believe we will get Eddie Van Halen until we have a triumphant video.

Bill: Ted, it's pointless to have a triumphant video before we even have decent instruments.

Ted: Well, how can we have decent instruments when we don't really even know how to play?

Bill: That is why we NEED Eddie Van Halen!

Ted: And THAT is why we need a triumphant video.

Bill, Ted: EXCELLENT!

Seriously, you can't calculate the MD5 sum (or some other hash) with the calculated hash embedded, so you have to store the hash somewhere else.

If you just don't want people to easily mess with the file, maybe it's an option to obfuscate it via ROT13 or XOR "encryption" ?
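For illustration, a repeating-key XOR could look roughly like this in Java. The class name `XorObfuscator` and the key are hypothetical, and this is obfuscation only, not encryption:

```java
final class XorObfuscator {
    /** XORs every byte with a repeating key; applying it twice restores the input. */
    static byte[] apply(byte[] data, byte[] key) {
        if (key.length == 0) {
            throw new IllegalArgumentException("key must not be empty");
        }
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key[i % key.length]);
        }
        return out;
    }
}
```

The same call obfuscates on save and restores on load, as long as the key is the same.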

DarkDust
Actually, you *can* with MD5, because it's so broken :)
Philip Potter
A: 

Just ignore the first line when you compute the MD5. You should also add a secret salt to make sure it's not too easy to create a new MD5 after editing the content. It depends on your actual needs (level of security).
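One way to sketch the "first line holds the sum, computed with a secret" idea in Java is with a keyed hash. Note that this swaps in HMAC-SHA256 (from `javax.crypto`) in place of MD5 with a prepended secret. The class name `KeyedFirstLine` and the embedded key are assumptions for illustration, and a key stored in the program can still be extracted by decompiling it:

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class KeyedFirstLine {
    // Hypothetical key embedded in the program; anyone who extracts it can forge tags.
    private static final byte[] KEY = "replace-with-your-own-secret".getBytes(StandardCharsets.UTF_8);

    /** Hex-encoded HMAC-SHA256 of the body (everything after the first line). */
    private static String tag(byte[] body) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(KEY, "HmacSHA256"));
            StringBuilder hex = new StringBuilder();
            for (byte b : mac.doFinal(body)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds the file contents: one header line with the tag, then the raw body. */
    static byte[] write(byte[] body) {
        byte[] header = (tag(body) + "\n").getBytes(StandardCharsets.US_ASCII);
        byte[] out = Arrays.copyOf(header, header.length + body.length);
        System.arraycopy(body, 0, out, header.length, body.length);
        return out;
    }

    /** Skips the first line when hashing, then compares the result with that line. */
    static byte[] read(byte[] fileBytes) {
        int newline = -1;
        for (int i = 0; i < fileBytes.length; i++) {
            if (fileBytes[i] == '\n') { newline = i; break; }
        }
        if (newline < 0) {
            throw new IllegalStateException("missing header line");
        }
        byte[] stored = Arrays.copyOfRange(fileBytes, 0, newline);
        byte[] body = Arrays.copyOfRange(fileBytes, newline + 1, fileBytes.length);
        byte[] expected = tag(body).getBytes(StandardCharsets.US_ASCII);
        if (!MessageDigest.isEqual(stored, expected)) {
            throw new IllegalStateException("checksum mismatch");
        }
        return body;
    }
}
```

Only the first newline is treated as the header terminator, so the binary body may itself contain newline bytes.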

gawi
The salt doesn't have to be secret. The purpose of a salt is to prevent precomputed tables of hash values by increasing the required size of such tables.
Philip Potter
If it's editable, someone could just mess with the MD5 hash at the top, or add some other junk at the top so that the first line is no longer the MD5 hash, which makes putting it at the top or bottom not useful.
JiminyCricket
If the salt is not secret, it's pretty easy to recreate the MD5 hash after the file is modified. It's just less obvious if someone must decompile the bytecode in order to find the salt value. This is OK if you just want to prevent casual users from editing your files. Otherwise, forget it: there's always a way to bypass a Java method execution (unless the code is running server-side).
gawi
The term "salt" is normally used to refer to an appendage which is random but known, so that the same "main" data won't always yield the same hash string. Here, I think the goal is to munge the data slightly before computing the hash, or otherwise compute the hash in a slightly nonstandard way.
supercat
@supercat Yes. I agree. The salt word was misused there.
gawi
+1  A: 

What if you create a container for your data? Using a new class with two properties, CheckSum and Data, you could serialize all your data and put it in the Data property. Then you calculate the checksum of the serialized data and store it in the CheckSum property.
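A rough Java sketch of such a container, assuming standard Java serialization and SHA-256 as the checksum (chosen here only because other answers in this thread advise moving away from MD5). The class name `CheckedContainer` and its methods are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class CheckedContainer implements Serializable {
    private static final long serialVersionUID = 1L;

    private final byte[] data;     // the already-serialized application data
    private final byte[] checkSum; // digest of data, computed when the container is built

    CheckedContainer(byte[] serializedData) {
        this.data = serializedData.clone();
        this.checkSum = digest(this.data);
    }

    private static byte[] digest(byte[] bytes) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(bytes);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if the stored checksum still matches the stored data. */
    boolean isIntact() {
        return MessageDigest.isEqual(checkSum, digest(data));
    }

    byte[] getData() {
        return data.clone();
    }

    /** Serializes the whole container (data plus checksum) to bytes for saving. */
    byte[] toBytes() {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(this);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Rebuilds a container from saved bytes; call isIntact() before trusting the data. */
    static CheckedContainer fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (CheckedContainer) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

As with the other approaches in this thread, anyone who knows the format can recompute the checksum, so this detects casual or accidental changes only.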

Christian Nesmark