views:

483

answers:

8

Is it possible to create a file that contains its own checksum (MD5, SHA1, whatever)? And, to preempt the jokers, I mean the checksum value itself in plain form, not a function that calculates it.

+2  A: 

Certainly, it is possible. But one of the uses of checksums is to detect tampering with a file. How would you know that a file has been modified if the modifier can also replace the checksum?

Mark Ransom
A: 

I don't know if I understand your question correctly, but you could make the first 16 bytes of the file the checksum of the rest of the file.

So before writing a file, you calculate the hash, write the hash value first and then write the file contents.
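A minimal sketch of this prefix layout, assuming MD5 (whose digest is 16 bytes). The function names `pack` and `unpack` are made up for illustration. Note that the stored digest covers only the bytes after it, not itself:

```python
import hashlib

# MD5 digests are 16 bytes long, matching the "first 16 bytes" layout.
DIGEST_SIZE = hashlib.md5().digest_size


def pack(payload: bytes) -> bytes:
    """Return the payload prefixed with the MD5 digest of the payload."""
    return hashlib.md5(payload).digest() + payload


def unpack(blob: bytes) -> bytes:
    """Split off the stored digest, recompute it, and return the payload."""
    stored, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    if hashlib.md5(payload).digest() != stored:
        raise ValueError("checksum mismatch")
    return payload
```

Calling `unpack(pack(data))` gives back `data`, and changing any byte of the packed result makes `unpack` raise.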

Philippe Leybaert
Although that's a perfectly valid practical approach, I meant a checksum that also covers itself.
zakovyrya
I'm not a mathematician, but I think this is simply impossible
Philippe Leybaert
It isn't impossible, but it is very very difficult.
Lasse V. Karlsen
For CRC-32, it's actually quite simple. For a crypto hash, you'd be quite correct.
Steven Sudit
A: 

Sure, you could append the digest of the file to the end of the file. To check it, you would calculate the digest of everything except that last part, then compare it to the value stored in the last part. Of course, without some form of encryption, anyone can recalculate the digest and replace it.

edit

I should add that this is not so unusual. One technique is to append a CRC-32 so that the CRC-32 of the whole file (including that appended value) is zero, or a fixed known constant for CRC variants that apply a final XOR. This won't work with digests based on cryptographic hashes, though.
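A small sketch of that property using Python's `zlib.crc32`. The helper name `append_crc32` is made up for illustration. One assumption worth flagging: zlib's CRC-32 applies a final XOR, so the CRC of "data plus its own CRC" is not zero but the same fixed constant for every input (zero is what you get with variants that have no initial value or final XOR). The CRC must be appended in little-endian byte order for this to hold with zlib's bit ordering:

```python
import struct
import zlib


def append_crc32(data: bytes) -> bytes:
    """Append the CRC-32 of data as 4 little-endian bytes."""
    return data + struct.pack("<I", zlib.crc32(data))


# The CRC over "data + CRC" does not depend on the data at all.
residue_a = zlib.crc32(append_crc32(b"first example"))
residue_b = zlib.crc32(append_crc32(b"a completely different file"))
print(hex(residue_a), hex(residue_b))  # both print the same constant
```

A verifier can therefore run the CRC over the entire file, trailer included, and compare the result against that one constant.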

Steven Sudit
+3  A: 

Yes. It's possible, and it's common with simple checksums. Getting a file to include its own md5sum would be quite challenging.

In the most basic case, create a checksum value which will cause the summed modulus to equal zero. The checksum function then becomes something like

(n1 + n2 + ... + CRC) % 256 == 0

The checksum then becomes part of the file and is itself checked. A very common example of this is the Luhn algorithm used in credit card numbers. The last digit is a check digit, and it is itself part of the 16-digit number.
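As a rough illustration of the sum-to-zero idea, here is a sketch with one trailing check byte. The names `check_byte`, `seal`, and `is_valid` are invented for this example, and this is a plain additive checksum, not a real CRC:

```python
def check_byte(payload: bytes) -> int:
    """Byte value that makes the total sum a multiple of 256."""
    return (-sum(payload)) % 256


def seal(payload: bytes) -> bytes:
    """Append the check byte so the whole blob sums to 0 mod 256."""
    return payload + bytes([check_byte(payload)])


def is_valid(blob: bytes) -> bool:
    """Verify by summing every byte, including the check byte itself."""
    return sum(blob) % 256 == 0
```

The verifier never has to locate or strip the check byte; it just sums the whole file. That is what makes the check value part of its own calculation.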

brianegge
Right, that's what I said. :-) Since it's only 32 bits, it's entirely feasible to just brute-force the solution.
Steven Sudit
This does not show how to include the md5sum of a file within the file, which is what the question asked.
superjoe30
A: 

Sure.

The simplest way would be to run the file through an MD5 algorithm and embed the result within the file. If you want to try to hide it, you can split up the checksum and place the pieces at known points in the file (based on a proportion of the file size, e.g. 30%, 50%, 75%).

Similarly, you could encrypt the file, or encrypt a portion of the file (along with the MD5 checksum), and embed that in the file. Edit: I forgot to say that you would need to remove the checksum data before using the file.

Of course if your file needs to be readily readable by another program e.g. Word then things become a little more complicated as you don't want to "corrupt" the file so that it is no longer readable.

ChrisBD
If you embed that data within the file, wouldn't that change the md5 checksum?
Eli
It would if you ran the checksum routine on it again, but that is the point of removing it before use. Simplest way would be to just add the checksum onto the end of the file. When the file is received you remove the checksum data and rerun the checksum routine on the remaining data. Any data corruption to either the checksum or the original data will show up here.
ChrisBD
I am fairly certain zakovyrya was asking for the checksum to be *included* in its own calculation.
tloflin
A: 

You can, of course, but in that case the SHA digest of the whole file will not be the SHA you included: it is a cryptographic hash function, so changing a single bit in the file changes the whole hash. What you are looking for is a checksum calculated from the content of the file in such a way that the result matches a set of criteria.

bandi
A: 

There are many ways to embed information in order to detect transmission errors etc. CRC checksums are good at detecting runs of consecutive bit-flips and can be added in such a way that the checksum of the whole always comes out as a fixed value, e.g. 0. These kinds of checksums (including error-correcting codes) are, however, easy to recreate and do not stop malicious tampering.

It is impossible to embed something in the message that lets the receiver verify its authenticity if the receiver knows nothing else about or from the sender. The receiver could, for instance, share a secret key with the sender. The sender can then append an encrypted checksum (which needs to be a cryptographic hash such as md5/sha1). It is also possible to use asymmetric encryption, where the sender publishes his public key and signs the md5 checksum/hash with his private key. The hash and the signature can then be attached to the data as a new kind of checksum. This is done all the time on the Internet nowadays.

The remaining problems are then: 1. How can the receiver be sure that he got the right public key? 2. How secure is all this stuff in reality? The answer to 1 may vary. On the Internet it's common to have the public key signed by someone everyone trusts. Another simple solution is for the receiver to get the public key at a meeting in person. The answer to 2 might change from day to day, but what's costly to brute-force today will probably be cheap to break some time in the future. By that time, new algorithms and/or larger key sizes will hopefully have emerged.
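For the shared-secret case, a sketch might look like the following. It swaps in HMAC-SHA256 from Python's standard library as the keyed checksum, which is an assumption on my part and not the "encrypted md5" wording above; `sign` and `verify` are made-up helper names:

```python
import hashlib
import hmac

# SHA-256 produces a 32-byte tag.
TAG_SIZE = hashlib.sha256().digest_size


def sign(key: bytes, payload: bytes) -> bytes:
    """Append an HMAC-SHA256 tag computed with the shared secret key."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload + tag


def verify(key: bytes, blob: bytes) -> bytes:
    """Check the trailing tag and return the payload, or raise ValueError."""
    if len(blob) < TAG_SIZE:
        raise ValueError("blob too short to contain a tag")
    payload, tag = blob[:-TAG_SIZE], blob[-TAG_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return payload
```

Without the key, an attacker who changes the payload cannot recompute a matching tag, which is the property a plain appended checksum lacks.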

Markarian451
A: 

If the question is asking whether a file can contain its own checksum (in addition to other content), the answer is trivially yes for fixed-size checksums, because a file could contain all possible checksum values.

If the question is whether a file could consist of its own checksum (and nothing else), it's trivial to construct a checksum algorithm that would make such a file impossible: for an n-byte checksum, take the binary representation of the first n bytes of the file and add 1. Since it's also trivial to construct a checksum that always encodes itself (i.e. do the above without adding 1), clearly there are some checksums that can encode themselves, and some that cannot. It would probably be quite difficult to tell which of these a standard checksum is.
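The two toy checksums can be tried out exhaustively for n = 1. This is only a sketch: the function names are invented, and wrapping around on overflow (255 + 1 becomes 0) is my assumption, since the description above does not say what happens there:

```python
N = 1  # checksum size in bytes; small enough to enumerate every file


def first_bytes(data: bytes) -> bytes:
    """Toy checksum: the first N bytes of the file, zero-padded."""
    return data[:N].ljust(N, b"\x00")


def first_bytes_plus_one(data: bytes) -> bytes:
    """Toy checksum: the first N bytes as an integer, plus 1, wrapping."""
    value = int.from_bytes(first_bytes(data), "big")
    return ((value + 1) % (256 ** N)).to_bytes(N, "big")


# Every possible file that consists of exactly N bytes.
all_files = [bytes([b]) for b in range(256)]

self_encoding = [f for f in all_files if first_bytes(f) == f]
never_matching = [f for f in all_files if first_bytes_plus_one(f) == f]

print(len(self_encoding), len(never_matching))  # 256 0
```

Under the first checksum every one-byte file is its own checksum; under the second none is, because no value equals itself plus one.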

tloflin