tags:
views: 65
answers: 1

I want to calculate the checksum of a large TIFF file that might not fit in memory. Will I get a reliable value if I instead calculate the checksum of every page and then calculate the checksum of the array of page checksums? Or is there a mathematical problem I'm not seeing, so that the only correct way is to work with the whole file?

Thanks!

A: 

I don't know if I understood the question correctly, but with most checksum algorithms you only need to load a small part of the message into memory at a time. Because of that, operating on streams instead of in-memory buffers is possible and has been done before.
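To illustrate the streaming point: Python's `hashlib` hashes update incrementally, so you can feed the file in fixed-size chunks and never hold more than one chunk in memory. A minimal sketch (the chunk size is an arbitrary choice, not something from the question):

```python
import hashlib

def file_checksum(path, chunk_size=1 << 20):
    """Stream a file through MD5 in fixed-size chunks so memory
    use stays constant regardless of file size."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()
```

Because the hash state is updated piece by piece, this produces exactly the same digest as hashing the whole file contents in one call.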

Edit:

I only know that you have to be careful with Adler-32 when checksumming short messages: you would not be covering the whole hash space, and false positives are more likely (and yes, the array of checksums would probably be a short message).

With crypto hashes I honestly don't know. My intuition is that md5(msg1 + msg2 + ...) is as reliable as md5(md5(msg1) + md5(msg2) + ...), but we'll have to wait for someone smarter than me to give a definitive answer :)
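The md5(md5(msg1) + md5(msg2) + ...) construction from the question could be sketched like this. Note the resulting digest is well defined and deterministic, but it is a *different* value from the digest of the whole file, so both sides of a comparison must use the same scheme (the function name and the idea of pages as byte strings are illustrative, not from any TIFF library):

```python
import hashlib

def checksum_of_checksums(pages):
    """Hash each page separately, then hash the concatenation of the
    per-page digests. `pages` is an iterable of bytes objects
    (hypothetical page payloads)."""
    outer = hashlib.md5()
    for page in pages:
        outer.update(hashlib.md5(page).digest())
    return outer.hexdigest()
```

Any change to any page changes that page's digest and therefore the outer digest, so as an integrity check it behaves much like hashing the whole stream, as long as both ends agree on the page boundaries.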

wuub
Thank you for the link - but how far off would I be if I calculate the checksum of the page checksums, instead of the checksum of the whole multipage document?
Otávio Décio