Using Microsoft FCIV, which computes SHA-1 file checksums, I created a text file of file names and checksums:

"8697c58c606122c30e2a20f1eabd6919" "g:\00258\99481\99481.eps"
"b77a6b392c002bb9cc51f48170487dea" "g:\00258\99481\99481.eps"

My intent is to create a JPEG thumbnail for any image that changes. However, this utility takes hours to generate the list. I wanted to use SHA-1 because the Git folks find it useful (about a 1 in 2^52 chance of collision, a number with five commas). MD5 produces several collisions at that sample size. I also want to use the SHA-1 as a unique identifier.

I need to quickly identify file changes and regenerate thumbnails only for changed files. I would like to get these values into SQL. Any suggestions? (For that matter, I also need to read the image loading keywords into SQL.) Timestamps are difficult because twice a year, when the clocks change, the file creation and modification times reported by Windows shift by an hour.

+1  A: 

Why don't you look at the file modification time as a first step, and only compute a hash if it differs? That way you won't be doing the (expensive) hash for every file.

You could also look at the file size as an additional check.

Also, you could regenerate all the hashes twice a year when the clocks change.
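A minimal Python sketch of that tiered check, in case it helps. It assumes you keep the previous run's state as a dict mapping each path to `(size, mtime, sha1)`; the names `sha1_of_file` and `changed_files` are made up for illustration, and only the standard `hashlib` and `os` modules are used.

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Hash the file in chunks so large images are never fully loaded into memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_files(paths, previous):
    """Return (changed, current).

    previous: dict of path -> (size, mtime, sha1) saved by the last run
              (hypothetical storage format, assumed for this sketch).
    changed:  paths whose content hash is new or different.
    current:  the state to save for the next run.
    """
    changed = []
    current = {}
    for path in paths:
        st = os.stat(path)
        old = previous.get(path)
        # Cheap test first: same size and same mtime means we skip the hash.
        if old is not None and old[0] == st.st_size and old[1] == st.st_mtime:
            current[path] = old
            continue
        # Metadata differs (or the file is new): pay for the hash now.
        digest = sha1_of_file(path)
        current[path] = (st.st_size, st.st_mtime, digest)
        if old is None or old[2] != digest:
            changed.append(path)
    return changed, current
```

When only the timestamp moved (for example after a clock change), the file gets hashed once, the digest matches the stored one, and it is not reported as changed, so no thumbnail is regenerated. The `current` dict is what you would write to your SQL table, one row per path.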

Matt Warren
I guess you could store the previous directory listing and select files whose time differs from the current listing by something other than exactly 3600 seconds. The time changes both ways, but only by exactly 3600 seconds. However, clocks are set remotely via NTS. In the past, we have always ended up resyncing the entire directory structure, even though we specifically check for these time changes. I was thinking the SHA-1 would never change, but it's so expensive to produce. MD5 is not specific enough, based on our tests.
Dr. Zim
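The 3600-second comparison described in this comment could look roughly like the sketch below. The function name and the 2-second tolerance are assumptions for illustration; the tolerance is there because some file systems (FAT, for instance) only store modification times to 2-second resolution.

```python
DST_SHIFT_SECONDS = 3600  # the one-hour shift described in the comment


def mtime_really_changed(old_mtime, new_mtime, tolerance=2.0):
    """Treat a difference of zero or of exactly one hour (either direction) as 'unchanged'.

    tolerance is an assumed slack in seconds for coarse timestamp resolution.
    """
    delta = abs(new_mtime - old_mtime)
    if delta <= tolerance:
        return False  # same timestamp
    if abs(delta - DST_SHIFT_SECONDS) <= tolerance:
        return False  # looks like a clock-change shift, not an edit
    return True
```

The obvious weakness is a file that really was edited almost exactly one hour after its previous timestamp; combining this with a file size comparison narrows that gap but does not close it.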
The other thing you could do is look at just the first few bytes of the image. For instance, with .jpg files the header contains all sorts of information, such as the dimensions, bit depth, colour information, etc. You could see if any of this has changed as another step before you calculate the SHA-1 hash.
Matt Warren
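One rough way to try the header idea without parsing each image format is to fingerprint only the first few kilobytes. This is a sketch under assumptions: `header_fingerprint` is a hypothetical helper, and the 4096-byte default is an arbitrary guess at "enough to cover the header", not a figure from any format specification.

```python
import hashlib


def header_fingerprint(path, num_bytes=4096):
    """SHA-1 of only the first num_bytes of the file (cheap compared to a full hash)."""
    with open(path, "rb") as f:
        head = f.read(num_bytes)
    return hashlib.sha1(head).hexdigest()
```

A different fingerprint proves the file changed. An identical fingerprint proves nothing, because the edit may lie entirely beyond the bytes read, so it can only serve as an early "definitely changed" signal before the full hash.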