I need to calculate a summary MD5 checksum for all files of a particular type (*.py, for example) under a directory and all its subdirectories. What is the best way to do that? Thanks.

The proposed solutions are very nice, but this is not exactly what I need. I'm looking for a way to get a single SUMMARY checksum that uniquely identifies the directory as a whole, including the contents of all its subdirectories. Thanks a lot.

A: 

Take a look at this and this for a more detailed explanation.

luvieere
A: 

From the second link (above): find directory -name "*.py" -type f -exec md5sum {} \;

ZaiS
+1  A: 

If you want one md5sum spanning the whole directory, I would do something like

cat *.py | md5sum
Ramon
Great, but it doesn't include the contents of subdirectories...
victorz
For subdirs, use something like `cat **/*.py | md5sum` (in bash this needs `shopt -s globstar` first).
Ramon
+1  A: 

GNU find

find /path -type f -name "*.py" -exec md5sum "{}" +;
ghostdog74
Should the last token be `\;`?
Dan Moulding
It's valid for GNU find. Check the man page for more info.
ghostdog74
+6  A: 

Create a tar archive file on the fly and pipe that to md5sum:

tar c dir | md5sum

This produces a single md5sum that should be unique to your file and sub-directory setup. No files are created on disk.
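
A sketch of how the same idea could be limited to one file type, assuming GNU `find`, `sort` and `tar` (for the `-print0`, `-z`, `--null` and `-T -` options). The function name `tar_checksum` is made up for illustration. Keep in mind that a tar stream also records file names, permissions, owners and modification times, so the checksum changes when that metadata changes even if the contents do not.

```shell
# tar_checksum DIR
# Hypothetical helper: one MD5 over a tar stream of all *.py files under DIR.
tar_checksum() {
    (
        # Work inside DIR so that archive member names are relative paths.
        cd "$1" || exit 1
        # NUL-separated names survive spaces and newlines; sorting fixes
        # the member order so it does not depend on traversal order.
        find . -type f -name '*.py' -print0 | LC_ALL=C sort -z |
            tar -c --null -T - -f - | md5sum | awk '{print $1}'
    )
}
```

Calling `tar_checksum dir` prints only the hash. Nothing is written to disk, as with the plain `tar c dir | md5sum` form.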

ire_and_curses
+1, simple and elegant
Adam Rosenfield
+1  A: 
find /path/to/dir/ -type f -name "*.py" -exec md5sum {} + | awk '{print $1}' | sort | md5sum

The find command lists all the files that end in .py. The md5sum is computed for each .py file. awk is used to pick off the md5sums (ignoring the filenames, which may not be unique). The md5sums are sorted. The md5sum of this sorted list is then returned.

I've tested this by copying a test directory:

rsync -a ~/pybin/ ~/pybin2/

I renamed some of the files in ~/pybin2.

The find...md5sum command returns the same output for both directories.

2bcf49a4d19ef9abd284311108d626f1  -
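
A minimal sketch of the same pipeline wrapped in a shell function, assuming GNU coreutils `md5sum` is available. The name `content_checksum` and its optional pattern argument are hypothetical. Because only the sorted per-file hashes are hashed, the result depends on file contents but not on file names or locations, which matches the rename test described here.

```shell
# content_checksum DIR [PATTERN]
# Hypothetical helper: one MD5 over the contents of all matching files.
content_checksum() {
    dir=$1
    pattern=${2:-*.py}
    # Hash each file, keep only the hash column, sort so that traversal
    # order does not matter, then hash the sorted list of hashes.
    find "$dir" -type f -name "$pattern" -exec md5sum {} + |
        awk '{print $1}' | LC_ALL=C sort | md5sum | awk '{print $1}'
}
```

For example, `content_checksum ~/pybin` and `content_checksum ~/pybin2` should print the same value as long as the two trees hold the same set of file contents. If renames should change the result, the file names would have to be included in the hashed list.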
unutbu
This one really works as expected! Thanks a million!
victorz
A: 

Technically you only need to run ls -lR *.py | md5sum. Unless you are worried about someone modifying the files and touching them back to their original dates and never changing the files' sizes, the output from ls should tell you if a file has changed. My Unix-fu is weak, so you might need some more command line parameters to get the create time and modification time to print. ls will also tell you if permissions on the files have changed (and I'm sure there are switches to turn that off if you don't care about that).
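
A sketch of this metadata-only approach using GNU `find -printf` instead of `ls` output, so that the recorded fields are explicit and subdirectories are covered. The function name `metadata_checksum` is hypothetical, and `-printf` is a GNU extension that other `find` implementations may lack. As noted, it cannot detect a same-size edit whose modification time has been reset.

```shell
# metadata_checksum DIR
# Hypothetical helper: one MD5 over path, size, mode and mtime of *.py files.
metadata_checksum() {
    (
        cd "$1" || exit 1
        # %P = path relative to the start point, %s = size in bytes,
        # %m = permission bits in octal, %T@ = mtime in seconds since epoch.
        find . -type f -name '*.py' -printf '%P\t%s\t%m\t%T@\n' |
            LC_ALL=C sort | md5sum | awk '{print $1}'
    )
}
```

Dropping `%m` from the format string would ignore permission changes.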

jmucchiello