views: 122
answers: 3
The following command prints a long list of hashes and file names:

md5sum *.java

I have tried unsuccessfully to list the lines where identical hashes occur, so that I could then remove the corresponding duplicate files.

How can I find and delete files that have identical content?

+1  A: 

Even better:

md5sum *.java | sort | uniq -d

That only prints the duplicate lines.

Zsolt Botykai
This gives me no output. The reason may be that although two files have the same hash, they still have different names, so the whole lines differ. We should first somehow filter the names out.
Masi
+4  A: 

This should work:

md5sum *.java | sort | uniq -d -w32

This tells uniq to compare only the first 32 characters, i.e. the md5 sum and not the filenames.

EDIT: If -w isn't available, try:

md5sum *.java | awk '{print $1}' | sort | uniq -d

The downside is that you won't know which files have these duplicate checksums... anyway, if there aren't too many checksums, you can use

md5sum *.java | grep 0bee89b07a248e27c83fc3d5951213c1

to get the filenames afterwards (the checksum above is just an example). I'm sure there's a way to do all this in a shell script, too.

schnaader
Thank you! I noticed that the Mac version of uniq does not have the -w option. I think the reason is that they do not want many commands to have the same features. How can you filter the names out without the -w option?
Masi
Just forgot to add -w32 ;-)
Zsolt Botykai
By the way, before anyone tries to bruteforce the md5sum above, it's for a file that contains "abc" ;)
schnaader
Thank you! I love Awk :)
Masi
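The portable variant asked about above (no `uniq -w`) can be sketched as a short script. This is only a sketch: the sample `.java` files are created just for the demo, and the `md5 -r` fallback for macOS (which prints "hash filename" like md5sum does) is an assumption about the target machine.

```shell
#!/bin/sh
# Sketch: group identical files by checksum without uniq -w.
# Assumption: either GNU md5sum or BSD/macOS "md5 -r" is available.
set -e
dir=$(mktemp -d)          # throwaway demo directory
cd "$dir"
printf 'abc' > a.java     # a.java and k.java are identical
printf 'abc' > k.java
printf 'xyz' > b.java     # b.java is unique

if command -v md5sum >/dev/null 2>&1; then
  sum='md5sum'
else
  sum='md5 -r'            # BSD md5 -r prints "hash filename"
fi

# awk collects the filenames for each hash; at the end it prints
# only the hashes that were seen more than once, with their files.
$sum *.java | sort | awk '
  { files[$1] = files[$1] " " $2; seen[$1]++ }
  END { for (h in seen) if (seen[h] > 1) print h ":" files[h] }
'
```

This keeps the hash-to-filename association that the plain `awk '{print $1}'` pipeline loses, at the cost of holding all hashes in memory.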
+1  A: 

This lists all the files, putting a blank line between each group of identical hashes:
$ md5sum *.txt | sort | perl -pe '($y)=split; print "\n" unless $y eq $x; $x=$y'

05aa3dad11b2d97568bc506a7080d4a3 b.txt

2a517c8a78f1e1582b4ce25e6a8e4953 n.txt

e1254aebddc54f1cbc9ed2eacce91f28 a.txt
e1254aebddc54f1cbc9ed2eacce91f28 k.txt
e1254aebddc54f1cbc9ed2eacce91f28 p.txt
$

To print only the first of each group:
$ md5sum *.txt | sort | perl -ne '($y,$f)=split; print "$f\n" unless $y eq $x; $x=$y'
b.txt
n.txt
a.txt
$

if you're brave, change the "unless" to "if" (so only the duplicates are printed) and then

$ rm `md5sum ...`

to delete all but the first of each group

hornlo