ansaurus

Question

Detecting duplicate binaries in the same directory (Windows)

Answer 1

A:

Generate a hash (MD5 or SHA-1) of each file and compare the hashes.

Obviously, if two files differ in size, you can rule them out as duplicates immediately.
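A minimal sketch of this idea in Python, which is swapped in purely for illustration (the answer names no language). It assumes both paths point at regular files; `file_digest` and `probably_same` are hypothetical helper names. The cheap size check runs first, and the files are only hashed when their sizes match.

```python
import hashlib
import os


def file_digest(path, algorithm="sha1", chunk_size=65536):
    """Hex digest of a file, read in chunks so large binaries don't fill memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def probably_same(path_a, path_b):
    """Compare sizes first (cheap), then hashes of the contents."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False  # different sizes can never be byte-identical
    return file_digest(path_a) == file_digest(path_b)
```

Equal hashes mean the files are identical except in the astronomically unlikely case of a hash collision.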

Visage 2010-01-14 15:22:49

Answer 2

A:

You don't specify how this should happen. This question may belong on superuser.com, but you could use a tool like WinMerge.

If you have to do this in code, you could calculate a hash of each file and compare the hash values.

Scoregraphic 2010-01-14 15:23:03

Answer 3

+3 A:

That's pretty easy. You can use two nested for loops on the command line:

for %x in (*) do @(
    for %y in (*) do @(
        if not "%x"=="%y" @(
            fc /b "%x" "%y" >nul && echo "%x" and "%y" are equal
        )
    )
)

If you want to use this in a batch file, you need to double the % signs.

The code simply loops over all files in the current directory twice, one loop nested inside the other:

for %x in (*) do @(
    for %y in (*) do @(

then, if the two file names aren't equal (identical names mean the same file, which is trivially equal to itself)

        if not "%x"=="%y" @(

it runs the fc utility, which compares the files (/b makes it a binary comparison):

            fc /b "%x" "%y" >nul && echo "%x" and "%y" are equal

If fc exits with code 0, the files are equal (thus duplicates), and the echo after the && runs. && means "execute the following command only if the previous one exited with code 0".
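For comparison, here is a rough Python equivalent of the nested fc /b loop (Python and the name `find_equal_pairs` are illustrative assumptions, not part of the answer). `filecmp.cmp(..., shallow=False)` compares file contents, and `itertools.combinations` visits each unordered pair once, so a file is never compared with itself and no pair is reported twice.

```python
import filecmp
import itertools
import os


def find_equal_pairs(directory="."):
    """Return (name_a, name_b) pairs of byte-identical regular files in `directory`."""
    names = sorted(
        n for n in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, n))  # skip subdirectories, like for %x in (*)
    )
    pairs = []
    for a, b in itertools.combinations(names, 2):
        # shallow=False: compare contents instead of trusting matching os.stat() signatures
        if filecmp.cmp(os.path.join(directory, a), os.path.join(directory, b), shallow=False):
            pairs.append((a, b))
    return pairs
```

Calling `find_equal_pairs()` in the target directory yields the pairs the batch loop echoes, each reported once. For 30 files the batch version runs fc 30 × 29 = 870 times, while combinations needs 435 comparisons.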

And for 30 files this is certainly fast enough. I once implemented something more elaborate in batch, but this should suffice.

ETA: Found the other batch file; it's still not explained anywhere publicly, but I once posted it on Super User.

Joey 2010-01-14 15:27:05

Thanks! Would upvote if I could!

Activist 2010-01-14 15:40:06

I think you need fc /b since the files are binary.

Larry Osterman 2010-01-14 17:54:25

Ah, thanks Larry. My original batch had that already. However, `fc` seems to compare files without that switch just fine. There seems to be some auto-detection going on.

Joey 2010-01-15 00:15:48

Answer 4

+1 A:

Personally, I would sort the files by file size first. Files of different sizes cannot be identical in a binary comparison.

Files of the same size could potentially be identical, so I would then generate a hash of each file's contents (MD5, SHA-1, etc.). Files with the same hash are identical (barring a hash collision, which a final byte-by-byte comparison can rule out).
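A sketch of this two-stage approach in Python (language and the names `sha1_of` and `duplicate_groups` are assumptions for illustration). Instead of literally sorting, it buckets files by size in a dictionary, which achieves the same grouping, and only hashes files whose size is shared with at least one other file.

```python
import hashlib
import os
from collections import defaultdict


def sha1_of(path, chunk_size=65536):
    """SHA-1 hex digest of a file, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_groups(directory="."):
    """Return sorted lists of paths whose size and SHA-1 hash both match."""
    by_size = defaultdict(list)
    for entry in os.scandir(directory):
        if entry.is_file():
            by_size[entry.stat().st_size].append(entry.path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate, so skip hashing it
        by_hash = defaultdict(list)
        for p in paths:
            by_hash[sha1_of(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

In a directory where most sizes are unique, this hashes only a handful of files rather than all of them.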

And to keep everything "on-topic" from a programming perspective (otherwise this question is perhaps more suited to superuser.com), here is a C# project that implements a "shell extension" (i.e. additional items in Windows Explorer's context menu) that will compute various hashes of files selected within Windows Explorer:

File Hash Generator Shell Extension

CraigTP 2010-01-14 15:29:15

Answer 5

+1 A:

Hash them with md5deep (or similar), or try a duplicate file checker:

http://www.portablefreeware.com/index.php?sc=77

goorj 2010-01-14 15:29:44

Thanks! Would upvote if I could!

Activist 2010-01-14 15:38:56

Answer 6

A:

You can use fc, or fciv (File Checksum Integrity Verifier) for checksums.

Or you could download the GNU utilities.

Get Textutils, which contains md5sum, and Coreutils, which contains sort and uniq. Then run:

C:\files>md5sum * | sort | uniq -d -w 32
6f2b448730d23fe68876db87f1ddc143 *file.txt

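uniq -d prints one line for each group of repeated lines, and -w 32 makes it compare only the first 32 characters, i.e. the MD5 hash. To iterate over the results and act on them, use a for loop.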
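If the GNU tools aren't available, the same pipeline can be mimicked in Python. This is a sketch under assumptions: the names `md5_lines` and `uniq_d_w32` are hypothetical, each file is read whole for brevity, and the `digest *name` format imitates md5sum's binary-mode output shown above. Sorting plus `itertools.groupby` on the first 32 characters plays the role of `sort | uniq -d -w 32`.

```python
import hashlib
import itertools
import os


def md5_lines(directory="."):
    """Build md5sum-style 'digest *name' lines for every regular file."""
    lines = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                digest = hashlib.md5(f.read()).hexdigest()  # whole file in memory
            lines.append(f"{digest} *{name}")
    return lines


def uniq_d_w32(lines):
    """Like `sort | uniq -d -w 32`: one line per group whose first 32 chars repeat."""
    out = []
    for _, group in itertools.groupby(sorted(lines), key=lambda s: s[:32]):
        group = list(group)
        if len(group) > 1:
            out.append(group[0])  # uniq -d keeps the first line of each repeated group
    return out
```

Looping over `uniq_d_w32(md5_lines())` then gives one representative file per duplicated hash, just as the for loop over the pipeline output would.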

ghostdog74 2010-01-14 15:36:21
