Hello!

I have about 30 BIN files in a directory, ranging from 64KB to 4MB. I need to find out whether there are duplicate files among them. Many files have the same size.

I would like to find if there are binary identical files in there.

Anyone know a way to do this? I'm under Windows XP Pro.

Thanks!

A: 

Generate a hash (MD5 or SHA-1) of each file and compare.

Obviously, if two files differ in size, you can rule them out immediately.

Visage
A: 

You don't specify how this should happen. Maybe this question belongs on superuser.com, but you could use a tool like WinMerge.

If you have to do this in code, you could calculate a hash value for each file and compare the hash values.
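As a rough illustration of the hashing approach, here is a minimal Python sketch. The function names `file_digest` and `duplicates_by_hash` are made up for this example, and SHA-1 is only one possible choice of hash. It reads each file in chunks, then groups file names by digest.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicates_by_hash(directory):
    """Group the regular files in `directory` by digest; keep groups of 2+."""
    groups = defaultdict(list)
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            groups[file_digest(path)].append(path.name)
    return [names for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    # Hypothetical usage: check the current directory.
    for names in duplicates_by_hash("."):
        print(" == ".join(names))
```

Each printed line lists files that share a digest. For 30 files of a few MB each, hashing everything should take well under a second on most machines.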

Scoregraphic
+3  A: 

That's pretty easy. You can use two nested for loops on the commandline:

for %x in (*) do @(
    for %y in (*) do @(
        if not "%x"=="%y" @(
            fc /b "%x" "%y" >nul && echo "%x" and "%y" are equal
        )
    )
)

If you want to use this in a batch file, you need to double the % signs.

The code simply loops twice over all files in the current directory:

for %x in (*) do @(
    for %y in (*) do @(

then, if the two file names differ (a file is trivially identical to itself, so that comparison is skipped)

        if not "%x"=="%y" @(

it runs the fc utility, which compares the two files (/b forces a binary comparison)

            fc /b "%x" "%y" >nul && echo "%x" and "%y" are equal

If fc had an exit code of 0 it means that the files were equal (thus duplicates) and in that case the echo after the && is triggered. && means “Just execute the following command if the previous one exited with a 0 exit code”.
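For comparison, the same pairwise idea can be sketched in Python with the standard `filecmp` module. This is not a translation of the batch file, just an assumed equivalent; `identical_pairs` is a hypothetical name. Unlike the nested loops, it visits each unordered pair only once, so a match is reported once rather than twice.

```python
import filecmp
from itertools import combinations
from pathlib import Path


def identical_pairs(directory):
    """Return (name_a, name_b) for every pair of byte-identical files."""
    files = sorted(p for p in Path(directory).iterdir() if p.is_file())
    pairs = []
    # combinations() yields each unordered pair once, so a file is never
    # compared with itself and no pair shows up twice.
    for a, b in combinations(files, 2):
        # shallow=False makes filecmp compare the contents instead of
        # trusting matching os.stat() signatures.
        if filecmp.cmp(a, b, shallow=False):
            pairs.append((a.name, b.name))
    return pairs


if __name__ == "__main__":
    for a, b in identical_pairs("."):
        print(f'"{a}" and "{b}" are equal')
```

With 30 files this is at most 435 comparisons, and pairs of different size are rejected without reading the contents.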

And for 30 files this is certainly fast enough. I once implemented something more elaborate in batch, but this should suffice.

ETA: Found the other batch; still nowhere publicly explained but I once posted it at Super User.

Joey
Thanks! Would up if I could!
Activist
I think you need fc /b since the files are binary.
Larry Osterman
Ah, thanks Larry. My original batch had that already. However, `fc` seems to compare files without that switch just fine. There seems to be some auto-detection going on.
Joey
+1  A: 

Personally, I would sort the files by file size first. Files of different sizes cannot be binary identical.

Those that are of the same size could potentially be the same, so I would then generate a hash of each file's contents (MD5, SHA-1, etc.). Files that have the same hash are almost certainly identical; a byte-by-byte comparison can confirm it.
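A minimal Python sketch of this two-step filter, assuming the files are small enough to read whole (they are at most 4MB here). `find_duplicates` is a name invented for the example, and MD5 is used only because it is mentioned above.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(directory):
    """Return lists of file names whose size and MD5 digest both match."""
    # Step 1: bucket files by size.
    by_size = defaultdict(list)
    for entry in os.scandir(directory):
        if entry.is_file():
            by_size[entry.stat().st_size].append(entry.path)

    result = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        # Step 2: hash only the files that share a size.
        by_hash = defaultdict(list)
        for path in paths:
            with open(path, "rb") as f:
                digest = hashlib.md5(f.read()).hexdigest()
            by_hash[digest].append(os.path.basename(path))
        result.extend(sorted(names) for names in by_hash.values() if len(names) > 1)
    return sorted(result)


if __name__ == "__main__":
    for names in find_duplicates("."):
        print(", ".join(names))
```

The size check costs only a directory scan, so files with a unique size are never opened at all.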

And to keep everything "on-topic" from a programming perspective (otherwise this question is perhaps more suited to superuser.com), here is a C# project that implements a "shell extension" (i.e. additional items in Windows Explorer's context menu) that will compute various hashes of files selected within Windows Explorer:

File Hash Generator Shell Extension

CraigTP
+1  A: 

Hash them with Md5Deep (or similar), or try a duplicate file checker:

http://www.portablefreeware.com/index.php?sc=77

goorj
Thanks! Would up if I could!
Activist
A: 

You can use fc, or fciv (for checksums).

Or you could download the GNU utilities.

Get Textutils, which contains md5sum, and coreutils, which contains sort and uniq. Then do this:

C:\files>md5sum * | sort | uniq -d -w 32
6f2b448730d23fe68876db87f1ddc143 *file.txt

To iterate over the results and do something with them, use a for loop.

ghostdog74