I have a directoryA that is populated as a replica of directoryB, after which some files are changed or added. I want to automate deleting every file in directoryA that still has an identical copy in directoryB.

Both directories have several layers of sub-directories, so the solution will likely have to be recursive.

My first thought is to create a batch script, but I'm new to the Microsoft command prompt, and it seems to be very different from bash scripting, with which I have some limited experience.

I am using Windows XP, but would like a solution that also works on Windows 7.

+1  A: 

I give Windows a wide berth, but you'll likely find the powerful scripting capabilities you're looking for in Windows PowerShell (see also Microsoft's documentation).

PowerShell takes an object-oriented approach to entities in the file system and elsewhere. It should be easy to whip up a script to do what you need, but you'd need to learn PowerShell first, of course.


EDIT: Microsoft is offering a download for PowerShell for Windows XP and a few others, but I don't see one for Windows 7. Ah... Wikipedia says it's already integrated in Windows 7. So that should cover your requirements, and it's already on-board with the newest versions of Windows.

Carl Smotricz
For someone who has no experience using either, this is pretty much unhelpful. You could also recommend VBScript for that, by the way, which has the advantage of not having to install anything on a legacy Windows system.
Joey
Thanks to my answer, the OP now has more information and options than before. If he managed to learn bash, he should have no trouble with PowerShell either; and this is Microsoft's official "forward looking" product in terms of command shell, so I'm betting it will be a better future investment than VBScript.
Carl Smotricz
After spending some time poking around with PowerShell commands, I think it'd be rather simple to put together a few lines to accomplish this task. I haven't actually implemented anything, but it seems promising. I'm looking at automating various other processes as well. I'm not sure how rich PowerShell's command line is, but I know how powerful bash can be. I think I might end up finding an actual Unix-like environment and just bash scripting this.
T.K.
No personal experience, but I have heard credible claims that PowerShell is much more powerful than bash. Just as bash gives you control of and access to all the programs in /bin and /usr/bin, PowerShell gives you a full programmatic interface to every DLL that has a COM interface; those are basically the underlying building blocks of Windows/.NET. Unix shell scripts communicate only through the text streams stdin/stdout; PowerShell accesses object properties and methods. Also, PowerShell is a modern and complete programming language, with somewhat more consistency and less idiosyncrasy than bash.
Carl Smotricz
But after that enthusiastic testimonial for PowerShell: If you are in control of the environment (and will continue to be) and you have a good working knowledge of bash and friends, then porting a Unix-like solution may be your easiest bet. Whenever I need it, some digging finds me a fairly complete set of Unix utilities based on mingw, including bash. Those programs do a pretty decent job of simulating a Unix environment under Windows. Recommended.
Carl Smotricz
+2  A: 

In your situation I would take the lazy man's way out, install mingw, and use

find directoryA directoryB -type f -exec md5sum '{}' ';' |
my-bash-script

to find every file in directoryA that has the same MD5 signature as a file in directoryB, then remove it.
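The my-bash-script stage above is only a placeholder. A minimal sketch of what it could look like follows, assuming GNU md5sum's usual "<hash>  <path>" output format and file names without backslashes or newlines. The function name list_redundant and its two arguments are hypothetical, not part of the original answer. It only prints the matching paths; nothing is deleted until you add the rm step yourself.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for "my-bash-script".
# Reads md5sum output ("<hash>  <path>", one file per line) on stdin and
# prints every path under directory $1 whose hash also occurs under directory $2.
list_redundant() {
    awk -v a="$1/" -v b="$2/" '
        {
            hash = $1
            # The path starts after the hash and the two-character separator.
            path = substr($0, length(hash) + 3)
            if (index(path, b) == 1)
                in_b[hash] = 1            # hash seen somewhere under directory B
            else if (index(path, a) == 1)
                from_a[path] = hash       # remember each file under directory A
        }
        END {
            # All input has been read, so the listing order does not matter.
            for (path in from_a)
                if (from_a[path] in in_b)
                    print path
        }
    '
}

# Dry run: list the candidates first.
#   find directoryA directoryB -type f -exec md5sum '{}' ';' |
#       list_redundant directoryA directoryB
# Once the list looks right, append:
#   | while IFS= read -r f; do rm -- "$f"; done
```

Note that this matches on content alone, so a file in directoryA counts as redundant if any file in directoryB has the same MD5 sum, regardless of its name or location. It also treats equal MD5 sums as equal content, which is an assumption rather than a guarantee.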

Or if you prefer a less lazy solution but one that does not require mingw, install Lua and the Lua POSIX library (which I think can be installed on Windows). You can google for the MD5 library and do the entire operation in Lua, and it will be portable. And unlike the mingw solution, it will be easy to deploy to anybody's Windows box; you can make a standalone binary.

Norman Ramsey
I've been thinking of putting a Unix-like environment onto this machine for a little while now. It hasn't really seemed like a worthwhile use of time until now. Thanks for mentioning mingw - I knew about cygwin, but it's good to see there are more options. I'll do some reading up on the advantages/disadvantages of each. I haven't heard of Lua before. I really like that it'd be a portable solution, but I'm unsure whether its learning curve would make it feasible for my current task. Is the language difficult to pick up?
T.K.
Lua is a delightfully simple language; its only data structure is the associative array, and its definitive text is only 300 pages including the library specification. A slightly dated version of that book is available free-as-in-beer online too. But then you could just as easily install GNU Awk for Windows, or Perl, or Ruby, or... well, face it, the task is not that hard, and you can implement it in almost any language you know and like.
Carl Smotricz
Norman Ramsey
+1  A: 

If you want a solution that does not require third-party software to be installed, use the script below. It only uses built-in command-line tools.

The script first checks some common error conditions. Then it iterates recursively through all the files in the cleanup directory. If it finds a file with the same relative path in the backup directory, it does a binary comparison to determine whether the file is redundant.

@echo off
rem delete files from a directory that have a redundant copy in a backup directory

setlocal enabledelayedexpansion

rem check arguments
if "%~2"=="" (
    echo.Usage: %~n0 cleanup_dir backup_dir
    echo.Delete files from cleanup_dir that have a redundant copy in backup_dir
    exit /b 1
)

set "CLEANUP_DIR=%~f1"
if not exist "%CLEANUP_DIR%" (
    echo."%CLEANUP_DIR%" does not exist.
    exit /b 1
)

set "BACKUP_DIR=%~f2"
if not exist "%BACKUP_DIR%" (
    echo."%BACKUP_DIR%" does not exist.
    exit /b 1
)

rem ensure that dirs are different
if "%CLEANUP_DIR%" == "%BACKUP_DIR%" (
    echo.backup directory must not be the same as cleanup directory.
    exit /b 1
)

rem ensure that backup_dir is not a sub dir of cleanup_dir
if not "!BACKUP_DIR:%CLEANUP_DIR%=!" == "%BACKUP_DIR%" (
    echo.backup directory must not be a sub directory of cleanup directory.
    exit /b 1
)

rem iterate recursively through files in cleanup_dir
for /R "%CLEANUP_DIR%" %%F in (*) do (
    set "FILE_PATH=%%F"
    rem map the path onto backup_dir by swapping the directory prefix
    set "BACKUP_FILE_PATH=!FILE_PATH:%CLEANUP_DIR%=%BACKUP_DIR%!"
    if exist "!BACKUP_FILE_PATH!" (
        rem binary compare file to file in backup dir
        fc /B "!FILE_PATH!" "!BACKUP_FILE_PATH!" >NUL 2>&1
        if not errorlevel 1 (
            rem if files are identical delete file from cleanup_dir
            echo.delete redundant "!FILE_PATH!".
            del "!FILE_PATH!"
        ) else (
            echo.keep modified "!FILE_PATH!".
        )
    ) else (
        echo.keep added "!FILE_PATH!".
    )
)
sakra
Thanks! This is the kind of answer I was looking for when I wrote the question - I guessed it could be written using only built-in command-line instructions, but I wasn't sure, and I didn't want to sit down with the Microsoft command line only to learn it didn't have the capabilities I'd need. Thanks again!
T.K.