Hi, can anyone please suggest a method (Ruby, Python or DOS preferable) to remove only the files and sub-folders that differ between two given folders?

I need it to recurse through sub-directories and delete everything that is different.

I don't want to have to install anything, so a script would be great.

Thanks in advance

A: 

You can use Python's difflib to see how files differ, then os.unlink them. Really, if all you need to know is whether two files differ at all, you can just compare their contents with:

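import os

# files: a list of (path1, path2) pairs to compare
for file1, file2 in files:
    fh1 = open(file1, 'rb')
    data1 = fh1.read()
    fh1.close()  # close the file object, not the string that read() returned
    fh2 = open(file2, 'rb')
    data2 = fh2.read()
    fh2.close()
    if data1 != data2:
        os.unlink(file1)
        os.unlink(file2)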

You can use os.walk to build the lists of files. The above code is written without newer features like the with statement, since you didn't want to install anything. If you have a recent Python installation, you can make it a bit nicer.
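As a rough sketch of how the list of file pairs might be built with os.walk, the following pairs up files that exist at the same relative path under both folders and reports the pairs whose contents differ. The function names common_file_pairs and differing_pairs are made up for illustration, and using filecmp for the comparison is a substitution for reading both files by hand, not something this answer prescribes.

```python
import filecmp
import os


def common_file_pairs(dir_a, dir_b):
    """Yield (path_a, path_b) for every file found under both trees."""
    for root, _dirs, names in os.walk(dir_a):
        # Position of the current folder relative to the top of dir_a.
        rel_root = os.path.relpath(root, dir_a)
        for name in names:
            path_a = os.path.join(root, name)
            path_b = os.path.normpath(os.path.join(dir_b, rel_root, name))
            if os.path.isfile(path_b):
                yield path_a, path_b


def differing_pairs(dir_a, dir_b):
    """Return the pairs whose contents differ (full content comparison)."""
    return [(a, b) for a, b in common_file_pairs(dir_a, dir_b)
            if not filecmp.cmp(a, b, shallow=False)]
```

The result of differing_pairs could be used as the files list in the loop above. Files that exist on only one side are ignored here; they need a separate rule.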

Nathon
A: 

In Python, you can get the filenames using os.walk. Put each full pathname into a set and use the difference method to get the files and folders that are different.
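A minimal sketch of that idea, assuming paths are stored relative to each top folder so that the two sets are comparable (full pathnames would never match across two different roots). The names relative_entries and only_in_first are hypothetical.

```python
import os


def relative_entries(top):
    """Return the set of paths, relative to top, of all files and folders below it."""
    entries = set()
    for root, dirs, files in os.walk(top):
        for name in dirs + files:
            entries.add(os.path.relpath(os.path.join(root, name), top))
    return entries


def only_in_first(dir_a, dir_b):
    """Relative paths that exist under dir_a but not under dir_b."""
    return relative_entries(dir_a).difference(relative_entries(dir_b))
```

Calling only_in_first(folder1, folder2) and only_in_first(folder2, folder1) gives the entries unique to each side. This only compares names, not file contents.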

Mark Ransom
A: 

Ruby

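require 'fileutils'
folder1, folder2 = ARGV[0], ARGV[1]
# Paths relative to folder1, recursive ("**/*"; hidden files are not matched)
names1 = Dir.chdir(folder1) { Dir["**/*"] }
# Delete everything in folder2 that has no counterpart in folder1
Dir.chdir(folder2) do
  Dir["**/*"].each { |rel| FileUtils.rm_rf(rel) unless names1.include?(rel) }
end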
ghostdog74
A: 

This is the kind of thing I have done when I wanted to diff directories:

#!/usr/bin/env python

import os, os.path
import stat

def traverse_path(start_dir='.'):
    for root, dirs, files in os.walk(start_dir, topdown=False):
        for f in files:
            complete_path = os.path.join(root, f)
            try:
                m = os.stat(complete_path)[stat.ST_MODE]
                if stat.S_ISREG(m):
                    yield complete_path[len(start_dir):]
            except (OSError, IOError) as err:
                # "except ... as" and the print() form below run on Python 2.6+ and Python 3
                print('Skipping %s' % complete_path)

if __name__ == '__main__':
    s = set(traverse_path('/home/hughdbrown'))
    t = set(traverse_path('/home.backup/hughdbrown'))
    # Files found only under the first tree
    for e in s - t:
        print(e)
    print('-' * 25)
    # Files found only under the second tree
    for e in t - s:
        print(e)

Notice that there is a check for regular files. As I recall, I ran into files that were used as semaphores, or that were written to by one process and read by another, or something like that. The check turned out to be important.

You can add code to delete files, according to whatever rules you like.
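For example, one possible rule is "delete anything under the target folder that has no counterpart under the reference folder". The sketch below assumes that rule; the function name delete_unmatched and its dry_run parameter are made up for illustration, and it defaults to a dry run so nothing is removed until you ask for it.

```python
import os


def delete_unmatched(target, reference, dry_run=True):
    """Delete entries under target that have no counterpart under reference.

    Returns the relative paths that were (or, in a dry run, would be) removed.
    """
    removed = []
    # Walk bottom-up so a folder is handled after everything inside it.
    for root, dirs, files in os.walk(target, topdown=False):
        rel_root = os.path.relpath(root, target)
        for name in files:
            rel = os.path.normpath(os.path.join(rel_root, name))
            if not os.path.lexists(os.path.join(reference, rel)):
                removed.append(rel)
                if not dry_run:
                    os.remove(os.path.join(target, rel))
        for name in dirs:
            rel = os.path.normpath(os.path.join(rel_root, name))
            if not os.path.lexists(os.path.join(reference, rel)):
                removed.append(rel)
                if not dry_run:
                    path = os.path.join(target, rel)
                    if os.path.islink(path):
                        os.remove(path)  # symlink to a folder: remove the link only
                    else:
                        os.rmdir(path)   # already emptied by the earlier steps
    return removed
```

Running it once with dry_run=True and printing the result is a cheap way to check the rule before deleting anything for real.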

hughdbrown
That did it. Thanks
Marcos Placona
+1  A: 

Wouldn't rsync be the better solution? It supports everything you want and does it fast.

luispedro