views:

128

answers:

2

Hello Guys,

I'm going to implement recursive folder comparison on python. What do you think would best algorithm for this?

  1. Get two lists of the files for the folders
  2. Sort both lists
  3. Compare using filecmp module for a file
  4. Repeat for every folder recursively

In result I need to get only the list of the files that are different (content difference is not needed here), the list of the files that are missing in one of the comparable folders.

Thank you.

+1  A: 

If I was you, I would check if another software doesn't already implement this feature, like rsync or diff. For what I see, both have the features you need. There are more information about it here.

If you really need to do this in Python, I would modify slightly your algorithm, making it look like this:

  1. Store both paths contents in two separate list variables using os.walk;
  2. Iterate on each value of the first list to find a corresponding value in the second list;
  3. If a corresponding value have been found, compare it using the filecmp module. Otherwise, display the missing file;
  4. Remove the value from the second list;
  5. Go to #2 until the first list is empty;
  6. Print everything left in the second list;
Soravux
Seems like you would gain some speed by using sets instead of lists.
intuited
A: 

Make recursive search over directory and for each file store md5 or sha checksum of file in dictionary as key and path/name as value. Make this dictionary for both directories. Then you can remove pairs from each directory and result is missing/different files.

This will make simple O(n) algorhitm, where n is volume of directory.

ralu