Hi there, I'm doing some testing to ensure that the all-in-one zip file that I created using a script will have the same content as a few zip files that I must manually click and create via a web interface. The zips will have different folder structures, though.

Of course I can manually extract them and use my powerful eyeball technique to scan them, or, even lazier, I can write a script to do that. But before I invest more time and get accused by my boss of robbing company time, I'm asking: is there a better way to do this?

I'm using a Perl LAMP stack, by the way. Thanks.

+1  A: 

I can wholeheartedly recommend Beyond Compare. Unless you're really getting underpaid, it's the biggest bang for your (boss's) buck.

[Edit] I seem to have skimmed over the different folder structure, sorry about that. Beyond Compare can compare all files in folders with the same folder structure. It does not have (I believe) the intelligence to go searching for matching files in different folders.

Regards,
Lieven

Lieven
@Lieven Does it do archive comparison? And how do I link it up to my Perl script? Thanks.
melaos
It does archive comparison. You can drive BC from the command line. I assume that will be doable in Perl (I don't know Perl). The problem will be your different folder structure...
Lieven
@Lieven, yeah, I think the different folder structure is the killer here :(
melaos
@melaos I believe that flattening the hierarchy as SDX2000 mentioned is the best way to go then.
Lieven
+1  A: 

Taking a cue from Carra's answer: if A.zip is your single big archive and B.zip is the archive generated through the web, then use the following algorithm.

  1. Extract all files from A.zip, then recursively (w.r.t. folders) compute the checksum of every file in the folder where the contents were extracted (using cksum, md5sum, etc.). Sort this output (pipe it through sort) and save it to a file (say A.txt).

  2. Do the same for B.zip and generate B.txt.

  3. Compare A.txt with B.txt; they should be exactly the same.
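The three steps above could be sketched in Python roughly as follows. This is only an illustration, not the answerer's own script: the function names are made up, MD5 stands in for whichever checksum tool you prefer, and each file is keyed by its base name on the assumption that the two archives differ only in folder layout and that base names are unique.

```python
import hashlib
import os
import tempfile
import zipfile


def checksum_manifest(zip_path):
    """Extract zip_path to a temp dir and return sorted (basename, md5) pairs."""
    entries = []
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tmp)
        # Walk the extracted tree recursively, as in step 1.
        for root, _dirs, files in os.walk(tmp):
            for name in files:
                with open(os.path.join(root, name), "rb") as fh:
                    digest = hashlib.md5(fh.read()).hexdigest()
                # Assumption: keep only the base name so folder layout is ignored.
                entries.append((name, digest))
    return sorted(entries)


def same_contents(zip_a, zip_b):
    """Step 3: the two sorted manifests should be exactly the same."""
    return checksum_manifest(zip_a) == checksum_manifest(zip_b)
```

Calling `same_contents("A.zip", "B.zip")` would then return `True` only when both archives hold the same set of file names with matching checksums.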

OR

Use unzip -l to get file/directory lists for both (zip) archives, then flatten the hierarchy of the user-generated zip file and compare it with the contents of your script-generated zip file using something like diff. By flattening the hierarchy I mean you may need to do some kind of pre-processing on one or both lists before you can do a meaningful comparison with diff.
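As a rough sketch of the listing-and-flatten idea, here is one way it might look in Python, using the standard `zipfile` module in place of `unzip -l` and `difflib` in place of `diff`. The function names are hypothetical, and "flattening" is assumed to mean simply dropping the directory part of each entry; your archives may need different pre-processing. Note that this compares names only, not contents.

```python
import difflib
import posixpath
import zipfile


def flat_listing(zip_path):
    """Return the sorted base names of the files in an archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return sorted(
            # Zip entries always use forward slashes, hence posixpath.
            posixpath.basename(info.filename)
            for info in zf.infolist()
            if not info.is_dir()
        )


def diff_listings(zip_a, zip_b):
    """Return unified-diff lines between two flattened listings (empty if equal)."""
    return list(
        difflib.unified_diff(
            flat_listing(zip_a),
            flat_listing(zip_b),
            fromfile=zip_a,
            tofile=zip_b,
            lineterm="",
        )
    )
```

An empty result from `diff_listings` means both archives contain the same file names once the folders are ignored.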

SDX2000
@SDX2K yeah, I thought about that too, but was looking for some simple hack before I write my own. Thanks :)
melaos
You are welcome :)
SDX2000
+1  A: 

Create a CRC checksum for your files.

If the checksum is the same for the original files and the unzipped files, you can be practically sure the files are the same. This even works for non-text data.

A checksum can easily be created with an external program such as "SFV Checker" or programmatically (.NET and Java, for example, include libraries to do this).
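For the programmatic route, a minimal sketch in Python (chosen here only for illustration; the answer names .NET and Java) could use the standard `zlib.crc32` function. The helper name `file_crc32` and the chunk size are arbitrary choices, not part of any library.

```python
import zlib


def file_crc32(path, chunk_size=65536):
    """Return the CRC-32 of a file as an unsigned integer, reading in chunks."""
    crc = 0
    with open(path, "rb") as fh:
        # Feed the running CRC back in so large files need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF
```

Two files with different CRC-32 values are certainly different; matching values make it very likely, though not strictly guaranteed, that they are identical.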

Carra
@Carra So in my case, let's say there are three original zip files, and now using my script I have one big zip file. How do I do it using checksums? Thanks.
melaos
@melaos I think he meant that you need to extract all your constituent files and then do a checksum on them, maybe keyed on file names or without them.
SDX2000
On Linux you may try `cksum` or `md5sum` to generate checksums.
SDX2000
But you should be able to get the checksum from `unzip` too (I think `unzip -v` rather than `unzip -l`; the verbose listing includes each file's stored CRC-32).
SDX2000
@SDX2K Well, if I had to extract them, it means I need to loop through each dir and each file to compare them one by one, right? Thanks.
melaos
I have updated my answer. Please see that.
SDX2000
Well, extract all files from your big zip file into *one folder*. Extract the three small zip files and put all their files into another folder. Now create a checksum of all the files in the first dir and a checksum of all the files in the second dir. If the checksums match, your files are the same.
Carra
+2  A: 

You can use Perl's Archive::Zip or Python's zipfile to extract the file names, sizes and CRC checksums of the files in the archives. Create a file which contains the results sorted by file name (ignoring the path).

For your smaller ZIPs, merge the results of the script (cat list1 list2 list3 | sort).

Now, you can use diff to compare the results.
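A minimal sketch of this approach with Python's `zipfile` might look like the following. It reads the name, uncompressed size and stored CRC-32 straight from each archive's directory, so nothing has to be extracted. The function name `crc_manifest` and the archive names in the usage note are hypothetical, and the comparison assumes base names are meaningful once the path is ignored.

```python
import posixpath
import zipfile


def crc_manifest(*zip_paths):
    """Return sorted (basename, size, crc32) tuples for the files in the archives."""
    rows = []
    for path in zip_paths:
        with zipfile.ZipFile(path) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue
                # Ignore the path: keep only the base name of each entry.
                name = posixpath.basename(info.filename)
                rows.append((name, info.file_size, info.CRC))
    # Sorting the merged rows plays the role of `cat list1 list2 list3 | sort`.
    return sorted(rows)
```

With that in place, `crc_manifest("all.zip") == crc_manifest("part1.zip", "part2.zip", "part3.zip")` would tell you whether the big archive matches the merged smaller ones; writing each manifest to a text file and running diff on them would show where they differ.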

Aaron Digulla
http://search.cpan.org/perldoc?Archive::Zip
Brad Gilbert