I need to allow users to upload a zip file via a web form. The server is running Linux with an Apache web server. Are there advantages to using a module like Archive::Zip to extract this archive or should I just execute a system call to unzip with backticks?

+4  A: 

If you execute the binary unzip, your process will fork/exec and

  1. instantiate a new process
  2. consume more memory (for the duration of the spawned process)

You'll also have to configure your application with the correct path to the unzip binary. Given all this, I would strongly prefer the library approach.
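As a rough illustration of the library approach, here is a minimal sketch using Archive::Zip to extract an uploaded archive in-process. The function name `extract_upload` and its two parameters are hypothetical names chosen for this example, and it assumes Archive::Zip is installed from CPAN.

```perl
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);

# Hypothetical helper: extract every member of $zip_path under $dest_dir
# without spawning an external process.
sub extract_upload {
    my ($zip_path, $dest_dir) = @_;

    my $zip = Archive::Zip->new();
    $zip->read($zip_path) == AZ_OK
        or die "Cannot read zip archive '$zip_path'\n";

    # An empty root selects all members; the destination prefix is
    # given with a trailing slash so member names land inside it.
    $zip->extractTree('', "$dest_dir/") == AZ_OK
        or die "Cannot extract '$zip_path' to '$dest_dir'\n";

    # Report how many members the archive contained.
    return $zip->numberOfMembers();
}
```

A caller might use it as `my $count = extract_upload($uploaded_file, $target_dir);`, where both variables come from your own upload handling. Since the archive is user-supplied, it is probably worth checking member names and sizes before extracting.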

Brian Agnew
I never understand this kind of thinking. A fork/exec is a very fast process; anyone who's spent any time reading or writing shell scripts is aware of this. The memory consumed by the unzip binary itself is trivial compared to the costs of the algorithm and (especially) the data in it. And /usr/bin/unzip ships by default on every Linux distribution, and I believe BSDs and Cygwin too. Unless you have sample code as trivial as: `open my $input, "unzip -cp $ARCHIVE $FILE |"`, I'd strongly prefer the simple option.
Andy Ross
Whilst I agree with the above *generally*, if you have a heavily loaded server, then your resource consumption will increase with the fork/exec model: the pid allocation, the inter-process stream allocation, and the memory allocation (allowing for copy-on-write). For stand-alone processes I'm happy with the fork/exec model. For server models I prefer to shy away from this towards the model with the least amount of resource allocation.
Brian Agnew
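For comparison, the piped `unzip` call from the comments can be sketched with the list form of `open`, which passes the arguments straight to the program rather than through a shell. That matters here because the archive name comes from a user upload. The helper name `read_member` is hypothetical, and the sketch assumes an `unzip` binary is on the PATH.

```perl
use strict;
use warnings;

# Hypothetical helper: return the contents of one archive member by
# reading the output of "unzip -p" (extract to pipe).
sub read_member {
    my ($archive, $member) = @_;

    # List-form pipe open: no shell is involved, so shell metacharacters
    # in $archive or $member are not interpreted.
    open(my $in, '-|', 'unzip', '-p', $archive, $member)
        or die "Cannot run unzip: $!\n";

    my $data = do { local $/; <$in> };   # slurp all output

    # close() returns false if unzip exited with a non-zero status.
    close($in)
        or die "unzip failed with exit status " . ($? >> 8) . "\n";

    return $data;
}
```

This still pays the fork/exec cost discussed above; it only removes the shell from the picture.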
+8  A: 

According to the Archive::Zip documentation you'd be better off using Archive::Extract:

If you are just going to be extracting zips (and/or other archives) you are recommended to look at using Archive::Extract instead, as it is much easier to use and factors out archive-specific functionality.

That's interesting because Archive::Extract will try Archive::Zip first and then fall back to the unzip binary if it fails. So it does seem that Archive::Zip is the preferred option.

Archive::Zip uses Compress::Raw::Zlib, which is a low-level interface to the zlib system library. It is therefore not a pure Perl implementation, and its performance should be similar to that of unzip. In other words, from a performance perspective there's no reason to pick unzip ahead of Archive::Zip.
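A minimal sketch of the Archive::Extract route might look like the following. The function name `extract_archive` and its parameters are hypothetical, and the sketch assumes Archive::Extract is installed (it is no longer bundled with recent Perl releases).

```perl
use strict;
use warnings;
use Archive::Extract;

# Hypothetical helper: extract $archive_path into $dest_dir and return
# an array reference listing the extracted files.
sub extract_archive {
    my ($archive_path, $dest_dir) = @_;

    # The archive type is inferred from the file name suffix.
    my $ae = Archive::Extract->new(archive => $archive_path)
        or die "Cannot determine archive type of '$archive_path'\n";

    # By default a Perl module is tried first and a command-line tool is
    # the fallback; $Archive::Extract::PREFER_BIN reverses that order.
    $ae->extract(to => $dest_dir)
        or die "Extraction failed: " . $ae->error . "\n";

    return $ae->files;
}
```

Because the type is taken from the suffix, an upload handler would likely need to save the file with a `.zip` extension before calling this.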

Dave Webb
If you use `Archive::Extract`, then it will also work for other compression formats.
Brad Gilbert
A: 

One concern is memory. We found out the hard way (a production web server crashed) that Archive::Tar had a memory leak. So while using a module instead of a system call to an external command is generally a good idea (see the other responses for the reasoning), you need to make sure the module has no gotchas.

DVK