
I am definitely new to Perl, so please forgive me if this seems like a stupid question.

I am trying to unzip a bunch of .cab files with jzip in Perl (ActivePerl, jzip, Windows XP):

#!/usr/bin/perl

use strict;
use warnings;

use File::Find;
use IO::File;

use v5.10;

my $prefix = 'myfileprefix';
my $dir = '.';

File::Find::find(
    sub {
        my $file = $_;
        return if -d $file;
        return if $file !~ /^$prefix(.*)\.cab$/;
        my $cmd = 'jzip -eo ' . $file;
        system($cmd);
    }, $dir
);

The code decompresses the first .cab file in the folder and then hangs (without any errors) until I press Ctrl+C to stop it. Does anyone know what the problem is?

EDIT: I used Process Explorer to inspect the processes, and I found that the correct number of jzip processes fired up (one per cab file residing in the source folder). However, only one of them runs under cmd.exe => perl, and none of these processes shuts down after being fired. It seems I need to shut each process down and execute them one by one, which I have no clue how to do in Perl. Any pointers?

EDIT: I also tried replacing jzip with notepad; it turns out notepad opens one file at a time (in sequential order), and another instance is fired only after I manually close the previous one. Is this common behavior in ActivePerl?

EDIT: I finally solved it, and I am still not entirely sure why. What I did was remove the XML library from the script, which should be irrelevant. Sorry, I purposefully left out "use XML::DOM" in the beginning because I thought it was completely irrelevant to this problem.

OLD:

#!/usr/bin/perl
use strict;
use warnings;

use File::Find;
use IO::File;
use File::Copy;
use XML::DOM;

use DBI;
use v5.10;

NEW:

#!/usr/bin/perl
use strict;
use warnings;

use File::Find;
use IO::File;
use File::Copy;

use DBI;
use v5.10;

my $prefix = 'myfileprefix';
my $dir = '.';

# retrieve xml file within given folder
File::Find::find(
    sub {
        my $file = $_;
        return if -d $file;             
        return if $file !~ /^$prefix(.*)\.cab$/;
        say $file;
        #say $file or die $!;
        my $cmd = 'jzip -eo '.$file;
        say $cmd;
        system($cmd);       
    }, $dir
);

This, however, introduces another problem: when the extracted file already exists, the script hangs again. I highly suspect this is a problem with jzip, and an alternative way to solve it is simply to replace jzip with extract, as @ghostdog74 pointed out below.
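Until the jzip behavior is understood, one hedged workaround (assuming the extracted name is simply the .cab name without its extension, which may not hold for multi-file cabinets) is to skip archives whose output already exists:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical guard: skip a .cab whose extracted file already exists,
# assuming the extracted name is the .cab name minus the ".cab" suffix.
my $cab = 'myfileprefix_001.cab';    # placeholder file name
(my $extracted = $cab) =~ s/\.cab$//i;

if (-e $extracted) {
    print "skipping $cab: $extracted already exists\n";
} else {
    system('jzip', '-eo', $cab);     # only runs when no clash exists
}
```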

+1  A: 

First off, if you are running commands via system() calls, you should always redirect their output/error to a log, or at least process it within your program.

In this particular case, if you do that, you'd have a log of what every single command is doing and will see if/when any of them are stuck.
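A minimal sketch of that advice (the command and log name here are placeholders, not part of the original scripts):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Redirect both stdout and stderr of the external command into a log
# file, then inspect system()'s return value for failures.
my $cmd = 'jzip -eo myfileprefix_001.cab';   # placeholder command
my $log = 'jzip.log';

my $status = system("$cmd >> $log 2>&1");
if ($status != 0) {
    warn "'$cmd' exited with status " . ($status >> 8) . " - check $log\n";
}
```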

Second, just a general tip: it's a good idea to always use native Perl libraries. In this case that may be impossible, of course (I'm not that experienced with Perl on Windows, so I have no clue whether there's a jzip module, but search CPAN).

UPDATE: I didn't find a native Perl CAB extractor, but I found a jzip replacement that might work better; worth a try: http://www.cabextract.org.uk/ - there's a DOS version which will hopefully work on Windows.

DVK
Thanks for the tips, and I definitely agree that I should redirect the output/error to a log. I did that in this case and the output file contains nothing, which suggests the commands themselves report no errors. However, I still experience the same problem, and the log is not helpful. Regarding your second point: yes, I also would love to use a native Perl module to do the decompression, but I am unable to locate any library that unzips .cab files (these were generated by the MSSQL makecab function). Let me know if you know of any alternatives to jzip.
John
John - see the update.
DVK
@DVK: Thanks, I identified the real problem is I need to (programmatically) kill the running process before firing up another one. I give cabextx a shot, but it looks like its binary is no longer available, http://www.ibiblio.org/pub/micro/pc-stuff/freedos/files/util/user/cabextx.zip.
John
@John - this is strange. Are you sure the problem is with needing to kill (as opposed to parallel-forking multiple ones before waiting for each one to die)? Also, I'm unsure about Windows, but on Unix you need to harvest the child PIDs when forking - see the Perl Cookbook for exact code.
DVK
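The PID-harvesting pattern DVK describes looks roughly like this (a sketch with placeholder commands; note that on Windows ActivePerl, fork() is emulated with threads, per perlfork, so behavior can differ):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fork one child per command and reap ("harvest") every child PID with
# waitpid, so the parent knows each job finished and leaves no zombies.
my @commands = ('echo one', 'echo two');   # placeholder commands
my @pids;

for my $cmd (@commands) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {            # child: run the command, then exit
        system($cmd);
        exit 0;
    }
    push @pids, $pid;           # parent: remember the child PID
}

waitpid($_, 0) for @pids;       # block until each child exits
```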
@DVK - this is indeed very strange. I know Linux well and have no clue what is happening on Windows. I thought the identified problem was the real cause, as I saw many jzip instances (even notepad, if I switched jzip to notepad) still running even after I killed the perl process. Paul Nathan proposed some code to force the script to run one process at a time, and it still failed. I am seriously lost.
John
A: 

Based on your edit, this is what I suggest:

#!/usr/bin/perl

use strict;
use warnings;

use File::Find;
use IO::File;

use v5.10;

my $prefix = 'myfileprefix';
my $dir = '.';

my @commands;

File::Find::find(
    sub {
        my $file = $_;
        return if -d $file;             
        return if $file !~ /^$prefix(.*)\.cab$/;        
        my $cmd = "jzip -eo $File::Find::name";
        push @commands, $cmd;
    }, $dir
);


#asynchronously kick off jzips
for my $cmd (@commands)
{
    my $fresult = fork();
    if (!defined($fresult))
    {
        die("Fork failed: $!");
    }
    elsif ($fresult == 0)    # child
    {
        system($cmd);
        exit(0);    # child must exit, or it keeps running the loop
    }
    else
    {
        # parent: no-op, just keep moving
    }
}

edit: added asynch. edit2: fixed scope issue.

Paul Nathan
@Paul thanks, I ran your script and jzip ran twice (successfully unzipped two cab files) and then hung, again. I double-checked the array and all commands are formatted correctly. Inspecting the running processes, only one jzip is running. But when I stop the script, the perl process is killed and jzip is still alive. Looks like I have to kill it before proceeding; how do I do that?
John
I ran your script again and it shows a syntax error: "my" variable $fresult masks earlier declaration in same scope
John
@Paul - I marked this as my solution, as it gave me the inspiration for how to fix the problem. I have posted my fix, along with the real cause, in the edits.
John
A: 

What happens when you run the jzip command from the DOS window? Does it work correctly? What happens if you add an end-of-line character (\n) to the command in the script? Does that prevent the hang?

David Harris
@David: When running jzip from DOS, everything works correctly. Adding \n to the end actually fires up more jzip instances, but the script still hangs and only one (or sometimes more, just by re-running the script) cab files get decompressed.
John
@John: Are you by any chance using Cygwin?
David Harris
@David - I think he says he is using Windows XP.
Jay Zeng
@Jay: The first line of the OP's code is "#!/usr/bin/perl" which is the last thing I would expect on a windows system. I suspect the OP may be using a Unix emulation layer like Cygwin, MKS or SFU. That's why I asked the question.
David Harris
@David - Jay's right, I am using XP, and the reason I put #!/usr/bin/perl there is to ensure my script can also run on *nix. ActivePerl does not require the path declaration because perl is found via the PATH environment variable.
John
A: 

here's an alternative, using extract.exe, which you can download here or here

use strict;
use warnings;
use File::Find;
use v5.10;

my $prefix = 'myfileprefix';
my $dir = '.';
File::Find::find({ wanted => \&wanted }, $dir);
exit;

sub wanted {
    my $destination = q(c:\test\temp);
    if ( -f $_ && $_ =~ /^$prefix(.*)\.cab$/ ) {
        # $File::Find::name already includes the directory
        my $cmd = "extract /Y $File::Find::name /E /L $destination";
        print $cmd, "\n";
        system($cmd);
    }
}
ghostdog74
@ghostdog - Thanks for the pointer; replacing jzip with extract does work. But that is not the real solution to the problem (jzip brings up a GUI and I don't know how to shut it down.)
John
A: 

Although no one has mentioned it explicitly, system blocks until the process finishes. The real problem, as people have noted, is figuring out why the process doesn't exit. Forking or any other parallelization won't help because you'll be left with a lot of hung processes.

Until you can figure out the issue, start small. Make the smallest Perl script that demonstrates the problem:

#!perl
system( '/path/to/jzip',  '-eo', 'literal_file_name' ); # full path, list syntax!
print "I finished!\n";

Now the trick is to figure out why it hangs, and sometimes that means different solutions for different external programs. Sometimes you need to close STDIN before you run the external process, or it sits there waiting for it to close; sometimes you need some other trick.
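The close-STDIN trick can be sketched like this (the command is a placeholder; 'NUL' is the Windows null device, '/dev/null' elsewhere):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reopen STDIN from the null device so a child process that tries to
# read standard input sees immediate EOF instead of hanging.
my $null = $^O eq 'MSWin32' ? 'NUL' : '/dev/null';
open STDIN, '<', $null or die "can't reopen STDIN: $!";

system('jzip', '-eo', 'literal_file_name');   # placeholder command
```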

Instead of system, you might also try things such as IPC::System::Simple, which handles a lot of platform-specific details for you, or modules like IPC::Run or IPC::Open3.

Sometimes it just sucks, and this situation is one of those times.

brian d foy
@brian - thanks for your advice. I followed your idea and called system( '/path/to/jzip', '-eo', 'literal_file_name' ); 21 times; it hangs after decompressing 2 files. I will look into IPC::System::Simple in a minute.
John
Can you clarify that a bit: does it hang after two system calls, or does it hang during one system call after it extracts only two files from the archive? Not that I could help if I knew it either way. :)
brian d foy
@brian - I solved the problem and posted an update. It hung after one system call (sometimes 2 or 3, depending on luck - yes, I know it is weird); each system call extracts one input file. It turned out the problem was that I had included "use XML::DOM", and everything works magically after removing it. I am absolutely lost, but it works.
John