So I recently wanted to thread one of my Perl programs to increase its speed. Taking in a list of websites, I wanted to start a thread for each url and get the content of each website and then look for a company description on the page. Once one thread found a result, or all thread's didn't, I wanted to exit, write my result, and read in urls for my next company.
The problem that I see is that I use the Perl::Unsafe::Signals module inside of the function that I call when creating a thread. I need the unsafe signals to interrupt regular expressions that get "stuck". However this seems to cause all sorts of problems, mainly having the program crash and the error msg "Alarm Clock" shown.
Therefore, is there a way to use Perl::Unsafe::Signals and threads safely? Is there a way to timeout a regular expression in another way by sending a signal to the function ( like I send a 'KILL' signal below?) Thanks.
Note: I stripped down the code to all pertinent parts, let me know if you need more.
use threads ('exit' => 'threads_only');
use threads::shared;
my @descrip;
share(@descrip);
my $lock;
share($lock);
URL:foreach my $url(@unique_urls) {
#skip blank urls
if(!$url) { next URL; }#if
#find description
my $thread = threads->create(\&findCompanyDescription, $PREV_COMPANY, $PREV_BASE_URL, $url);
#while a description has not been found and there are still active threads, keep looking
#there may be a better way to do this, but this seems to work for me
while(!@descrip && threads->list() != 0) {;}
#kill all threads, write output, read in next batch of urls
my @threads = threads->list();
foreach(@threads) { print("detaching\n"); $_->kill('KILL')->detach(); }#foreach
#######SUBROUTINE CALLED BY THREAD CREATE
sub findCompanyDescription {
my($company_full, $base_url, $url) = @_;
my($descrip, $raw_meta, $raw) = '';
my @company;
$SIG{'KILL'} = sub { alarm(0); threads->exit(); };
eval {
local $SIG{ALRM} = sub { die("alarm\n") }; # NB: \n required
alarm(5);
use Perl::Unsafe::Signals;
UNSAFE_SIGNALS {
while($company) {
my @matches = ($content =~ m!.*<([\w\d]+).*?>\s*about\s+$company[\w\s\-_]*<.*?>(?:<.*?>|\s)*(.*?)</\1.*?>!sig);
MATCH:for(my $ndx=1; $ndx<@matches; $ndx+=2) {
($raw, $descrip) = &filterResult($matches[$ndx], $company_full);
if($descrip) {
$company = undef;
last(MATCH);
}#if
}#for
#reduce the company name and try again
$company = &reduceCompanyName($company);
}#while
alarm(0);
};#unsafe_signals
};#eval
if($@) {
if($@ eq "alarm\n" && $DEBUG) { print("\nWebpage Timeout [].\n"); }#if
}#if
if($descrip) { lock($lock); {
@descrip = ($PREV_ID, $company_full, $base_url, $url, 1, $raw, $descrip); }
}#if