I want to parse an 8 GB file to find some information. This is taking more than 4 hours to finish. I have tried Perl's Parallel::ForkManager module for this, but it doesn't make much difference. What is a better way to implement this?
The following is the part of the code used to parse this jumbo file. I have a list of domains that I have to look up in an 8 GB zone file to find out which company each one is hosted with.
open(my $fh, '<', $file) or do {
    print $LOG "Can't open '$file': $!";
    die "Can't open '$file': $!";
};

### Reading zone file : $file
DOMAIN: while (my $line = <$fh>) {
    # domain and the DNS host it is currently pointed at
    # (/\s+/ instead of /\s|\t/ -- \s already includes tab, and the
    # single-character alternation produced empty fields on runs of
    # whitespace)
    my ($domain, undef, $new_host) = split /\s+/, $line;

    # skip domains we have already seen
    next DOMAIN if $seen{$domain}++;

    $domain = lc "$domain.$domain_type";

    # already in?
    if ($moved_domains->{$domain}) {
        # same host as before: nothing to record, get the next domain
        next DOMAIN if $new_host eq $moved_domains->{$domain}{PointingHost};

        # moved out
        my @INSERTS = ($domain, $data_date, $new_host, $moved_domains->{$domain}{Host});
        log_this($data_date, $populate, @INSERTS);
        delete $moved_domains->{$domain};
    }
    # new to MovedDomain
    else {
        # is this one of the hosts we are interested in?
        my ($interested) = grep { $new_host =~ /\b$_\b/i } keys %HOST;

        # if not one of our interesting DNS hosts, next!
        next DOMAIN unless $interested;

        my @INSERTS = ($domain, $data_date, $new_host, $HOST{$interested});
        log_this($data_date, $populate, @INSERTS);
    }
}
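One likely hot spot in the loop above is the grep over keys %HOST: it compiles and runs a fresh regex per key for every one of the millions of lines. A sketch of an alternative is to join all the interesting hosts into a single alternation and compile it once before the loop. The contents of %HOST below are hypothetical stand-ins (the real hash lives elsewhere in the script; keys are assumed lowercase):

```perl
use strict;
use warnings;

# Hypothetical contents -- stand-ins for whatever %HOST holds in the
# real script (keys assumed lowercase).
my %HOST = (
    'dnspod.com'     => 'DNSPod',
    'cloudflare.com' => 'CloudFlare',
);

# Build one alternation over all interesting hosts and compile it once,
# outside the per-line loop.
my $alt     = join '|', map quotemeta, keys %HOST;
my $host_re = qr/\b($alt)\b/i;

# Returns the matching %HOST key, or undef if the host is not interesting.
sub interested_host {
    my ($new_host) = @_;
    return $new_host =~ $host_re ? lc $1 : undef;
}
```

Inside the loop, `my $interested = interested_host($new_host);` then replaces the grep, so each line costs one precompiled match instead of one match per key.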
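As for why Parallel::ForkManager made little difference: if every child reads the whole 8 GB file, forking adds no throughput. One common pattern (a sketch, assuming the storage actually serves parallel readers well) is to split the file into N byte ranges aligned on line boundaries, and have each forked child seek() to its own range and process only that slice:

```perl
use strict;
use warnings;

# Split a file into $n byte ranges, each starting on a line boundary,
# so independent workers can process them without re-reading the file.
# Returns a list of [start, end) offset pairs covering the whole file.
sub chunk_offsets {
    my ($path, $n) = @_;
    my $size = -s $path;
    open my $fh, '<', $path or die "Can't open '$path': $!";

    my @starts = (0);
    for my $i (1 .. $n - 1) {
        seek $fh, int($size * $i / $n), 0;
        <$fh>;                          # discard the partial line we landed in
        my $pos = tell $fh;
        push @starts, $pos if $pos < $size && $pos > $starts[-1];
    }
    close $fh;

    return map { [ $starts[$_], $_ < $#starts ? $starts[$_ + 1] : $size ] }
           0 .. $#starts;
}
```

Each Parallel::ForkManager child would then `seek $fh, $start, 0;` and read lines `while (tell($fh) < $end)`. A line that starts before `$end` is read in full by that child, so no line is split between workers. Note that if the per-line work is cheap and the disk is the bottleneck, parallel readers may not help at all; this only pays off when the loop is CPU-bound.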