Hi all,
I'm having a problem processing a largish file in Python. All I'm doing is:
import gzip

counter = 0
f = gzip.open(pathToLog, 'r')
for line in f:
    counter = counter + 1
    if counter % 1000000 == 0:
        print counter
f.close()
This takes around 10m25s just to open the file, read the lines and increment the counter.
In Perl, dealing with the same file and doing quite a bit more (some regular expression stuff), the whole process takes around 1m17s.
Perl code:
open(LOG, "/bin/zcat $logfile |") or die "Cannot read $logfile: $!\n";
while (<LOG>) {
    if (m/.*\[svc-\w+\].*login result: Successful\.$/) {
        $_ =~ s/some regex here/$1,$2,$3,$4/;
        push @an_array, $_;
    }
}
close LOG;
Can anyone advise what I can do to make the Python solution run at a similar speed to the Perl solution?
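For what it's worth, one idea I've been toying with is mirroring the Perl approach and piping the file through an external zcat process from Python instead of using the gzip module. A rough, untested sketch of what I mean (pathToLog is the same placeholder as above):

import subprocess

# Sketch: read the log via an external zcat process, like the Perl version does.
p = subprocess.Popen(['zcat', pathToLog], stdout=subprocess.PIPE)
counter = 0
for line in p.stdout:
    counter += 1
p.stdout.close()
p.wait()

I don't know whether that would actually close the gap, though.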
EDIT: I've tried uncompressing the file first and reading it with plain open instead of gzip.open, but that only brings the total time down to around 4m14.972s, which is still too slow.
I also removed the modulo and print statements and replaced them with pass, so all that's being done now is iterating through the file line by line.
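So the stripped-down version being timed now is roughly this (pathToUncompressedLog is just a placeholder for the uncompressed file's path):

f = open(pathToUncompressedLog, 'r')
for line in f:
    pass  # loop body removed; just iterating over the lines
f.close()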