



So I'm running Perl 5.10 on a Core 2 Duo MacBook Pro, compiled with threading support: usethreads=define, useithreads=define. I've got a simple script to read 4 gzipped files containing around 750000 lines each. I'm using Compress::Zlib to do the uncompressing and reading of the files. I've got 2 implementations, the only difference between them being that one includes use threads. Other than that, both scripts run the same subroutine to do the reading. Hence, in pseudocode, the non-threading program does this:


The threaded version goes like this:

my $thr0 = threads->new(\&read_gzipped, 'file1');
my $thr1 = threads->new(\&read_gzipped, 'file2');
my $thr2 = threads->new(\&read_gzipped, 'file3');
my $thr3 = threads->new(\&read_gzipped, 'file4');
$_->join for $thr0, $thr1, $thr2, $thr3;   # wait for all four readers to finish


Now the threaded version is actually running almost 2 times slower than the non-threaded script. This obviously was not the result I was hoping for. Can anyone explain what I'm doing wrong here?

+9  A: 

My guess is that the bottleneck for GZIP operations is disk access. If you have four threads competing for disk access on a platter hard disk, that slows things down considerably: the disk head has to move between different files in rapid succession. If you just process one file at a time, the head can stay near that file, and the disk cache will be more effective.

Thanks, I never even thought about the I/O contention; I was focusing more on the performance of the uncompressing algorithm.
+9  A: 

You're using threads to try and speed up something that's IO-bound, not CPU-bound. That just introduces more IO contention, which slows down the script.

Thanks, I never even thought about the I/O contention; I was focusing more on the performance of the uncompressing algorithm.
+3  A: 

ithreads work well if you're dealing with something which is mostly not CPU-bound. Decompression is CPU-bound.

You can easily alleviate the problem by using the Parallel::ForkManager module.

Generally, threads in Perl are not really good.
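
For illustration, the Parallel::ForkManager suggestion could look roughly like the sketch below. It is not the asker's actual code: read_gzipped here is a hypothetical stand-in that just counts lines, read_all_forked is a made-up helper name, and the limit of two workers is an assumption based on the two cores mentioned in the question.

```perl
use strict;
use warnings;
use Compress::Zlib;            # gzopen / gzreadline / gzclose
use Parallel::ForkManager;     # CPAN module, not part of core Perl

# Hypothetical stand-in for the question's read_gzipped subroutine:
# reads a gzipped file line by line and returns the number of lines.
sub read_gzipped {
    my ($file) = @_;
    my $gz = gzopen($file, "rb")
        or die "Cannot open $file: $gzerrno\n";
    my $count = 0;
    my $line;
    $count++ while $gz->gzreadline($line) > 0;
    $gz->gzclose();
    return $count;
}

# Run read_gzipped on each file in its own child process,
# with at most $max_procs children alive at any time.
sub read_all_forked {
    my ($max_procs, @files) = @_;
    my %lines;
    my $pm = Parallel::ForkManager->new($max_procs);

    # Runs in the parent each time a child exits; collects its result.
    $pm->run_on_finish(sub {
        my ($pid, $exit_code, $ident, $signal, $core, $data) = @_;
        $lines{$ident} = $data->{lines} if defined $data;
    });

    for my $file (@files) {
        $pm->start($file) and next;          # parent: go on to the next file
        my $n = read_gzipped($file);         # child: do the actual work
        $pm->finish(0, { lines => $n });     # child: exit, pass the count back
    }
    $pm->wait_all_children;
    return \%lines;
}

# Two workers, assumed here to match the two cores from the question.
my $result = read_all_forked(2, @ARGV);
print "$_: $result->{$_} lines\n" for sort keys %$result;
```

Saved under a name of your choice, it takes the gzipped files as command-line arguments. Because each child is a separate process, nothing is shared with the parent, so the line counts are handed back through finish and picked up in the run_on_finish callback.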


I'm not prepared to assume that you're I/O bound without seeing the output of top while this is running. Like depesz, I tend to assume that compression/decompression operations (which are math-heavy) are more likely to be CPU-bound.
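
Besides watching top, one rough way to check this from inside the script is to compare wall-clock time with the CPU time the process actually used. The sketch below assumes a made-up helper, wall_and_cpu, and uses a pure-computation loop as a placeholder workload; in practice you would pass in a call to the real reading subroutine on one of the gzipped files.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);      # sub-second wall-clock timestamps

# Run $code once and return (wall-clock seconds, CPU seconds).
# CPU seconds are the user + system time of this process, from times().
sub wall_and_cpu {
    my ($code) = @_;
    my $wall0 = time;
    my ($user0, $sys0) = times;
    $code->();
    my ($user1, $sys1) = times;
    my $wall = time - $wall0;
    my $cpu  = ($user1 - $user0) + ($sys1 - $sys0);
    return ($wall, $cpu);
}

# Placeholder workload (pure computation); replace it with a call
# to the real reading subroutine on one of the gzipped files.
my ($wall, $cpu) = wall_and_cpu(sub {
    my $x = 0;
    $x += sqrt($_) for 1 .. 2_000_000;
});

printf "wall %.2fs, cpu %.2fs, cpu/wall %.0f%%\n",
    $wall, $cpu, $wall > 0 ? 100 * $cpu / $wall : 0;
```

If the CPU time comes out close to the wall-clock time, the work is probably CPU-bound; if it is much smaller, the process spent most of its time waiting, which points toward I/O. The resolution of times is coarse, so this only makes sense for runs of at least a second or so.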

When you're dealing with a CPU-bound operation, using more threads/processes than you have processors will almost never[1] improve matters - if the CPU utilization is already at 100%, more threads/processes won't magically increase its capacity - and will most likely make things worse by adding in more context-switching overhead.

[1] I've heard it suggested that heavy compilations, such as building a new kernel, benefit from telling make to use twice as many processes as the machine has processors and my personal experience has been that this seems to be accurate. The explanation I've heard for it is that this allows each CPU to be kept busy compiling in one process while the other process is waiting for data to be fetched from main memory. If you view compiling as a CPU-bound process, this is an exception to the normal rule. If you view it as an I/O bound case (where the I/O is between the CPU and main memory rather than disk/network/user I/O), it is not.

Dave Sherohman