A presentation by Mikhael Goikhman from a 2003 Perl conference includes a pair of example scripts that find prime numbers: one is threaded, and the other is not. Running both scripts with the print lines commented out, I got an execution time of 0.011 seconds for the non-threaded one and 2.343 (!) seconds for the threaded version. What accounts for the stunning difference in times?
I have some experience with threads in Perl and have noticed before that thread creation times can be particularly brutal, but this doesn't seem to be the bottleneck in Goikhman's example.