I wrote several scripts in both Perl and shell and compared their real execution times. In every case, the Perl script was more than 10 times faster than the shell script.

So I wondered: is it possible to write a shell script that is faster than the equivalent Perl script? And why is Perl faster than the shell even though I use the system function in my Perl script?

+1  A: 

No, I think it is impossible:
Bash is a purely interpreted language, whereas Perl programs are compiled to bytecode before execution.

oraz
OK, so why do people use shell scripts? Because they don't know Perl? :)
JohnJohnGa
Because it's natural to use a command shell to execute commands :)
oraz
@JohnJohnGa: Because person-time is more valuable than CPU-time. If it takes even 5 minutes longer to write a Perl version that will save a few seconds per run, then bash is probably the better choice. Personally, I'm a huge Perl fan, but, when I'm just going to automate a series of commands with no need for flow control, I'll still do it with bash instead.
Dave Sherohman
Impossible is a strong word. I might say "improbable", but I bet someone can find at least one program that's speedy in bash. Perl can suffer from a compile-time disadvantage where it has to find, load, and compile code. I use shell quite a bit to sequence a series of command lines. I don't think Perl is going to make that task any faster, or give me results more quickly.
brian d foy
A: 

Yes. C code is going to be faster than Perl code for the same thing, so a script which uses a compiled executable to do a lot of the work is going to be faster than a Perl program doing the same thing.

Of course, the Perl program could be rewritten to use the executable, in which case it would probably be faster again.
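
For instance, here is a rough sketch of both approaches (the file name big.log and the ERROR pattern are made up for illustration; whether delegating to a compiled tool actually wins depends on the task):

$ # Pure Perl: read every line and count the matches itself.
$ perl -ne '$n++ if /ERROR/; END { print $n+0, "\n" }' big.log

$ # Perl shelling out to grep (compiled C) for the scan, then just printing the result.
$ perl -e 'chomp(my $n = `grep -c ERROR big.log`); print "$n\n"'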

gorilla
That might be true in some cases, but remember that Perl is C code. Depending on the job, you might not be able to beat the highly optimized C infrastructure of Perl.
brian d foy
+3  A: 

This might fall dangerously close to armchair optimization, but here are some ideas that might explain your results:

  • Fork/exec: almost anything useful that a shell script does is done via a shell-out, that is, starting a new process to run a command such as sed, awk or cat. More often than not, more than one process is executed, and data is moved via pipes.

  • Data structures: Perl's data structures are more sophisticated than Bash's or csh's. This typically forces the shell programmer to get creative about data storage, which can take the form of:

    • using non-optimal data structures (e.g. arrays instead of hashes);
    • storing data in textual form (for example, integers as strings) that has to be reinterpreted every time;
    • saving data in a file and re-parsing it again and again;
    • etc.
  • Non-optimized implementation: some shell constructs may be designed for user convenience rather than performance. For example, I have reason to believe that Bash's implementation of parameter expansion, in particular ${foo//search/replace}, is sub-optimal relative to the same operation in sed (see the sketch below). This is typically not a problem for day-to-day tasks.
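
As a rough illustration of the fork/exec and parameter-expansion points (the variable contents here are made up, and which form wins depends on the shell and on the size of the data):

$ foo="one two two two"

$ # Pure Bash parameter expansion: no extra process is forked.
$ echo "${foo//two/2}"
one 2 2 2

$ # The same substitution via sed: costs a fork/exec and a pipe,
$ # but sed's substitution engine tends to be faster on large input.
$ echo "$foo" | sed 's/two/2/g'
one 2 2 2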

Chen Levy
+3  A: 

There are a few ways to make your shell script (e.g. Bash) run faster.

  1. Try to use fewer external commands if Bash's internals can do the task for you, e.g. avoid excessive use of sed, grep, awk, etc. for string/text manipulation.
  2. If you are manipulating relatively big files, don't use Bash's while read loop; use awk. If you are manipulating really big files, you can use grep to search for the patterns you want and then pass the matches to awk to "edit". grep's search algorithm is very good and fast. If you only want the front or the end of a file, use head or tail.
  3. File manipulation tools such as sed, cut, grep, wc, etc. can often be replaced by a single awk script, or by Bash internals if the task is not complicated. Therefore, try to cut down on the use of these tools where their functions overlap. Unix pipes/chaining are excellent, but using too many of them, e.g. command|grep|grep|cut|sed, makes your code slow: every pipe is an overhead. For that example, a single awk does it all: command | awk '{do everything here}' (see the sketch after the benchmark below). The closest tool you can use to match Perl's speed for certain tasks, e.g. string manipulation or maths, is awk. Here's a fun benchmark for this solution. There are around 9 million numbers in the file.

Output

$ head -5 file
1
2
3
34
42
$ wc -l <file
8999987

$ time perl -nle '$sum += $_ } END { print $sum' file
290980117

real    0m13.532s
user    0m11.454s
sys     0m0.624s

$ time awk '{ sum += $1 } END { print sum }' file
290980117

real    0m9.271s
user    0m7.754s
sys     0m0.415s

$ time perl -nle '$sum += $_ } END { print $sum' file
290980117

real    0m13.158s
user    0m11.537s
sys     0m0.586s

$ time awk '{ sum += $1 } END { print sum }' file
290980117

real    0m9.028s
user    0m7.627s
sys     0m0.414s

For each try, awk is faster than Perl.
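
As a sketch of point 3 (the log file name app.log and its field layout are invented here), a multi-stage pipeline and its single-awk equivalent might look like this:

$ # Pipeline version: four processes and three pipes.
$ grep ERROR app.log | cut -d' ' -f1 | sort | uniq -c

$ # Single awk version: one process filters, extracts the field and counts
$ # (output order may differ, since awk's for-in loop is unordered).
$ awk '/ERROR/ { count[$1]++ } END { for (k in count) print count[k], k }' app.log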

Lastly, try to learn what awk can do beyond one-liners.
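
For example, the one-liner from the benchmark above can grow into a standalone awk script (sum.awk is just a hypothetical file name):

$ cat sum.awk
#!/usr/bin/awk -f
# Sum the first field of every line and report an average as well.
{ sum += $1; n++ }
END { if (n) printf "sum = %d, average = %.2f\n", sum, sum / n }

$ awk -f sum.awk file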

ghostdog74
Thanks! That's exactly what I was looking for!
JohnJohnGa
A: 

Certain shell commands can run faster than Perl in some situations. I once benchmarked a simple sed script against the equivalent in Perl, and sed won. But when the requirements became more complex, the Perl version started beating the sed version. So the answer is: it depends. But for other reasons (simplicity, maintainability, etc.) I'd lean toward doing things in Perl anyway, unless the requirements are very simple and I expect them to stay that way.
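
A sketch of that kind of comparison (the file name big.txt and the foo/bar substitution are invented; which tool wins depends on the versions, the data and the complexity of the edit):

$ time sed 's/foo/bar/g' big.txt > /dev/null
$ time perl -pe 's/foo/bar/g' big.txt > /dev/null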

runrig