
It is generally advised not to call external Linux tools from Perl code. For example, if someone wants to print the last line of a text file, they can write:

$last_line = `tail -1 $file` ;

or, alternatively, open the file and read it line by line:

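 open(my $info, '<', $file) or die "Cannot open $file: $!";
 my $last_line;
 while (<$info>) {
   $last_line = $_ if eof($info);   # eof is true right after the final line is read
 }
 close($info);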

What are the pitfalls of the first approach, and why should I avoid using shell tools in my code?

Thanks,

+9  A: 

One reason is that your Perl code might be running in an environment where there is no shell tool called 'tail'.

It's a personal call depending on the project:

  • Is it going to be always used in shell environments with tail?
  • Do you care about only using pure Perl code?
thedz
+10  A: 

One of the primary reasons (besides portability) for not executing shell commands is the overhead of spawning another process. That's why much of the same functionality is available in Perl modules on CPAN.

ennuikiller
Processes are super cheap. This is not the reason.
jrockway
Yeah, that is, until you run out of them!
ennuikiller
@jrockway I wouldn't call running external commands "super cheap": http://gist.github.com/155862. The Perl version is two orders of magnitude faster than qx/cat $filename/.
Chas. Owens
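If you want to check the cost of spawning a process on your own machine, a minimal sketch along the lines of that comparison might look like this. It is not the linked gist and does not reproduce its numbers; it assumes a Unix-like system with `cat` on the PATH, and the names `via_cat`, `via_perl` and `time_it` are hypothetical. It uses the core Time::HiRes module for wall-clock timing.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Time::HiRes qw(time);

# Hypothetical micro-benchmark: read a small file through an external
# `cat` process versus directly in Perl. Assumes a Unix-like system with
# `cat` on the PATH; absolute numbers will vary by machine.
my ($tmp, $filename) = tempfile(UNLINK => 1);
print $tmp "line $_\n" for 1 .. 100;
close($tmp);

# Similar to qx/cat $filename/, but the list-form pipe open skips the shell.
sub via_cat {
    open(my $pipe, '-|', 'cat', '--', $filename) or die "Cannot run cat: $!";
    local $/;    # slurp the whole stream at once
    my $data = <$pipe>;
    close($pipe) or die "cat failed with status " . ($? >> 8) . "\n";
    return $data;
}

# Same result without starting a new process.
sub via_perl {
    open(my $fh, '<', $filename) or die "Cannot open $filename: $!";
    local $/;
    my $data = <$fh>;
    close($fh);
    return $data;
}

# Wall-clock seconds spent on $n calls of $code.
sub time_it {
    my ($n, $code) = @_;
    my $start = time;
    $code->() for 1 .. $n;
    return time - $start;
}

my $n = 200;
printf "cat:  %.4f s for %d reads\n", time_it($n, \&via_cat),  $n;
printf "perl: %.4f s for %d reads\n", time_it($n, \&via_perl), $n;
```

Both subs return identical data, so any difference in the printed timings comes from how the file is read, mostly the fork/exec/wait round trip.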
+13  A: 

It's better to keep all the action in Perl because it's faster and because it's more secure. It's faster because you're not spawning a new process, and it's more secure because you don't have to worry about shell metacharacter trickery.

For example, in your first case, if $file contained "afilename ; rm -rf ~", you would be a very unhappy camper.
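If you do need the external tool, one way to avoid the shell entirely is the list form of a piped open: Perl executes `tail` directly and passes the filename as a single argument, so `;`, spaces and similar characters are never interpreted. A minimal sketch, assuming a Unix-like system whose `tail` accepts `-n 1` and `--` (the `last_line_via_tail` name is hypothetical):

```perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical helper: run tail without involving a shell.
# The list form of open() with '-|' executes 'tail' directly, so shell
# metacharacters in $file are passed through as ordinary characters.
# '--' marks the end of options, so a name starting with '-' is safe too.
sub last_line_via_tail {
    my ($file) = @_;
    open(my $pipe, '-|', 'tail', '-n', '1', '--', $file)
        or die "Cannot run tail: $!";
    my $line = <$pipe>;
    # close() on a pipe waits for the child; it is false if tail exited non-zero.
    close($pipe)
        or die "tail failed for '$file' (exit status " . ($? >> 8) . ")\n";
    return $line;
}

# Usage: a filename containing "; echo INJECTED" is treated literally.
my $dir  = File::Temp::tempdir(CLEANUP => 1);
my $file = "$dir/afilename ; echo INJECTED";
open(my $out, '>', $file) or die "Cannot write $file: $!";
print $out "first\nlast\n";
close($out);
print last_line_via_tail($file);    # prints "last"
```

This removes the injection risk, but not the cost of starting a process or the dependency on `tail` being installed.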

P.S. The best all-Perl way to do the tail is to use File::ReadBackwards.
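File::ReadBackwards comes from CPAN, so it may not be installed everywhere. The underlying idea, reading from the end of the file instead of scanning every line, can be sketched with core Perl alone using `seek` and `read`. This is a hypothetical helper (`last_line_from_end`) rather than that module's API; it works on bytes and uses an arbitrary default chunk size of 4096.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

# Hypothetical helper: return the last line of $file (or undef for an empty
# file) by reading fixed-size chunks backwards from the end, so a large file
# is not read in full. Works on bytes; the chunk size is an arbitrary choice.
sub last_line_from_end {
    my ($file, $chunk) = @_;
    $chunk ||= 4096;
    open(my $fh, '<:raw', $file) or die "Cannot open $file: $!";
    my $pos = (-s $fh) || 0;    # start at the end of the file
    my $buf = '';
    while ($pos > 0) {
        my $len = $pos < $chunk ? $pos : $chunk;
        $pos -= $len;
        seek($fh, $pos, SEEK_SET) or die "Cannot seek in $file: $!";
        my $part;
        my $got = read($fh, $part, $len);
        die "Short read on $file" unless defined $got && $got == $len;
        $buf = $part . $buf;
        # Once a newline appears before the final character, the whole
        # last line is inside $buf.
        last if index(substr($buf, 0, -1), "\n") >= 0;
    }
    close($fh);
    return undef if $buf eq '';
    $buf =~ s/\A.*\n(?=.)//s;    # drop everything before the last line
    return $buf;
}

# Usage
my ($tmp, $name) = tempfile(UNLINK => 1);
print $tmp "line $_\n" for 1 .. 5000;
close($tmp);
print last_line_from_end($name);    # prints "line 5000"
```

The trailing newline is kept, matching what `tail -1` returns.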

Ry4an
+22  A: 
  • Efficiency - you don't have to spawn a new process
  • Portability - you don't have to worry about an executable not existing, accepting different switches, or having different output
  • Ease of use - you don't have to parse the output, the results are already in a usable form
  • Error handling - you have finer-grained control over errors and what to do about them in Perl.
Chas. Owens
I would add to portability that you also don't have to worry about another machine having an executable of the same name, but with completely different flags for options.
Telemachus
This is also true for ease of use. Many commands, like ps, have different output on different OSes.
Chas. Owens
I agree with @Telemachus, and this can happen even on the same machine after a tool upgrade. Think of coreutils, when head went from "head -100" to "head -n 100".
Andrew Barnett
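Doing the work in Perl sidesteps the `head -100` versus `head -n 100` issue entirely. As a minimal sketch, a hypothetical `first_n_lines` helper can stand in for `head -n N` and return the lines as a Perl list, so there are no flags to track across tool versions and no output to parse:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical pure-Perl stand-in for `head -n $n $file`: it does not depend
# on which head syntax (-100 or -n 100) the installed tool accepts, and the
# result is already a Perl list, so there is no output to parse.
sub first_n_lines {
    my ($file, $n) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    my @lines;
    while (@lines < $n) {
        my $line = <$fh>;
        last unless defined $line;    # file has fewer than $n lines
        push @lines, $line;
    }
    close($fh);
    return @lines;
}

# Usage
my ($tmp, $name) = tempfile(UNLINK => 1);
print $tmp "line $_\n" for 1 .. 250;
close($tmp);
my @top = first_n_lines($name, 100);
print scalar(@top), " lines, last one: $top[-1]";    # prints "100 lines, last one: line 100"
```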
+4  A: 

Using tail? Fine. But that's really a special case, since it's so easy to use and since it is so trivial.

The problem in general is not really efficiency or portability; those are largely irrelevant. The issue is ease of use. To run an external utility, you have to find out what arguments it accepts, write code to transform your program's data structures into that format, quote them properly, build the command line, and run the application. Then you might have to feed it data and read data from it (which involves complexity like an event loop, worrying about deadlocks, etc.), and finally interpret the return value. (UNIX processes treat an exit status of 0 as success and anything else as failure, but Perl treats 0 as false, so foo() and die is hard to read.) This is a lot of work, and that's why people avoid it. It's much easier to create an instance of a class and call methods on it to get the data you need.
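The inverted truth value is usually handled by comparing the result of `system` with 0 and then decoding `$?`, where the exit code sits in the high byte and a terminating signal in the low 7 bits. A minimal sketch with a hypothetical `run_or_die` helper; the usage runs the current perl (`$^X`) so it does not depend on any other tool being installed:

```perl
use strict;
use warnings;

# Hypothetical helper: run an external command and turn its wait status
# into a readable error. system() returns 0 on success, the opposite of
# Perl's usual truth convention, so compare with 0 explicitly instead of
# writing "system(...) and die".
sub run_or_die {
    my @cmd = @_;
    return 1 if system(@cmd) == 0;

    if ($? == -1) {
        die "Failed to execute $cmd[0]: $!\n";
    }
    if ($? & 127) {
        die sprintf("%s died with signal %d\n", $cmd[0], $? & 127);
    }
    die sprintf("%s exited with status %d\n", $cmd[0], $? >> 8);
}

# Usage: with more than one argument, system() runs the program directly
# instead of passing a command line to the shell.
run_or_die($^X, '-e', 'exit 0');
eval { run_or_die($^X, '-e', 'exit 3') };
my $err = $@;
print $err;    # prints "<path to perl> exited with status 3"
```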

(You can abstract away processes this way; see Crypt::GpgME for example. It handles the complexity associated with invoking gpg, which would normally involve creating multiple filehandles other than STDOUT, STDIN, and STDERR, among other things.)

jrockway
I can understand if you figure, "Hey it's 2009 - my machine can handle extra processes." So, as far as efficiency goes, granted. But portability really does strike me as important. I regularly use both Linux (Debian) and OS X. I have learned _never_ to assume that the tool I want is available on both machines (e.g., no `wget` on OS X out of the box) or that the damn thing takes the same flags (e.g., BSD versions of many core utilities on OS X). I don't do Windows, but obviously it only gets more complicated if you do. So I think it's a bit flip to dismiss portability.
Telemachus
+3  A: 

The main reason I see for doing it all in Perl would be for robustness. Your use of tail will fail if the filename has shell metacharacters or spaces or doesn't exist or isn't accessible. From Perl, characters in the filename aren't an issue, and you can distinguish between errors in accessing the file. Sometimes being robust is more important than speedy coding and sometimes it's not.
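As a minimal sketch of that robustness point, a hypothetical `last_line_checked` helper can open the file directly and use the core Errno module to tell a missing file apart from a permission problem, while spaces and shell metacharacters in the name need no special handling:

```perl
use strict;
use warnings;
use Errno qw(ENOENT EACCES);
use File::Temp qw(tempdir);

# Hypothetical helper: return ($last_line, undef) on success or
# (undef, $reason) on failure, so callers can react to each error type
# instead of seeing only a non-zero exit status from tail.
sub last_line_checked {
    my ($file) = @_;
    my $fh;
    unless (open($fh, '<', $file)) {
        return (undef, "no such file: $file")      if $! == ENOENT;
        return (undef, "permission denied: $file") if $! == EACCES;
        return (undef, "cannot open $file: $!");
    }
    my $last;
    while (my $line = <$fh>) {
        $last = $line;
    }
    close($fh);
    return ($last, undef);
}

# Usage: spaces and shell metacharacters in the name are not a problem.
my $dir  = tempdir(CLEANUP => 1);
my $name = "$dir/has spaces; and \$metachars";
open(my $out, '>', $name) or die "Cannot write $name: $!";
print $out "one\ntwo\n";
close($out);

my ($line, $err) = last_line_checked($name);
print $line;                  # prints "two"
my (undef, $missing) = last_line_checked("$dir/missing.txt");
print "$missing\n";           # reports the missing-file case
```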

ysth