In a script you must include a #! on the first line, followed by the path to the program that will execute the script (e.g. sh, perl). As far as I know, though, the # character denotes the start of a comment, and that line is supposed to be ignored by the program executing the script. It would seem, though, that this first line is at some point read by something in order for the script to be executed by the proper program. Could somebody please shed more light on the workings of the #!?

Edit: I'm really curious about this, so the more in-depth the answer the better.

+4  A: 

It is a long story; you can read about it on Wikipedia.

mathk
+5  A: 

Short story: The shebang (#!) line is read by the operating system's program loader (not by the shell, e.g. sh or bash, as I originally wrote). While it formally looks like a comment, the fact that #! makes up the very first two bytes of the file marks the whole file as a text file and as a script. The script will be passed to the executable mentioned on the first line after the shebang. Voilà!


Slightly longer story: Imagine you have your script, foo.sh, with the executable bit (x) set. This file contains e.g. the following:

#!/bin/sh

# some script commands follow...:
# *snip*

Now, on your shell, you type:

> ./foo.sh

Edit: Please also read the comments below after or before you read the following! As it turns out, I was mistaken. It's apparently not the shell that passes the script to the target interpreter, but the operating system (kernel) itself.

Remember that you type this inside the shell process (let's assume this is the program /bin/sh). Therefore, that input will have to be processed by that program. It interprets this line as a command, since it discovers that the very first thing entered on the line is the name of a file that actually exists and which has the executable bit(s) set.

/bin/sh then starts reading the file's contents and discovers the shebang (#!) right at the very beginning of the file. To the shell, this is a token ("magic number") by which it knows that the file contains a script.

Now, how does it know which programming language the script is written in? After all, you can execute Bash scripts, Perl scripts, Python scripts, ... All the shell knows so far is that it is looking at a script file (which is not a binary file, but a text file). Thus it reads the next input up to the first line break (which will result in /bin/sh, compare with the above). This is the interpreter to which the script will be passed for execution. (In this particular case, the target interpreter is the shell itself, so it doesn't have to invoke a new shell for the script; it simply processes the rest of the script file itself.)

If the script was destined for e.g. /bin/perl, all that the Perl interpreter would (optionally) have to do is check whether the shebang line really mentions the Perl interpreter. If it does, Perl reads the rest of the script file and executes it. If it does not, Perl knows that the script was not written for it (in practice, perl hands such a script over to the program named on that line rather than running it itself).
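Whichever component does the dispatching, the effect is easy to observe. The following is a minimal sketch, not a definitive test: it builds a script whose "interpreter" is cat, so that running the script simply prints the script file itself, shebang line included. The file name selfprint and the variable names are made up for the illustration, and the sketch assumes that the temporary directory allows execution (it is not mounted noexec).

```shell
# Sketch: a script whose shebang names cat instead of a shell.
# "selfprint" and the variable names are hypothetical.
demo_dir=$(mktemp -d)
cat_path=$(command -v cat)      # absolute path to cat on this system

# Write a two-line file: the shebang, then a line that is not a command.
printf '#!%s\nthis line is never run as a command\n' "$cat_path" > "$demo_dir/selfprint"
chmod +x "$demo_dir/selfprint"

# Executing the file ends up running: <cat_path> <demo_dir>/selfprint
selfprint_output=$("$demo_dir/selfprint")
printf '%s\n' "$selfprint_output"
```

The output is the content of the file, which shows that the program named after #! is started with the path of the script as its argument, and that it is then up to that program to treat the first line as a comment (or, like cat, as ordinary text).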

stakx
The first two bytes of an executable are the magic number that indicates how it should be executed; for interpreted scripts, the first two bytes conveniently correspond to the ASCII chars `#!`
friedo
It's not the shell that's looking at those two bytes, it's the system (program loader), yes? The same thing happens whether you're running the script from within a shell or not.
Jefromi
The shebang is not handled by the shell, it's handled by the OS itself.
R Samuel Klatchko
Thanks for the corrections, I actually didn't know that. I have edited my answer accordingly. I decided to not delete my answer because I feel it can still help to understand what has to go on until a script ends up with the right interpreter; whether the necessary steps are taken by the shell or by the kernel itself appears to be only secondary to understanding.
stakx
AFAIK the first OS to adopt this was 4BSD.
mathk
+12  A: 

Let's say we have this file xxx.pl :

#!/usr/bin/perl
use strict;
# this is a comment line in perl
print("hi!\n");
exit(0);

If you invoke $ perl xxx.pl , the interpreter (perl) will load the file and interpret it as a source file. To perl, the first line is just a comment to be ignored, the same as the third line.

What happens instead if xxx.pl is made executable and invoked directly: $ ./xxx.pl ? Here, it's the shell (e.g. /bin/bash) that loads the file and tries to execute it. For this, it reads the first few bytes (remember, it still doesn't know whether xxx.pl is a Perl source, a Python source, a shell script, a binary executable or something else), finds the magic #! bytes and then says to itself:

"Aha! This is the famous shebang! Then this file must be, not a binary executable, but some textual script or source that I must invoke via some program. Which program? Let's read the rest of the first line (until \n) and find out. Ah, /usr/bin/perl, eh? OK, I'll call that executable and pass this 'xxx.pl' file as an argument."

It then calls /usr/bin/perl xxx.pl. And there you have it.
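The "read the first line and find out" step can be imitated in user space. The function below is only a simplified sketch of that idea, not the loader's actual code: shebang_interpreter is a hypothetical helper name, and real loaders additionally pass along an optional argument from the line and enforce a length limit, neither of which is modelled here.

```shell
# Hypothetical helper: print the interpreter named on a file's shebang line.
# Returns non-zero if the file does not start with "#!".
shebang_interpreter() {
    first_line=
    # Read only the first line (tolerate a missing trailing newline).
    IFS= read -r first_line < "$1" || [ -n "$first_line" ] || return 1
    case $first_line in
        '#!'*) ;;               # magic bytes present
        *) return 1 ;;          # not a shebang script
    esac
    rest=${first_line#'#!'}                     # drop the two magic bytes
    rest=${rest#"${rest%%[![:space:]]*}"}       # drop leading blanks, if any
    printf '%s\n' "${rest%%[[:space:]]*}"       # path ends at first whitespace
}

# Usage with the first lines of xxx.pl from above:
sample=$(mktemp)
printf '#!/usr/bin/perl\nuse strict;\n' > "$sample"
shebang_interpreter "$sample"    # prints /usr/bin/perl
```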

Update: As Kevin correctly points out, the intelligence to process the shebang actually lives below the shell, inside the process loader of the kernel. Indeed, if you invoke a command from a C program through one of the exec() family of functions (no shell involved), the shebang will also be processed.
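A rough way to see this without writing C is to let a program that is not a shell do the exec. In the sketch below, env is that program: the surrounding shell only starts env, and env then calls exec on the script file directly. The script again names cat on its shebang line, so if the shebang were not honoured the output would not be the file's own content. The names viaexec, work_dir and the other variables are made up, and the sketch assumes the temporary directory permits execution.

```shell
# Sketch: launch a shebang script from a non-shell program (env).
# File and variable names are hypothetical.
work_dir=$(mktemp -d)
cat_bin=$(command -v cat)       # absolute path to cat on this system

printf '#!%s\nnot a shell command\n' "$cat_bin" > "$work_dir/viaexec"
chmod +x "$work_dir/viaexec"

# env execs the file itself; no shell parses the script in between.
via_env_output=$(env "$work_dir/viaexec")
printf '%s\n' "$via_env_output"
```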

leonbloy
It's not the shell that's looking at those two bytes, it's the OS kernel (program loader).
Kevin Panko
+1, nice explanation. Also note that this process is exactly analogous to Windows using the file's extension and/or file type to look up in the registry what executable to invoke to process the file.
Ether