While trying to do this:

 my $obj = new JavaScript::Minifier;
 $obj->minify(*STDIN, *STDOUT);
 # modified the above line to
 $obj->minify(*IP_HANDLE, *OP_HANDLE);

The above works if IP_HANDLE and OP_HANDLE are filehandles, but I still can't figure out what the * actually does when applied to a filehandle or any other data type.

Thanks,

+2  A: 

It's the glob sigil. *FOO refers to the glob named "FOO", just like $FOO refers to the scalar named "FOO", and so forth. Globs are usually code references or filehandles.

You need the sigil present in order to modify a glob value, e.g. *name_of_sub = sub {};, or to take its value without triggering other syntax, such as a sub call.
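As a minimal sketch of that kind of assignment (the names greet, $from_sub and $from_scalar are made up for illustration), assigning a code reference to a glob fills only its CODE slot and leaves the scalar of the same name alone:

```perl
use strict;
use warnings;

our $greet = "scalar slot";    # lives in the SCALAR slot of *greet

# Assigning a code reference touches only the CODE slot of the glob.
*greet = sub { return "hello from " . shift };

my $from_sub    = greet("the CODE slot");    # calls the newly installed sub
my $from_scalar = $greet;                    # the scalar is unchanged

print "$from_sub\n";
print "$from_scalar\n";
```

The parentheses on greet(...) matter here: the sub does not exist yet when the call is compiled, so the bare name alone would not parse as a call.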

Anonymous
code references? not particularly. *glob = some_reference replaces that part of the glob only (code, array, hash, io, etc.).
ysth
I like to call them **typeglobs** since we have another sort of `glob` http://perldoc.perl.org/functions/glob.html . Calling both things *glob* was confusing when I was a newb.
daotoad
+6  A: 

The * refers to a Perl "typeglob", which is an obscure implementation detail of Perl. Some older Perl code needs to refer to file handles using typeglobs (since there wasn't any other way to do it at the time). More modern code can use filehandle references instead, which are easier to work with.

The * is analogous to $ or %: it refers to a different kind of object known by the same name.

From the perldata documentation page:

Perl uses an internal type called a typeglob to hold an entire symbol table entry. The type prefix of a typeglob is a * , because it represents all types. This used to be the preferred way to pass arrays and hashes by reference into a function, but now that we have real references, this is seldom needed.
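To make the contrast between the two styles concrete, here is a small sketch. The helper say_hello and the in-memory buffer are assumptions for illustration only, not anything taken from JavaScript::Minifier:

```perl
use strict;
use warnings;

# Hypothetical helper: writes one line to whatever handle it is given.
sub say_hello {
    my ($fh) = @_;
    print {$fh} "hello\n";
}

# A lexical filehandle is an ordinary scalar holding a glob reference.
# It writes into an in-memory buffer so the sketch needs no real file.
my $buffer = '';
open my $out, '>', \$buffer or die "open: $!";

say_hello($out);        # modern style: pass the lexical handle
say_hello(\*STDOUT);    # older style: pass a reference to a typeglob
close $out or die "close: $!";

print "buffer holds: $buffer";
```

Both calls work because print accepts either kind of value in a scalar; the lexical handle has the added benefit of closing itself when it goes out of scope.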

Greg Hewgill
+12  A: 

Summary

  • Strings can name filehandles
  • Syntactic ambiguity
  • Typeglob assignment
  • Localizing typeglobs
  • *foo{THING} syntax
  • Tying it all together: DWIM!

Strings can name filehandles

Without the * sigil, a bareword is just a string.

Simple strings sometimes suffice, however. For example, the print operator allows

$ perl -le 'print { "STDOUT" } "Hiya!"'
Hiya!

$ perl -le '$h="STDOUT"; print $h "Hiya!"'
Hiya!

$ perl -le 'print "STDOUT" +123'
123

These fail with strict 'refs' enabled. The manual explains:

FILEHANDLE may be a scalar variable name, in which case the variable contains the name of or a reference to the filehandle, thus introducing one level of indirection.
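The effect of strict 'refs' on a string used as a handle can be seen in a short script. This is only a sketch; the variable names are invented for the example:

```perl
use strict;
use warnings;

my $name = "STDOUT";    # a plain string, not a handle

# Under strict 'refs', using the string as a handle dies at run time.
my $ok_strict = eval { print {$name} "strict: printed\n"; 1 };
my $error     = $@;

# With strict 'refs' relaxed, the string names the handle symbolically.
my $ok_relaxed = do {
    no strict 'refs';
    print {$name} "relaxed: printed\n";
};

print "strict attempt failed: $error" unless $ok_strict;
```

Only the second print reaches STDOUT; the first one is stopped by the symbolic-reference check before anything is written.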

Syntactic ambiguity

In your example, consider the syntactic ambiguity. Without the * sigil, you could mean strings

$ perl -MO=Deparse,-p prog.pl
use JavaScript::Minifier;
(my $obj = 'JavaScript::Minifier'->new);
$obj->minify('IP_HANDLE', 'OP_HANDLE');

or maybe a sub call

$ perl -MO=Deparse,-p prog.pl
use JavaScript::Minifier;
sub OP_HANDLE {
    1;
}
(my $obj = 'JavaScript::Minifier'->new);
$obj->minify('IP_HANDLE', OP_HANDLE());

or, of course, a filehandle. Note in the examples above how the bareword JavaScript::Minifier also compiles as a simple string.

Enable the strict pragma and it all goes out the window anyway:

$ perl -Mstrict prog.pl
Bareword "IP_HANDLE" not allowed while "strict subs" in use at prog.pl line 6.
Bareword "OP_HANDLE" not allowed while "strict subs" in use at prog.pl line 6.

Typeglob assignment

One trick with typeglobs that's handy for Stack Overflow posts is

*ARGV = *DATA;

(I could be more precise with *ARGV = *DATA{IO}, but that's a little fussy.)

This allows the diamond operator <> to read from the DATA filehandle, as in

#! /usr/bin/perl

*ARGV = *DATA;

while (<>) { print }

__DATA__
Hello
there

This way, the program and its input can be in a single file, and the code is a closer match to how it will look in production: just delete the typeglob assignment.
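For comparison, here is a sketch of how whole-glob assignment differs from assigning a single reference to a glob. All variable names are made up for the example:

```perl
use strict;
use warnings;

our ($src, @src) = ("scalar", 1, 2, 3);
our ($all, @all, $one, @one);

*all = *src;     # glob to glob: every slot of *all now aliases *src
*one = \@src;    # reference: only the ARRAY slot of *one is aliased

$src = "changed";    # visible through $all, but not through $one
push @src, 4;        # visible through both @all and @one

print "\$all = $all\n";
print "\@all = @all\n";
print "\@one = @one\n";
print "\$one is ", (defined $one ? "defined" : "still undef"), "\n";
```

This is why the more precise *ARGV = *DATA{IO} replaces only the filehandle part, while *ARGV = *DATA also aliases $ARGV, @ARGV and %ARGV.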

Localizing typeglobs

You can use typeglobs to localize filehandles:

$ cat prog.pl
#! /usr/bin/perl

sub foo {
  local(*STDOUT);
  open STDOUT, ">", "/dev/null" or die "$0: open: $!";
  print "You can't see me!\n";
}

print "Hello\n";
foo;
print "Good bye.\n";

$ ./prog.pl
Hello
Good bye.
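The same localization can also be used to capture output rather than discard it. The following is a sketch under the assumption that the output comes from Perl-level print calls; the sub noisy is hypothetical:

```perl
use strict;
use warnings;

sub noisy { print "captured line\n" }    # hypothetical sub that prints to STDOUT

my $captured = '';
{
    local *STDOUT;                                     # save the whole glob
    open STDOUT, '>', \$captured or die "open: $!";    # temporary in-memory STDOUT
    noisy();
    close STDOUT or die "close: $!";
}    # the original STDOUT is restored here

print "back on the real STDOUT\n";
print "captured: $captured";
```

Output written directly to file descriptor 1, for example by a child process, is not redirected by this technique.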

*foo{THING} syntax

You can get at the different parts of a typeglob, as perlref explains:

A reference can be created by using a special syntax, lovingly known as the *foo{THING} syntax. *foo{THING} returns a reference to the THING slot in *foo (which is the symbol table entry which holds everything known as foo).

$scalarref = *foo{SCALAR};
$arrayref = *ARGV{ARRAY};
$hashref = *ENV{HASH};
$coderef = *handler{CODE};
$ioref = *STDIN{IO};
$globref = *foo{GLOB};
$formatref = *foo{FORMAT};

All of these are self-explanatory except for *foo{IO}. It returns the IO handle, used for file handles (open), sockets (socket and socketpair), and directory handles (opendir). For compatibility with previous versions of Perl, *foo{FILEHANDLE} is a synonym for *foo{IO}, though it is deprecated as of 5.8.0. If deprecation warnings are in effect, it will warn of its use.

*foo{THING} returns undef if that particular THING hasn't been used yet, except in the case of scalars. *foo{SCALAR} returns a reference to an anonymous scalar if $foo hasn't been used yet. This might change in a future release.

*foo{IO} is an alternative to the *HANDLE mechanism given in ["Typeglobs and Filehandles" in perldata] for passing filehandles into or out of subroutines, or storing into larger data structures. Its disadvantage is that it won't create a new filehandle for you. Its advantage is that you have less risk of clobbering more than you want to with a typeglob assignment. (It still conflates file and directory handles, though.) However, if you assign the incoming value to a scalar instead of a typeglob as we do in the examples below, there's no risk of that happening.

splutter(*STDOUT); # pass the whole glob
splutter(*STDOUT{IO}); # pass both file and dir handles

sub splutter {
  my $fh = shift;
  print $fh "her um well a hmmm\n";
}

$rec = get_rec(*STDIN); # pass the whole glob
$rec = get_rec(*STDIN{IO}); # pass both file and dir handles

sub get_rec {
  my $fh = shift;
  return scalar <$fh>;
}
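To see which slots of a glob are populated, one can collect the *foo{THING} values and inspect them. In this sketch the symbol thing and the hash %slot exist only for illustration:

```perl
use strict;
use warnings;

our @thing = (1, 2);    # fills the ARRAY slot of *thing
sub thing { "code" }    # fills the CODE slot of *thing

# Each lookup yields a reference, or undef for a slot never used.
my %slot = (
    SCALAR => *thing{SCALAR},
    ARRAY  => *thing{ARRAY},
    HASH   => *thing{HASH},
    CODE   => *thing{CODE},
    IO     => *thing{IO},
);

for my $name (sort keys %slot) {
    my $kind = defined $slot{$name} ? ref $slot{$name} : "(unused)";
    print "$name: $kind\n";
}

my $stdout_io = *STDOUT{IO};    # an IO object, since STDOUT is an open handle
print "STDOUT IO slot: ", ref $stdout_io, "\n";
```

The class reported for the IO slot depends on the Perl version, so the sketch prints it rather than assuming a particular name.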

Tying it all together: DWIM!

Context is key with Perl. In your example, although the syntax may be ambiguous, the intent is not: even if the parameters are strings, those strings clearly name filehandles.

So consider all the cases minify ought to handle:

  • bareword
  • bare typeglob
  • reference to typeglob
  • filehandle in a scalar

For example:

#! /usr/bin/perl

use warnings;
use strict;

*IP_HANDLE = *DATA;
open OP_HANDLE, ">&STDOUT" or die "$0: dup STDOUT: $!";
open my $fh, ">&STDOUT" or die "$0: dup STDOUT: $!";
my $offset = tell DATA;

use JavaScript::Minifier;
my $obj = JavaScript::Minifier->new;
$obj->minify(*IP_HANDLE, "OP_HANDLE");

seek DATA, $offset, 0 or die "$0: seek: $!";
$obj->minify(\*IP_HANDLE, $fh);

__DATA__
Ahoy there
matey!

The following stub of JavaScript::Minifier gets us there:

package JavaScript::Minifier;

use warnings;
use strict;

sub new { bless {} => shift }

sub minify {
  my($self,$in,$out) = @_;

  for ($in, $out) {
    no strict 'refs';
    next if ref($_) || ref(\$_) eq "GLOB";

    my $pkg = caller;
    $_ = *{ $pkg . "::" . $_ }{IO};
  }

  while (<$in>) { print $out $_ }
}

1;

Output:

$ ./prog.pl
Name "main::OP_HANDLE" used only once: possible typo at ./prog.pl line 7.
Ahoy there
matey!
Ahoy there
matey!
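As an alternative to the symbolic lookup under no strict 'refs', the core Symbol module can resolve a handle name while strict stays fully enabled. This is only a sketch: to_handle is a hypothetical helper, not part of JavaScript::Minifier, and the in-memory buffer stands in for a real output file.

```perl
use strict;
use warnings;
use Symbol qw(qualify_to_ref);

# Hypothetical helper: normalize any supported kind of handle argument
# into something that print and <> will accept.
sub to_handle {
    my ($h, $pkg) = @_;
    return $h if ref $h or ref(\$h) eq 'GLOB';    # reference or bare typeglob
    return qualify_to_ref($h, $pkg);              # string naming a handle in $pkg
}

my $buffer = '';
open OUT_HANDLE, '>', \$buffer or die "open: $!";    # bareword handle in main

# The four cases: string, bare typeglob, glob reference, IO reference.
for my $arg ("OUT_HANDLE", *OUT_HANDLE, \*OUT_HANDLE, *OUT_HANDLE{IO}) {
    my $fh = to_handle($arg, __PACKAGE__);
    print {$fh} "ok\n";
}
close OUT_HANDLE or die "close: $!";

print $buffer;
```

A real module would pass scalar caller as the package instead of __PACKAGE__, so that names are resolved in the calling code's namespace.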
Greg Bacon
Nice thorough answer!
David Precious
Excellent--I wish I could double upvote. It's also worth noting that current thinking on best practices is to use lexical handles (`open my $foo, '<', $filename;`), which avoids the need for this stuff. It's outside the strict scope of the question, but typeglob assignment is also how you can install subroutines into a package: `*foo = sub { print "Sub called" };`. You've produced a great resource on typeglobs. Thank you.
daotoad
@David and @daotoad Thank you for your kind words.
Greg Bacon