So I'm working on a server where I can't install any modules whatsoever. Yes this makes my job difficult.

To complete the rest of my script I need to do something that I had thought would be fairly straightforward but it seems to be anything but.

I'm trying to open an .html file as one big long string. This is what I've got:

open(FILE, 'index.html') or die "Can't read file 'filename' [$!]\n";  
$document = <FILE>; 
close (FILE);  
print $document;

which results in:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"

However, I want the result to look like:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

This way I can search the entire document more easily.

Anyone know how to do this? Thanks.

+4  A: 
Yuriy Yashkir
+18  A: 

Add:

 local $/;

before reading from the file handle.

Incidentally, if you can put your script on the server, you can have all the modules you want. See the Perl FAQ list.

Sinan Ünür
You should probably explain what effect localizing $/ has, as well as what its purpose is.
Danny
Sinan Ünür
+10  A: 

A slew of good answers on slurping a file in Perl.

Telemachus
AND, the one with the worst score (plus alternate in its comment) was my favorite. Laziness, I believe, is sometimes a virtue. (e.g. backtick + cat or joining an array read) I tend to do a bit of "perl -e" stuff here and there, and like brevity at the expense of C/++ speed.
Roboprog
+12  A: 

I would do it like this:

my $file = "index.html";
my $document = do {
    local $/ = undef;
    open my $fh, "<", $file
        or die "could not open $file: $!";
    <$fh>;
};

Note the use of the three-argument version of open; it is much safer than the old two- (or one-) argument versions. Also note the use of a lexical filehandle. Lexical filehandles are nicer than the old bareword variants, for many reasons. We are taking advantage of one of them here: they close when they go out of scope.
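
To illustrate the scope-based close, here is a minimal sketch. It assumes a scratch file from the core File::Temp module standing in for index.html; the variable names are made up for the example. The write handle is never closed explicitly, yet the data is on disk by the time the file is read back, because the handle is closed when $out goes out of scope.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);   # core module, used only to get a scratch file

# Hypothetical scratch file standing in for index.html.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh or die "could not close $path: $!";

{
    open my $out, '>', $path
        or die "could not open $path for writing: $!";
    print {$out} "<html>\n<head>\n";
    # No explicit close: $out goes out of scope at the closing brace,
    # which closes the handle and flushes its buffer.
}

# Same pattern as above: slurp the file through a lexical handle.
my $document = do {
    local $/;
    open my $in, '<', $path
        or die "could not open $path for reading: $!";
    <$in>;
};
```

An explicit close is still worth having when you want to check for write errors; the implicit close at scope exit discards them.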

Chas. Owens
This is probably the best non-cpan'd way to do it as it uses both the 3 argument open as well as keeping the INPUT_RECORD_SEPARATOR ($/) variable localized to the smallest required context.
Danny
+9  A: 

All the posts are slightly non-idiomatic. The idiom is:

open my $fh, '<', $filename or die "error opening $filename: $!";
my $data = do { local $/; <$fh> };

The main point: there is no need to assign undef to $/ explicitly, because local $/ already leaves it undefined.
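
For anyone unsure what local does here, a small sketch (the variable names are only for illustration): local saves the current value of $/, leaves the variable undefined for the rest of the enclosing block, and restores the saved value when the block exits.

```perl
use strict;
use warnings;

my $before = $/;                 # the default record separator, "\n"
my $inside_is_undef;

{
    local $/;                    # no "= undef" needed
    $inside_is_undef = !defined($/);
}

# After the block, the old value is back.
my $restored = defined($/) && $/ eq $before;
```

With $/ undefined, a read in scalar context returns everything up to end of file instead of stopping at the first newline.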

jrockway
`local $foo = undef` is just the Perl Best Practice (PBP) suggested method. If we are posting snippets of code I'd think doing our best to make it clear would be A Good Thing.
Danny
Showing people how to write non-idiomatic code is a good thing? If I saw "local $/ = undef" in code I was working on, my first action would be to publicly humiliate the author on irc. (And I am generally not picky about "style" issues.)
jrockway
Ok, I'll bite: what exactly is mock-worthy about "local $/ = undef"? If your only answer is "It's non-idiomatic," then (a) I'm not so sure and (b) so what? I'm not so sure, because it's awfully damn common as a way to do this. And so what because it's perfectly clear and reasonably brief. You may be more picky about style issues than you think.
Telemachus
The key is that the "local $/" is part of a well-known idiom. If you are writing some random code and write "local $Foo::Bar = undef;", that is fine. But in this very special case, you might as well speak the same language as everyone else, even if it's "less clear" (which I don't agree with; the behavior of "local" is well-defined in this respect).
jrockway
Sorry, disagree. It is much more common to be explicit when you want to change the actual behavior of a magic variable; it is a declaration of intent. Even the documentation uses 'local $/ = undef' (see http://perldoc.perl.org/perlsub.html#Temporary-Values-via-local())
Leonardo Herrera
+1  A: 

Either set $/ to undef (see jrockway's answer) or just concatenate all the file's lines:

$content = join('', <$fh>);

It's recommended to use lexical scalars (open my $fh, ...) for filehandles on any Perl version that supports them (5.6 and later).
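
A quick sketch comparing the two approaches. It assumes Perl 5.8 or later so that an in-memory filehandle can stand in for a handle opened on index.html; the sample text is made up.

```perl
use strict;
use warnings;

# Hypothetical sample text; the in-memory filehandle stands in for a
# handle opened on index.html.
my $html = "<html>\n<head>\n</head>\n";

open my $fh, '<', \$html or die "could not open in-memory file: $!";
my $joined = join '', <$fh>;            # list context: every line, glued together
close $fh;

open $fh, '<', \$html or die "could not open in-memory file: $!";
my $slurped = do { local $/; <$fh> };   # scalar context with $/ undefined
close $fh;
```

Both give the same string. The join version builds a temporary list of lines first, so it does a bit more work on large files.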

kixx
+6  A: 

With File::Slurp

use File::Slurp;
my $text = read_file( 'index.html' ) ;

Yes, even you can use CPAN.

David Dorward
+3  A: 

From perlfaq5: How can I read in an entire file all at once?:


You can use the File::Slurp module to do it in one step.

use File::Slurp;

$all_of_it = read_file($filename); # entire file in scalar
@all_lines = read_file($filename); # one line per element

The customary Perl approach for processing all the lines in a file is to do so one line at a time:

open(INPUT, $file) || die "can't open $file: $!";
while (<INPUT>) {
    chomp;
    # do something with $_
}
close(INPUT) || die "can't close $file: $!";

This is tremendously more efficient than reading the entire file into memory as an array of lines and then processing it one element at a time, which is often--if not almost always--the wrong approach. Whenever you see someone do this:

@lines = <INPUT>;

you should think long and hard about why you need everything loaded at once. It's just not a scalable solution. You might also find it more fun to use the standard Tie::File module, or the DB_File module's $DB_RECNO bindings, which allow you to tie an array to a file so that accessing an element of the array actually accesses the corresponding line in the file.

You can read the entire filehandle contents into a scalar.

{
    local(*INPUT, $/);
    open(INPUT, $file) || die "can't open $file: $!";
    $var = <INPUT>;
}

That temporarily undefs your record separator, and will automatically close the file at block exit. If the file is already open, just use this:

$var = do { local $/; <INPUT> };

For ordinary files you can also use the read function.

read( INPUT, $var, -s INPUT );

The third argument tests the byte size of the data on the INPUT filehandle and reads that many bytes into the buffer $var.
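
As a rough sketch of the read approach with a lexical filehandle instead of the bareword INPUT: this assumes a scratch file created with the core File::Temp module, and both handles are put in binmode so the byte count from -s matches what read returns on every platform. The names here are illustrative only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);   # core module, used only to get a scratch file

# Hypothetical sample content written to a scratch file.
my $text = "<html>\n<head>\n</head>\n";
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} $text;
close $out or die "could not close $path: $!";

open my $in, '<', $path or die "can't open $path: $!";
binmode $in;                     # count bytes, not translated characters
my $size = -s $in;               # size in bytes of the file behind the handle
my $var;
my $got = read $in, $var, $size; # returns the number of bytes actually read
defined $got or die "read failed on $path: $!";
close $in;
```

read can return fewer bytes than requested, so it is worth checking its return value rather than assuming the buffer is complete.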

brian d foy
+1  A: 

You're only getting the first line from the diamond operator <FILE> because you're evaluating it in scalar context:

$document = <FILE>; 

In list/array context, the diamond operator will return all the lines of the file.

@lines = <FILE>;
print @lines;
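
A small sketch of the difference, assuming Perl 5.8 or later so an in-memory filehandle can stand in for the real file (the sample text is made up):

```perl
use strict;
use warnings;

my $html = "line one\nline two\nline three\n";
open my $fh, '<', \$html or die "could not open in-memory file: $!";

my $first = <$fh>;   # scalar context: one record, up to the first newline
my @rest  = <$fh>;   # list context: every remaining line, one per element

close $fh;
```

Note that the list-context read picks up where the scalar read left off, so @rest holds only the second and third lines.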
Nathan
Just a note on nomenclature: the spaceship operator is `<=>` and the `<>` is the diamond operator.
toolic
Oh, thanks, I hadn't heard "diamond operator" before and thought they both shared the same name. I will correct it above.
Nathan