I need to serve a large file (500+ MB) for download from a location that is not accessible to the web server. I found the question Serving large files with PHP, which is identical to my situation, but I'm using Perl instead of PHP.

I tried simply printing the file line by line, but this does not cause the browser to prompt for download before grabbing the entire file:

use Tie::File;

open my $fh, '<', '/path/to/file.txt'
    or die "Could not open file: $!";
tie my @file, 'Tie::File', $fh
    or die "Could not tie file: $!";
my $size_in_bytes = -s $fh;
print "Content-type: text/plain\n";
print "Content-Length: $size_in_bytes\n";
print "Content-Disposition: attachment; filename=file.txt\n\n";
for my $line (@file) {
    print $line;
}
untie @file;
close $fh;
exit;

Does Perl have an equivalent to PHP's readfile() function (as was suggested for PHP in that question), or is there another way to accomplish what I'm trying to do here?

+2  A: 

The readline function is called readline (and can also be written as <>).

I'm not sure what problem you're having. Perhaps it's that for loops aren't lazily evaluated (they're not), or perhaps Tie::File is screwing something up. Anyway, the idiomatic Perl for reading a file a line at a time is:

open my $fh, '<', $filename or die ...;
while(my $line = <$fh>){
   # process $line
}

No need to use Tie::File.

Finally, you should not be handling this sort of thing yourself. This is a job for a web framework. If you were using Catalyst (or HTTP::Engine), you would just say:

open my $fh, '<', $filename ...
$c->res->body( $fh );

and the framework would automatically serve the data in the file efficiently. (Using stdio via readline is not a good idea here; it's better to read the file in blocks from the disk. But who cares, it's abstracted!)
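
To make that concrete, here is a minimal sketch of what such a Catalyst action might look like (the action name, path, and filename are illustrative, not from the original post):

sub download : Local {
    my ($self, $c) = @_;
    my $path = '/path/to/file.txt';
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    $c->res->content_type('application/octet-stream');
    $c->res->header('Content-Disposition' => 'attachment; filename=file.txt');
    $c->res->content_length(-s $fh);
    $c->res->body($fh);  # Catalyst streams the filehandle for you
}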

jrockway
Sorry, I meant to say readfile(), as was suggested in the question I linked. Question updated.
cowgod
I tried using a while loop without Tie::File first, which behaves the same way. I still don't get a download prompt for the file until the whole thing is downloaded.
cowgod
Best to let your web framework deal with it, then. Try HTTP::Engine, it works with CGI.
jrockway
+1  A: 

Answering the (original) question ("Does Perl have an equivalent to PHP's readline() function ... ?"), the answer is "the angle bracket syntax":

open my $fh, '<', '/path/to/file.txt' or die "Could not open file: $!";
while (my $line = <$fh>) {
    print $line;
}

Getting the content-length with this method isn't necessarily easy, though, so I'd recommend staying with Tie::File.

NOTE Using:

    for my $line (<$filehandle>)

(as I originally wrote) copies the contents of the file to a list and iterates over that. Using

    while (my $line = <$filehandle>)

does not. When dealing with small files the difference isn't significant, but when dealing with large files it definitely can be.

Answering the (updated) question ("Does Perl have an equivalent to PHP's readfile() function ... ?"), the answer is slurping. There are a couple of syntaxes, but Perl6::Slurp seems to be the current module of choice.
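
For example, a minimal sketch (note that slurping reads the whole file into memory at once, so for a 500+ MB file a block-based read is the kinder option):

use Perl6::Slurp;

my $content = slurp '/path/to/file.txt';
print $content;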

The implied question ("why doesn't the browser prompt for download before grabbing the entire file?") has absolutely nothing to do with how you're reading in the file, and everything to do with what the browser thinks is good form. I would guess that the browser sees the MIME type and decides it knows how to display plain text.


Looking more closely at the Content-Disposition problem, I remember having similar trouble with IE ignoring Content-Disposition. Unfortunately I can't remember the workaround. IE has a long history of problems here (old page, refers to IE 5.0, 5.5 and 6.0). For clarification, however, I would like to know:

  1. What kind of link are you using to point to this big file (i.e., are you using a normal `<a href="perl_script.cgi?filename.txt">` link, or are you using JavaScript of some kind)?

  2. What system are you using to actually serve the file? For instance, does the web server make its own connection to the computer without a web server, copy the file over, and then send it to the end user, or does the user make the connection directly to the computer without a web server?

  3. In the original question you wrote "this does not cause the browser to prompt for download before grabbing the entire file" and in a comment you wrote "I still don't get a download prompt for the file until the whole thing is downloaded." Does this mean that the file gets displayed in the browser (since it's just text), that after the browser has downloaded the entire file you get a "where do you want to save this file" prompt, or something else?

I have a feeling that there is a chance the HTTP headers are getting stripped out at some point, or that a Cache-Control header is getting added (which apparently can cause trouble).
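
One way to check is to request just the headers and inspect what actually reaches the client; a rough sketch (the URL is a placeholder):

use LWP::UserAgent;

my $ua  = LWP::UserAgent->new;
my $res = $ua->head('http://example.com/perl_script.cgi?filename.txt');
print $res->headers->as_string;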

Max Lybbert
+1 for answering the implied question correctly (at least the way I was going to)
David Zaslavsky
Sorry, I meant to say readfile(), as was suggested in the question I linked. Question updated.
cowgod
Compare `for (<>)` <- list context and `while (<>)` <- scalar context.
J.F. Sebastian
+6  A: 

If you just want to slurp input to output, this should do the trick.

use Carp ();

{ # lexical scope for the filehandle and $/
  open my $fh, '<', '/path/to/file.txt' or Carp::croak("File open failed");
  local $/ = undef;   # slurp mode: read the whole file in one go
  print scalar <$fh>;
  close $fh or Carp::carp("File close failed");
}

I guess in response to "Does Perl have a PHP readfile() equivalent?", my answer would be "it doesn't really need one".

I've used PHP's manual file I/O controls and they're a pain; Perl's are so easy to use by comparison that shelling out for a one-size-fits-all function seems like overkill.

Also, you might want to look at X-Sendfile support: you send a header to your web server telling it what file to serve: http://john.guen.in/past/2007/4/17/send_files_faster_with_xsendfile/ (assuming, of course, that the web server has sufficient permissions to access the file, even though the file is not normally accessible via a standard URI).
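
A rough sketch of the CGI side with X-Sendfile (assuming mod_xsendfile or an equivalent module is installed and allowed to read the path):

print "X-Sendfile: /path/to/file.txt\n";
print "Content-Type: application/octet-stream\n";
print "Content-Disposition: attachment; filename=file.txt\n\n";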

Edit: Noted, it is better to do it in a loop. I tested the slurp code above against a hard drive, and it does implicitly try to store the whole thing in an invisible temporary variable and eat all your RAM.

Alternative using blocks

The following improved code reads the given file in blocks of 8192 bytes, which is much more memory efficient and gets throughput respectably comparable with my raw disk read rate. (I also pointed it at /dev/full for fits and giggles and got a healthy 500 MB/s throughput, and it didn't eat all my RAM, so that must be good.)

{
    open my $fh, '<', '/dev/sda' or die "open: $!";
    local $/ = \8192;  # read fixed-size records of 8192 bytes
    print $_ while defined( $_ = scalar <$fh> );
    close $fh;
}

Applying jrockway's suggestion:

{
    open my $fh, '<', '/dev/sda5' or die "open: $!";
    print $_ while sysread $fh, $_, 8192;
    close $fh;
}

This literally doubles performance... and in some cases gets me better throughput than dd does. O_o

Kent Fredric
FWIW, you should use "sysread" instead of changing $INPUT_RECORD_SEPARATOR.
jrockway
Out of curiosity, where did 8192 come from?
cowgod
No, 8192 *bytes*. It's just a handy unit: 4096 is usually the filesystem block size, and 8192 fits conveniently within most drives' disk caches these days, so it's a convenient transfer unit. You can use any number, but powers of 2 are best.
Kent Fredric
+2  A: 

You could use my Sys::Sendfile module. It should be highly efficient (as it uses sendfile under the hood), but it's not entirely portable (only Linux, FreeBSD, and Solaris are currently supported).
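
A minimal usage sketch ($socket here stands in for an already-connected client socket, and a real server would loop on partial sends):

use Sys::Sendfile;

# $socket is assumed to be an already-connected client socket.
open my $in, '<:raw', '/path/to/file.txt' or die "open: $!";
sendfile($socket, $in, -s $in) or die "sendfile failed: $!";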

Leon Timmermans
+1  A: 

When you say "this does not cause the browser to prompt for download" -- what's "the browser"?

Different browsers behave differently, and IE is particularly wilful: it will ignore headers and decide for itself what to do based on reading the first few KB of the file.

In other words, I think your problem may be at the client end, not the server end.

Try lying to "the browser" and telling it the file is of type application/octet-stream. Or why not just zip the file, especially since it's so huge?
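
Something like this, reusing the headers from the question (only the Content-Type changes):

print "Content-Type: application/octet-stream\n";
print "Content-Length: $size_in_bytes\n";
print "Content-Disposition: attachment; filename=file.txt\n\n";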

AmbroseChapel
I tried zipping the file and I get the same behavior. The same thing happens with IE, Firefox, and Chrome.
cowgod
+1  A: 

Don't use for/foreach (<$input>), because it reads the whole file at once and then iterates over it; use while (<$input>) instead. The sysread solution is good, but sendfile is the best performance-wise.
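
A minimal sketch of the difference:

# Slurps every line into a list before the loop starts; memory grows with file size:
for my $line (<$input>) { print $line; }

# Reads one line per iteration; memory use stays constant:
while (my $line = <$input>) { print $line; }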

Benoît
A: 

I've successfully done it by telling the browser it was of type application/octet-stream instead of type text/plain. Apparently most browsers prefer to display text/plain inline instead of giving the user a download dialog option.

It's technically lying to the browser, but it does the job.

Kurt W. Leucht
You're supposed to use the Content-Disposition header for this.
Christoffer Hammarström
A: 

The most efficient way to serve a large file for download depends on the web server you use.

In addition to @Kent Fredric's X-Sendfile suggestion:

File Downloads Done Right has some links that describe how to do it for Apache, lighttpd (mod_secdownload: security via URL generation), and nginx. There are examples in PHP, Ruby (Rails), and Python that can be adapted for Perl.

Basically it boils down to:

  1. Configure paths and permissions for your web server.
  2. Generate valid headers for the redirect in your Perl app (Content-Type, Content-Disposition, Content-Length?, X-Sendfile or X-Accel-Redirect, etc.); see the sketch after this list.
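
A rough sketch of step 2 for nginx (assuming the nginx config maps an internal location /protected/ to the otherwise hidden directory):

print "Content-Type: application/octet-stream\n";
print "Content-Disposition: attachment; filename=file.txt\n";
print "X-Accel-Redirect: /protected/file.txt\n\n";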

There are probably CPAN modules and web-framework plugins that do exactly that; e.g., @Leon Timmermans mentioned Sys::Sendfile in his answer.

J.F. Sebastian