views:

175

answers:

5

Consider the following perl script (read.pl):

my $line = <STDIN>;
print "Perl read: $line";
print "And here's what cat gets: ", `cat -`;

If this script is executed from the command line, it will get the first line of input, while cat gets everything else until the end of input (^D is pressed).

However, things are different when the input is piped from another process or read from a file:

$ echo "foo\nbar" | ./read.pl
Perl read: foo
And here's what cat gets:

Perl seems to greadily buffer the entire input somewhere, and processes called using backticks or system do no see any of the input.

The problem is that I'd like to unit test a script that mixes <STDIN> and calls to other processes. What would be the best way to do this? Can I turn off input buffering in perl? Or can I spool the data in a way that will "mimic" a terminal?

A: 

Here's a sub-optimal way that I've found:

use IPC::Run;

my $input = "First Line\n";
my $output;
my $process = IPC::Run::start(['./read.pl'], \$input, \$output);
$process->pump() until $output =~ /Perl read:/;
$input .= "Second Line\n";
$process->finish();
print $output;

It's sub-optimal in the sense that one needs to know the "prompt" that the program will emit before waiting for more input.

Another sub-optimal solution is the following:

use IPC::Run;

my $input = "First Line\n";
my $output;
my $process = IPC::Run::start(['./read.pl'], \$input, my $timer = IPC::Run::timer(1));
$process->pump() until $timer->is_expired();
$timer->start(1);
$input .= "Second Line\n";
$process->finish();

It does not require knowledge of any prompt, but is slow because it waits at least two seconds. Also, I don't understand why the second timer is needed (finish won't return otherwise).

Does anybody know better solutions?

Jonas Wagner
+2  A: 

This is not a Perl problem. It is a UNIX/shell problem. When you run a command without pipes you are in line buffering mode, but when you redirect with pipes, you are in block buffering mode. You can see this by saying:

cat /usr/share/dict/words | ./read.pl | head

This C program has the same problem:

#include <stdio.h>

int main(int argc, char** argv) {
    char line[4096];
    FILE* cat;
    fgets(line, 4096, stdin);
    printf("C got: %s\ncat got:\n", line);
    cat = popen("cat", "r");
    while (fgets(line, 4096, cat)) {
        printf("%s", line);
    }
    pclose(cat);
    return 0;
}
Chas. Owens
This is really helpful, thanks.Is there a way that I could tell the shell (or IPC::Run, or popen, or anything else) which buffering mode it should use?
Jonas Wagner
@Jonas Wagner I played around with it for a while. I couldn't find a solution. The short answer is "don't do that". Have `perl` read the contents of `STDIN` and pass it along to the program.
Chas. Owens
Seems like there's Expect.pm for perl, which uses a pseudo-tty to communicate with a process (see my answer below).
Jonas Wagner
+1  A: 

I have good news and bad news.

The good news is a simple modification of read.pl allows you to give it fake input:

#! /usr/bin/perl

use warnings;
use strict;

binmode STDIN, "unix" or die "$0: binmode: $!";

my $line = <STDIN>;
print "Perl read: $line";
print "And here's what cat gets: ", `cat -`;

Sample run:

$ printf "A\nB\nC\nD\n" | ./read.pl 
Perl read: A
And here's what cat gets: B
C
D

The bad news is you get a single switchover: if you try to repeat the read-then-cat, the first cat will starve all subsequent reads. To see this, consider

#! /usr/bin/perl

use warnings;
use strict;

binmode STDIN, "unix" or die "$0: binmode: $!";

my $line = <STDIN>;
print "1: Perl read: $line";
print "1: And here's what cat gets: ", `cat -`;
$line = <STDIN>;
$line = "<undefined>\n" unless defined $line;
print "2: Perl read: $line";
print "2: And here's what cat gets: ", `cat -`;

and then a sample run that produces

$ printf "A\nB\nC\nD\n" | ./read.pl 
1: Perl read: A
1: And here's what cat gets: B
C
D
2: Perl read: <undefined>
2: And here's what cat gets: 
Greg Bacon
Thanks a lot, I didn't know about binmode.
Jonas Wagner
A: 

Finally I ended up with the following solution. Still far from optimal, but it works. Even in situations like the one described by gbacon.

use Carp qw( confess );
use IPC::Run;
use Scalar::Util;
use Time::HiRes;

# Invokes the given program with the given input and argv, and returns stdout/stderr.
#
# The first argument provided is the input for the program. It is an arrayref
# containing one or more of the following:
# 
# * A scalar is simply passed to the program as stdin
#
# * An arrayref in the form [ "prompt", "input" ] causes the function to wait
#   until the program prints "prompt", then spools "input" to its stdin
#
# * An arrayref in the form [ 0.3, "input" ] waits 0.3 seconds, then spools
#   "input" to the program's stdin
sub capture_with_input {
    my ($program, $inputs, @argv) = @_;
    my ($stdout, $stderr);
    my $stdin = '';

    my $process = IPC::Run::start( [$program, @argv], \$stdin, \$stdout, \$stderr );
    foreach my $input (@$inputs) {
        if (ref($input) eq '') {
            $stdin .= $input;
        }
        elsif (ref($input) eq 'ARRAY') {
            (scalar @$input == 2) or
                confess "Input to capture_with_input must be of the form ['prompt', 'input'] or [timeout, 'input']!";

            my ($prompt_or_timeout, $text) = @$input;
            if (Scalar::Util::looks_like_number($prompt_or_timeout)) {
                my $start_time = [ Time::HiRes::gettimeofday ];
                $process->pump_nb() while (Time::HiRes::tv_interval($start_time) < $prompt_or_timeout);
            }
            else {
                $prompt_or_timeout = quotemeta $prompt_or_timeout;
                $process->pump until $stdout =~ m/$prompt_or_timeout/gc;
            }

            $stdin .= $text;
        }
        else {
            confess "Unknown input type passed to capture_with_input!";
        }
    }
    $process->finish();

    return ($stdout, $stderr);
}

my $input = [
    "First Line\n",
    ["Perl read:", "Second Line\n"],
    [0.5, "Third Line\n"],
];
print "Executing process...\n";
my ($stdout, $stderr) = capture_with_input('./read.pl', $input);
print "done.\n";
print "STDOUT:\n", $stdout;
print "STDERR:\n", $stderr;

Usage example (with a slightly modified read.pl to test gbacon's case):

$ time ./spool_read4.pl
Executing process...
done.
STDOUT:
Perl read: First Line
And here's what head -n1 gets: Second Line
Perl read again: Third Line

STDERR:
./spool_read4.pl  0.54s user 0.02s system 102% cpu 0.547 total

Still, I'm open to better solutions...

Jonas Wagner
A: 

Today I think I've found what I needed: Perl has a module called Expect which is perfect for such situations:

#!/usr/bin/perl

use strict;
use warnings;

use Expect;

my $exp = Expect->spawn('./read.pl');
$exp->send("First Line\n");
$exp->send("Second Line\n");
$exp->send("Third Line\n");
$exp->soft_close();

Works like a charm ;)

Jonas Wagner