In Perl, I read in files from a directory, and I want to open them all simultaneously (but read them line by line) so that I can perform a function that uses all of their nth lines together (e.g. concatenation).

my $text = `ls | grep ".txt"`;
my @temps = split(/\n/,$text);
my @files;
for my $i (0..$#temps) {
  my $file;
  open($file,"<",$temps[$i]);
  push(@files,$file);
}
my $concat;
for my $i (0..$#files) {
  my @blah = <$files[$i]>;
  $concat.=$blah;
}
print $concat;

I just get a bunch of errors: "use of uninitialized value" and GLOB(...) errors. So how can I make this work?

+8  A: 

Here is your problem:

for my $i (0..$#files) {
  my @blah = <$files[$i]>;
  $concat .= $blah;
}

First, <$files[$i]> isn't a valid filehandle read. This is the source of your GLOB(...) errors. See mobrule's answer for why this is the case. So change it to this:

for my $file (@files) {
  my @blah = <$file>;
  $concat .= $blah;
}

Second problem: you're mixing @blah (an array named blah) and $blah (a scalar named blah). This is the source of your "uninitialized value" errors - $blah (the scalar) has never been assigned to, but you're using it. If you want the $n-th line from @blah, use this:

for my $file (@files) {
  my @blah = <$file>;
  $concat .= $blah[$n];
}

I don't want to keep beating a dead horse, but I do want to address a better way to do something:

my $text = `ls | grep ".txt"`;
my @temps = split(/\n/,$text);

This reads in a list of all files in the current directory whose names contain ".txt". This works, and is effective, but it can be rather slow: we have to call out to the shell, which has to fork off to run ls and grep, and that incurs a bit of overhead. Furthermore, ls and grep are simple and common programs, but not exactly portable. Surely there's a better way to do this:

my @temps;
opendir(DIRHANDLE, ".") or die "Can't open directory: $!";
while (my $file = readdir(DIRHANDLE)) {
  push @temps, $file if $file =~ /\.txt/;
}
closedir(DIRHANDLE);

Simple, short, pure Perl, no forking, no non-portable shells, and we don't have to read in one big string and then split it - we store only the entries we really need. Plus, it becomes trivial to modify the condition that files have to pass. Say we end up accidentally reading the file test.txt.gz, because our regex matches anywhere in the name: we can easily change that line to:

  push @temps, $file if $file =~ /\.txt$/;

We could do that one with grep too (I believe), but why settle for grep's limited regular expressions when Perl has one of the most powerful regex libraries anywhere built in?
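
Incidentally, the readdir loop above can also be collapsed with Perl's own grep builtin - a minimal sketch with the same behavior as the loop:

opendir(DIRHANDLE, ".") or die "Can't open directory: $!";
my @temps = grep { /\.txt$/ } readdir(DIRHANDLE);  # readdir in list context returns all entries
closedir(DIRHANDLE);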

Chris Lutz
+1  A: 

Use the readline function instead of the <> operator:

my @blah = readline( $files[$i] );

Otherwise Perl interprets <$files[$i]> as the file-glob operator rather than the read-from-filehandle operator, because the expression inside the angle brackets is not a simple scalar variable.
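
An equivalent workaround (a minimal sketch of the same fix) is to copy the element into a plain scalar first, which <> does accept:

my $fh = $files[$i];   # a simple scalar, so <$fh> means readline
my @blah = <$fh>;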

mobrule
I knew there was a reason `<$files[$i]>` was bad. But that's not the only problem in that loop.
Chris Lutz
+13  A: 

A lot of issues, starting with the call to "ls | grep" :)

Let's start with some code:

First, let's get list of files:

my @files = glob( '*.txt' );

But it would be better to test whether each given name refers to a file or a directory:

my @files = grep { -f } glob( '*.txt' );

Now, let's open these files to read them:

my @fhs = map { open my $fh, '<', $_; $fh } @files;

But we need a way to handle errors - in my opinion the best way is to add:

use autodie;

At the beginning of the script (and install autodie, if you don't have it yet). Alternatively, you can:

use Fatal qw( open );
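
With either of these in place, a failed open dies with a useful message on its own. If you'd rather not pull in a module at all, a minimal sketch of the same open step with an explicit check:

my @fhs = map { open my $fh, '<', $_ or die "Can't open $_: $!"; $fh } @files;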

Now that we have it, let's get the first line (as you showed in your example) from each of the inputs, and concatenate them:

my $concatenated = '';

for my $fh ( @fhs ) {
    my $line = <$fh>;
    $concatenated .= $line;
}

This is perfectly fine and readable, but it can still be shortened, while maintaining (in my opinion) readability, to:

my $concatenated = join '', map { scalar <$_> } @fhs;

The effect is the same - $concatenated contains the first lines from all the files.

So, whole program would look like this:

#!/usr/bin/perl
use strict;
use warnings;
use autodie;
# use Fatal qw( open ); # uncomment if you don't have autodie

my @files        = grep { -f } glob( '*.txt' );
my @fhs          = map { open my $fh, '<', $_; $fh } @files;
my $concatenated = join '', map { scalar <$_> } @fhs;
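
As written, this only builds $concatenated; to mirror the original script's output, you'd finish with something like:

print $concatenated;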

Now, it might be that you want to concatenate not just the first lines, but all of them, round-robin. In this situation, instead of the my $concatenated = ... line, you'd need something like this:

my $concatenated = '';

while (my $fh = shift @fhs) {
    my $line = <$fh>;
    if ( defined $line ) {
        push @fhs, $fh;
        $concatenated .= $line;
    } else {
        close $fh;
    }
}
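
And if, as in the question, you want to apply a function to each group of nth lines rather than build one flat string, a minimal sketch along the same lines (assuming the @fhs from above; the names here are just for illustration):

my @rows;    # $rows[$n] = the nth lines of all still-open files, joined
while (@fhs) {
    my @open;
    my $row = '';
    for my $fh (@fhs) {
        my $line = <$fh>;
        if ( defined $line ) {
            $row .= $line;
            push @open, $fh;   # this file still has lines; keep it
        } else {
            close $fh;         # exhausted; drop it from the rotation
        }
    }
    push @rows, $row if @open;
    @fhs = @open;
}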
depesz
+1 your code is better than mine. I would love to maintain this kind of code. Although for completeness it might be noted that `glob()` is considered a somewhat unsafe function, and might not work universally. I can't find a reference for that (you might search StackOverflow and see if you can find anything about it - I remember it from a comment, but don't know where to look at this point).
Chris Lutz
@Chris: Hmm .. never heard about it, but it's possible. In this case - opendir, readdir + grep, closedir should be sufficient.
depesz
I think that the complaints about `glob` refer to older versions of the function. (It used to use the C-shell?) Nevertheless, here's one Perl coder who doesn't like it and why: http://sial.org/blog/2008/01/many_small_errors.html
Telemachus
glob, with a fixed string argument, on Perl >= 5.6, is quite safe. The only convincing argument in that blog post is that spaces may cause trouble, but `"*.txt"` is known not to contain spaces. :)
hobbs
@dep: Have you tried "readdir" instead of glob?
Paul Nathan
@Paul - Have you tried reading the other comments? Like my comment from 3 hours ago: " ... In this case - opendir, readdir + grep, closedir should be sufficient" :)
depesz
@Paul: A few things worth saying for `glob`: (1) no dotfiles (unless you ask for them), (2) order is guaranteed, (3) no need to prepend the directory on the results in order to do filetests.
Telemachus
+1  A: 

You've got some good answers already. Another way to tackle the problem is to create a list-of-lists containing all of the lines from the files (@content). Then use the each_arrayref function from List::MoreUtils, which will create an iterator that yields line 1 from all files, then line 2, etc.

use strict;
use warnings;
use List::MoreUtils qw(each_arrayref);

my @content =
    map {
        open(my $fh, '<', $_) or die $!;
        [<$fh>]
    }
    grep {-f}
    glob '*.txt'
;
my $iterator = each_arrayref @content;
while (my @nth_lines = $iterator->()){
    # Do stuff with @nth_lines;
}
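
For example, to reproduce the concatenation from the question, the loop body could simply be (a sketch; the lines keep their newlines):

print join('', @nth_lines);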
FM
Wouldn't `map { [<$_>] }` work?
Brad Gilbert
@Brad - It might, but it's a bit cryptic.
Chris Lutz