views:

94

answers:

3

I've got a question in my test:

What is wrong with program that counts number of lines and words in file?

open F, $ARGV[0] || die $!;
my @lines = <F>;
my @words = map {split /\s/} @lines;
printf "%8d %8d\n", scalar(@lines), scalar(@words);
close(F); 

My conjectures are:

  1. If file does not exist, program won't tell us about that.
  2. If there are punctuation signs in file, program will count them, for example, in

    abc cba
    , , ,dce
    

    will be five word, but on the other hand wc outputs the same result, so it might be considered as correct behavior.

  3. If F is a large file, it might be better to iterate over lines and not to dump it into lines array.

Do you have any less trivial ideas?

+1  A: 

When you open F, $ARGV[0] || die $! that will effectively exit if the file doesn't exist.

There are some improvements to be made here:

{local $/; $lines = <F>;} # read all lines at once

my @words = split /\W+/, $lines;
amphetamachine
+3  A: 
  • Your conjecture #1 is incorrect: your program will die if the open fails. (see cjm's answer re order of operations.)
  • you're using a global filehandle, rather than a lexical variable.
  • you're not using the three-argument form of open.
  • you could just read from stdin, which gives more flexibility as to input - the user can provide a file, or pipe the input into stdin.
  • lastly, I wouldn't write my own code to parse words; I'd reach for CPAN, say something like Lingua::EN::Splitter.
use strict; use warnings;
use Lingua::EN::Splitter qw(words);
my ($wordcount, $lines);
while (<>)
{
    my $line = $_;
    $lines++;
    $wordcount += scalar(words $line);
}

printf "%8d %8d\n", $lines, $wordcount;
Ether
No, it will die if the filename is false. See my answer.
cjm
+5  A: 

On the first line, you have a precedence problem:

open F, $ARGV[0] || die $!;

is the same as

open F, ($ARGV[0] || die $!);

which means the die is executed if the filename is false, not if the open fails. You wanted to say

open(F, $ARGV[0]) || die $!;

or

open F, $ARGV[0] or die $!;

Also, you should be using the 3 argument form of open, in case $ARGV[0] contains characters that mean something to open.

open F, '<', $ARGV[0] or die $!;

On a different note, splitting on /\s/ means that you get a "word" between consecutive whitespace characters. You probably meant /\s+/, or as amphetamachine suggested, /\W+/, depending on how you want to define a "word".

That still leaves the problem of the empty "word" you get if the line begins with whitespace. You could split on ' ' to suppress that (it's a special case), or you could trim leading whitespace first, or insert a grep { length $_ } to weed out empty "words", or abandon split and use a different method for counting words.

Processing line by line instead of reading the whole file at once would also be a good improvement, but it's not as important as those first two items.

cjm
Thank you, I think, precedence problem is the answer for that question.
alexanderkuk
\s is also an error actually.
alexanderkuk