I need to write Perl code that counts the paragraphs in a text file. I tried this, but it doesn't work:

open(READFILE, "<$filename")
    or die "could not open file \"$filename\":$!";

$paragraphs = 0;

my($c);

while($c = getc(READFILE))
{
    if($C ne"\n")
    {
        $paragraphs++;
    }
}

close(READFILE);

print("Paragraphs: $paragraphs\n");
+1  A: 

Have a look at the Beginning Perl book at http://www.perl.org/books/beginning-perl/. In particular, the following chapter will help you: http://docs.google.com/viewer?url=http%3A%2F%2Fblob.perl.org%2Fbooks%2Fbeginning-perl%2F3145_Chap06.pdf

Dominik
+6  A: 

See perlfaq5: How can I read in a file by paragraphs?

local $/ = '';  # enable paragraph mode
open my $fh, '<', $file or die "can't open $file: $!";
1 while <$fh>;
my $count = $.;
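
For anyone who wants to try paragraph mode without a file on disk, here is a minimal sketch of the same idea. The helper name count_paragraphs and the in-memory sample text are assumptions for illustration, not part of perlfaq5; the sketch keeps its own counter instead of reading $. and wraps the change to $/ in a sub so it is undone on return.

```perl
use strict;
use warnings;

# Hypothetical helper (not from perlfaq5): count paragraphs on an
# already-opened filehandle using paragraph mode.
sub count_paragraphs {
    my ($fh) = @_;
    local $/ = '';          # paragraph mode; the old value returns when the sub exits
    my $count = 0;
    $count++ while <$fh>;   # each read returns one paragraph
    return $count;
}

# Sample text held in memory so the sketch runs without an external file.
my $text = "First paragraph,\nsecond line.\n\n\n\nSecond paragraph.\n\nThird.\n";
open my $fh, '<', \$text or die "can't open in-memory file: $!";
print "Paragraphs: ", count_paragraphs($fh), "\n";
close $fh;
```

In paragraph mode a run of several blank lines is treated as a single separator, so the three extra newlines after the first paragraph should not inflate the count.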
eugene y
Use local $/ = ''; (it won't make a difference in a small script, but in a bigger script you might upset another module's use of $/)
justintime
@justintime: yes, this is a good idea
eugene y
A: 

If you're determining paragraphs by a double-newline ("\n\n") then this will do it:

open READFILE, "<$filename"
    or die "cannot open file `$filename' for reading: $!";
my @paragraphs;
{local $/; @paragraphs = split "\n\n", <READFILE>} # slurp-split
my $num_paragraphs = scalar @paragraphs;
__END__

Otherwise, just change the "\n\n" in the code to use your own paragraph separator. It may even be a good idea to use the pattern \n{2,}, just in case someone went crazy on the enter key.
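
A rough sketch of the \n{2,} variant, working on a string that has already been read in. The helper name split_paragraphs and the sample text are made up for illustration, and the sketch assumes Unix line endings and that blank lines contain no stray spaces.

```perl
use strict;
use warnings;

# Hypothetical helper: split a string into paragraphs on runs of
# two or more newlines.
sub split_paragraphs {
    my ($text) = @_;
    # grep drops the empty first field that split produces when the
    # text begins with blank lines.
    return grep { /\S/ } split /\n{2,}/, $text;
}

my $text = "One.\n\nTwo,\nstill two.\n\n\n\nThree.\n";
my @paragraphs = split_paragraphs($text);
print "Paragraphs: ", scalar(@paragraphs), "\n";
```

Note that the separator is written as a regex (/\n{2,}/) rather than a quoted string, so the quantifier applies to the newline itself.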

If you are worried about memory consumption, then you may want to do something like this (sorry for the hard-to-read code):

my $num_paragraphs;
{local $/; $num_paragraphs = @{[ <READFILE> =~ /\n\n/g ]} + 1}

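Although, if you want to keep using your own code, you can change if($C ne"\n") to if($c eq "\n"). Note that this counts newline characters, so it only matches the paragraph count when each paragraph is a single line.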
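
If the character-by-character approach is a requirement, one possible sketch is to track how many newlines were seen in a row and start a new paragraph at the first character after a blank line. The helper name count_paragraphs_by_char and the sample text are assumptions, and paragraphs are assumed to be separated by at least one empty line.

```perl
use strict;
use warnings;

# Hypothetical helper: count blank-line-separated paragraphs by
# reading one character at a time.
sub count_paragraphs_by_char {
    my ($fh) = @_;
    my $paragraphs = 0;
    my $newlines   = 2;   # act as if a blank line precedes the file
    # defined() keeps a literal "0" character from ending the loop early
    while (defined(my $c = getc($fh))) {
        if ($c eq "\n") {
            $newlines++;
        }
        else {
            # first text character after a blank line starts a paragraph
            $paragraphs++ if $newlines >= 2;
            $newlines = 0;
        }
    }
    return $paragraphs;
}

# Sample text held in memory so the sketch runs without an external file.
my $text = "Intro line 1\nIntro line 2\n\nBody.\n\n\nEnd: 0\n";
open my $fh, '<', \$text or die "can't open in-memory file: $!";
print "Paragraphs: ", count_paragraphs_by_char($fh), "\n";
close $fh;
```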

amphetamachine
eugene y's answer is better for long texts - this one will chew memory
singingfish
I feel like I shouldn't be the one to point this out, but I doubt that memory consumption is very high on the list of priorities of the average Perl programmer. ;-)
amphetamachine