I'm trying to write a piece of code that reads a file line by line and stores each line, up to a certain amount of input data. I want to guard against a malicious end-user putting something like a gig of data on one line, in addition to guarding against sucking in an abnormally large file. Doing `$str = <FILE>` will still read in a whole line, and that could be very long and blow up my memory.

C's fgets lets me do this by letting me specify a number of bytes to read on each call, effectively splitting one long line into chunks of my max length. Is there a similar way to do this in Perl? I saw something about sv_gets but am not sure how to use it (though I only did a cursory Google search).

The goal of this exercise is to avoid having to do additional parsing / buffering after reading data. fgets stops after N bytes or when a newline is reached.

EDIT: I think I confused some people. I want to read X lines, each with max length Y. I don't want to read more than Z bytes total, and I would prefer not to read all Z bytes at once. I guess I could just do that and split the lines, but I'm wondering if there's some other way. If that's the best way, then using the read function and doing a manual parse is my easiest bet.

Thanks.

+1  A: 

Use the `read` function (documented in perlfunc).
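
Something like this rough sketch, perhaps (the chunk size, the total cap, and the process_chunk() helper are placeholders of mine, not part of the answer):

my $max_total = 1024 * 1024;    # overall cap on input, e.g. 1 MB
my $total     = 0;

while ($total < $max_total) {
    # read() returns the number of bytes read, 0 at EOF, undef on error
    my $bytes = read $fh, my $chunk, 4096;
    last unless $bytes;
    $total += $bytes;
    process_chunk($chunk);    # hypothetical helper
}

Note that read() pays no attention to newlines, so the chunks still have to be split into lines afterward.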

Konerak
The beauty of `fgets` is that it either reads N bytes or stops at a newline. I don't think `read` stops at a newline.
SB
+4  A: 
sub heres_what_id_do($$) {
    my ($fh, $len) = @_;
    my $buf = '';

    # Read up to $len characters, stopping early at EOF or at a newline.
    # The newline itself is not included in the returned string.
    for (my $i = 0; $i < $len; ++$i) {
        my $ch = getc $fh;
        last if !defined $ch || $ch eq "\n";
        $buf .= $ch;
    }

    return $buf;
}

Not very "Perlish" but who cares? :) The OS (and possibly Perl itself) will do all the necessary buffering underneath.
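
A possible way to call it (the file name and the 4096-character limit are placeholders of mine):

open my $fh, '<', 'input.txt' or die "open: $!";
until (eof $fh) {
    # returns at most 4096 characters; the newline itself is stripped
    my $line = heres_what_id_do($fh, 4096);
    print "got: $line\n";
}
close $fh;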

j_random_hacker
`== '\n'` should be `eq "\n"`. `getc` makes this much simpler than using `read` to get a single character. Benchmarking shows it's slower than mine by about 15%. Interestingly, the 3-arg `for` is significantly faster than `for my $i (0..$len-1)` but not than `my $i; my $end = $len-1; for $i (0..$end)` (which brings it up to parity with mine), indicating that Perl's `for (0..$foo)` iterator optimization is easily defeated.
Schwern
Thanks for the edit, Schwern. It's embarrassing, but I didn't know Perl actually had `getc()`! I'll edit to use that.
j_random_hacker
A: 

You can implement fgets() yourself trivially. Here's one with C's semantics:

sub fgets {
    my ($n, $c) = ($_[1], '');
    ($_[0]) = ('');    # $_[0] aliases the caller's buffer; clear it first
    for (; defined($c) && $c ne "\n" && $n > 0; $n--) {
        $_[0] .= ($c = getc($_[2]));    # appends before the next defined() check
    }
    defined($c) && $_[0];    # returns the buffer, or false if EOF was hit
}

Here's one with PHP's semantics:

sub fgets {
    my ($n, $c, $x) = ($_[1], '', '');
    for (; defined($c) && $c ne "\n" && $n > 0; $n--) {
        $x .= ($c = getc($_[0]));
    }
    ($x ne '') && $x;    # the line itself, or false if nothing was read
}
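
To make the difference in calling conventions concrete, here is a sketch of how each would be used (assuming only one of the two definitions exists in a given script; $fh and the 512-byte limit are placeholders):

# C-style version: pass the buffer in, as with fgets(buf, n, stream) in C
my $buf;
fgets($buf, 512, $fh);
print $buf;

# PHP-style version: the line comes back as the return value (false at EOF)
my $line = fgets($fh, 512);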

If you're trying to implement resource limits (i.e. trying to prevent an untrusted client from eating up all your memory) you really should not be doing it this way. Use ulimit to set up those resource limits before calling your script. A good sysadmin will set up resource limits anyway, but they like it when programmers make startup scripts that set reasonable limits.
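
ulimit itself is a shell builtin; if you would rather set the limits from inside the Perl script at startup, one option (my suggestion, not something this answer specifies) is the BSD::Resource module from CPAN:

use BSD::Resource qw(setrlimit RLIMIT_AS);

# Illustrative cap: limit this process's address space to 64 MB
# (soft and hard limits) before it reads any untrusted input.
my $cap = 64 * 1024 * 1024;
setrlimit(RLIMIT_AS, $cap, $cap)
    or die "couldn't set RLIMIT_AS: $!";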

If you're trying to limit input before you proxy this data to another site (say, limiting SMTP input lines because you know remote sites might not support more than 511 characters), then just check the length of the line after reading it from `<INPUT>` with `length()`.
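
For instance (a minimal sketch; the handle name and the 511-character limit come from the paragraph above, and the error handling is up to you):

while (my $line = <INPUT>) {
    if (length($line) > 511) {    # whatever limit the remote side enforces
        # oversized line: truncate it, reject it, or drop the connection
        next;
    }
    # ... proxy $line onward ...
}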

geocar
Can't... understand... code! It throws a warning at EOF because it concatenates before checking whether $c is defined. While it mirrors C's fgets very admirably, it's not very Perlish. For all its inscrutability, it's no faster than mine or j_random's.
Schwern
@Schwern: Then `no warnings` if the warning bothers you.
geocar
+4  A: 

Perl has no built-in fgets, but File::GetLineMaxLength implements it.

If you want to do it yourself, it's pretty straightforward with `getc`.

sub fgets {
    my($fh, $limit) = @_;

    my $str;
    for (1..$limit) {
        my $char = getc $fh;
        last unless defined $char;
        $str .= $char;
        last if $char eq "\n";
    }

    return $str;    # undef if EOF arrived before any characters were read
}

Concatenating each character to $str is efficient, as Perl reallocates opportunistically: if a string's buffer holds 16 bytes and you concatenate one more character, Perl reallocates it to 32 bytes (32 goes to 64, 64 to 128, and so on) and remembers the length. The next 15 concatenations then require no reallocations or calls to strlen().
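
To get everything the question's edit asks for (at most X lines, each at most Y bytes, no more than Z bytes in total), you could wrap this fgets in a loop along these lines (the limits are placeholders of mine):

# Placeholder limits: X lines, Y bytes per line, Z bytes overall
my ($max_lines, $max_len, $max_total) = (100, 4096, 1_000_000);

my @lines;
my $total = 0;
while (@lines < $max_lines && $total < $max_total) {
    my $line = fgets($fh, $max_len);
    last unless defined $line;    # EOF
    $total += length $line;
    push @lines, $line;    # an over-long line shows up as several chunks
}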

Schwern
I think this is clean, and I saw another one of your answers that discussed preallocating a string in Perl. Combining the two gets rid of the inefficiencies (if any) of constant reallocation since I only need to allocate the max length one time.
SB
Thanks. I don't think preallocation is going to buy you much. In fact, it'll probably be slower, since it's likely slower to preallocate a string in Perl than to let perl do it. You'll also waste a lot of memory, since every string will use the maximum amount. Benchmarking bears this out. If you really want this to be as fast as possible, write an XS wrapper around fgets(). It's fairly trivial (by XS standards).
Schwern
What I meant was to preallocate the string outside the calls to fgets and pass it by reference for your fgets to append to. Though I'm not sure what happens when I assign the string to another. I might as well just let it allocate itself.
SB
@SB I tried that; it's about 5% slower. My guess is that the dereferencing inside the loop slows things down more than you save by preallocating. Using an alias to $_[2] like geocar's doesn't help either (though it doesn't hurt). The rule of thumb for Perl optimization is that you can't beat perl with Perl. You can see the benchmark program here: http://gist.github.com/417919. I don't think you're going to make this much faster by micro-optimizing; there's just a certain amount of overhead to looping over each character of a file in Perl.
Schwern
+1, but I hate to see people worried about a 5% change in speed when writing in an interpreted language.
j_random_hacker
@j_random_hacker Well, it's not really the 5%; it's that the one with the worse interface isn't faster.
Schwern
+2  A: 

As an exercise, I've implemented a wrapper around C's fgets() function. It falls back to a Perl implementation for complicated filehandles (defined as "anything without a fileno") to cover tied handles and the like. File::fgets is on its way to CPAN now; you can pull a copy from the repository.
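
If I'm reading the module's interface right, basic usage looks something like this sketch (the file name and the limit are placeholders, not from the answer):

use File::fgets;

open my $fh, '<', 'input.txt' or die "open: $!";
while (defined( my $chunk = fgets($fh, 1024) )) {
    # each $chunk is at most 1024 bytes, ending early if a newline appears
    print $chunk;
}
close $fh;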

Some basic benchmarking shows it's over 10x faster than any of the implementations here. However, I can't say it's bug-free or that it doesn't leak memory (my XS skills are not that great), but it's better tested than anything else here.

Schwern