Hey guys,

The other answered questions were a bit complicated for me, as I'm extremely new to using Perl.

I'm curious how Perl reads in the files, how to tell it to advance to the next line in the text file, and how to make it read all lines in the .txt file until, for example, it reaches item "banana".

Any and all help would be appreciated, thanks!

+9  A: 

Perl has an excellent set of documentation. You can find information about Perl basics, including reading from files ("Files and I/O"), in the perlintro manual page.

Greg Hewgill
Thanks man! This'll definitely be a great and understandable resource! Thanks!
Befall
+5  A: 

Basically, there are two ways of reading files:

  1. Slurping a file means reading the file all at once. This uses a lot of memory and takes a while, but afterwards the whole file's contents are in memory and you can do what you want with them.
  2. Reading a file line by line (in a while loop) is better if you don't want to read the entire file (for example, stop when you reach "banana").

For both ways you need to create a FILEHANDLE using the "open" command, like so:

open(my $yourhandle, '<', 'path/to/file.txt') #you should always use a variable here containing the filename
    or die "Unable to open file, $!";

Then you can either slurp the file by putting it into an array:

my @entire_file=<$yourhandle>; #Slurp!

or read the file line by line using a while loop:

while(<$yourhandle>) { #Read the file line per line (or otherwise, it's configurable)
   chomp; #Remove the trailing newline, otherwise the comparison below never matches
   print "The line is now in the \$_ variable: $_\n";
   last if $_ eq 'banana'; #Leave the while-loop.
}

Afterwards, don't forget to close the file.

close($yourhandle)
    or warn "Unable to close the file handle: $!";
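
Put together, the steps above might look roughly like this. It's only a sketch: the temporary file and its contents are made up so the example can run by itself, and File::Temp is used purely to create that sample file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small sample file to read (hypothetical contents, for illustration only).
my ($out, $filename) = tempfile(UNLINK => 1);
print {$out} "apple\n", "cherry\n", "banana\n", "date\n";
close($out) or die "Unable to close $filename: $!";

# Open the file for reading with a lexical handle and the 3-argument form.
open(my $yourhandle, '<', $filename)
    or die "Unable to open $filename: $!";

my @seen;
while (my $line = <$yourhandle>) {
    chomp $line;                 # strip the trailing newline before comparing
    last if $line eq 'banana';   # stop reading at "banana"
    push @seen, $line;
}

close($yourhandle)
    or warn "Unable to close the file handle: $!";

print "Lines before banana: @seen\n";
```

The line "date" is never read, because the loop stops as soon as it sees "banana".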

That's just the basics.. there's a lot to do with files, especially in Exception handling (what to do when the file does not exist, is not readable, is being written to), so you'll have to read up or ask away :)

Konerak
Awesome! Just what I was looking for, thanks!
Befall
Don't use this for production code though - this is just to start learning. Be sure to check for exceptions everywhere!
Konerak
Konerak, could you start to use and teach lexical file handles?
daxim
This is a mostly good writeup but it has several serious problems. Please do not encourage the use of package global filehandles--using lexical handles is an important practice that reduces errors. Your other error is that `open` and `close` do not generate exceptions when they fail (unless you are using `autodie`); they just return false. Muddy use of important technical vocabulary is very damaging to newbies.
daotoad
On the plus side, you give a particularly clear explanation of 'slurp' and clearly describe the issues with the practice.
daotoad
Using lexical file handles now needs to be standard practice, especially when talking about the system to newbies. I've edited your question to reflect modern practices since it was accepted as the answer. If you feel really strongly about it go ahead and revert it, but please consider keeping it. Teaching people the 10+ year old way of doing things when there's no good reason is really a disservice to the Perl community in general.
Robert P
+1  A: 

First, you have to open the file:

open SOME_FILEHANDLE, "filename.txt";

You might want to check if the opening of the file was successful:

open SOME_FILEHANDLE, "filename.txt" or die "could not open filename.txt: $!";

After opening the file, you can read line per line from SOME_FILEHANDLE. You get the next line with the <SOME_FILEHANDLE> construct:

my $next_line = <SOME_FILEHANDLE>;

$next_line is undefined after reading the last line. So, you can put the entire thing into a while loop:

while (my $next_line = <SOME_FILEHANDLE>) {
  do_something($next_line);
}

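This works because Perl implicitly tests whether the assigned value is defined in this kind of while condition, so the loop ends only at the end of the file and not on a line that merely looks false, such as "0".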
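
A small sketch of that behaviour, assuming an in-memory string stands in for the file (the contents are made up, and a lexical handle is used so the snippet is self-contained): a last line consisting of just 0 is still processed, since the loop ends only when the read returns an undefined value.

```perl
use strict;
use warnings;

# An in-memory "file" whose last line is a bare 0 with no trailing newline.
my $text = "first\nsecond\n0";
open(my $fh, '<', \$text) or die "Cannot open in-memory handle: $!";

my @lines;
while (my $next_line = <$fh>) {   # treated as: while (defined(my $next_line = <$fh>))
    chomp $next_line;
    push @lines, $next_line;
}
close($fh);

print scalar(@lines), " lines read, last one is '$lines[-1]'\n";
```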

If you want to exit the loop when "banana" is encountered, you will probably use a regular expression to check for banana:

while (my $next_line = <SOME_FILEHANDLE>) {
  last if $next_line =~ /banana/;
  do_something($next_line);
}

The last operator exits the while loop and it is "triggered" when $next_line matches banana.

René Nyffenegger
More useful info, awesome, thank you!
Befall
René, could you start to use and teach lexical file handles?
daxim
I probably could teach lexical file handles, but I am not sure if I could start using them myself.
René Nyffenegger
Then take this downvote as discouragement of outdated practices.
daxim
Why can't you use lexical handles? Do you have a strong reason?
daotoad
Second that down vote. Teaching lexical file handles is just not optional any more; the glob filehandle syntax is more than 10 years out of date, fraught with peril, and needs to be squashed. I'll be happy to remove the downvote if it's changed.
Robert P
I DO agree that teaching lexical file handles should be encouraged, and I am guilty as charged as I haven't done so in my answer. BUT as to *starting using them myself* I cannot agree, as I have a lot of Perl scripts that have worked the way I need them to for a long time. They all use the *old* non-lexical file handles and I don't see why I should change MY scripts, especially as nobody else sees them and as they work fine.
René Nyffenegger
There's no reason to change an existing, working script or program. It *may* make sense to modify older scripts as they require other maintenance. I have a hard time seeing why you wouldn't want to use better, safer methods in new scripts. Your write-up is otherwise quite good, so +1 -1 = 0, so no down-vote from me, change it and I'll up-vote it happily.
daotoad
+1  A: 

Perl documentation is your best friend for learning this; see "Files and I/O". This will also give you some details about "File handling in Perl".

Space
+1  A: 

I usually tell people to start with Impatient Perl http://www.perl.org/books/impatient-perl/

+5  A: 

René and Konerak wrote a couple of pretty good responses that show how to open and read a file. Unfortunately they have some issues in terms of promoting best practices. So, I'll come late to the party and try to add a clear explanation of the best-practice approach and why it is better to use it.

What is a file handle?

A file handle is a name we use that represents the file itself. When you want to operate on a file (read it, write to it, move around, etc) use the file handle to indicate which file to operate on. A file handle is distinct from the file's name or path.

Variable scope and file handles

A variable's scope determines in what parts of a program the variable can be seen. In general, it is a good idea to keep the scope of every variable as small as possible so that different parts of a complex program don't break each other.

The easiest way to strictly control a variable's scope in Perl is to make it a lexical variable. Lexical variables are only visible inside the block in which they are declared. Use my to declare a lexical variable: my $foo;

# Can't see $foo here

{   my $foo = 7;
    print $foo;
}

# Can't see $foo here

Perl file handles can be global or lexical. When you use open with a bare word (a literal string without quotes or a sigil), you create a global handle. When you open on an undefined lexical scalar, you create a lexical handle.

open FOO, $file;      # Global file handle
open my $foo, $file;  # Lexical file handle

# Another way to get a lexical handle:
my $foo;
open $foo, $file;

The big problem with global file handles is that they are visible anywhere in the program. So if I create a file handle named FOO in a subroutine, I have to be very careful to ensure that I don't use the same name in another routine, or if I use the same name, I must be absolutely certain that under no circumstances can they conflict with each other. The simple alternative is to use a lexical handle that cannot have the same kind of name conflicts.

Another benefit of lexical handles is that it is easy to pass them around as subroutine arguments.
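
A quick sketch of passing a handle around. The count_lines helper and the sample text are made up for illustration, and the handle reads from an in-memory string only so the snippet runs without a file on disk:

```perl
use strict;
use warnings;

# count_lines is a hypothetical helper; it accepts any readable handle.
sub count_lines {
    my $fh = shift;

    my $count = 0;
    while ( my $line = <$fh> ) {
        $count++;
    }

    return $count;
}

my $text = "one\ntwo\nthree\n";
open my $fh, '<', \$text
    or die "Can't open in-memory handle - $!\n";

my $n = count_lines( $fh );   # the handle is passed like any other scalar
print "$n lines\n";
```

The subroutine never needs to know the handle's name, which is what makes this awkward to do with a global handle like FOO.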

The open function

The open function has all kinds of features. It can run subprocesses, read files, and even provide a handle for the contents of a scalar. You can feed it many different types of argument lists. It is very powerful and flexible, but these features come with some gotchas (executing subprocesses is not something you want to do by accident).

For the simple case of opening a file, it is best to always use the 3-argument form because it prevents unintended activation of all those special features:

open FILEHANDLE, MODE, FILEPATH

FILEHANDLE is the file handle to open.

MODE is how to open the file: '>' for overwrite, '>>' for write in append mode, '+>' for read and write (this truncates the file first; use '+<' to read and write without truncating), and '<' for read.

FILEPATH is the path to the file to open.

On success, open returns a true value. On failure, $! is set to indicate the error, and a false value is returned.

So, to make a lexical file handle with a 3-argument open that we can use to read a file:

open my $fh, '<', $file_path;

The logical return values make it easy to check for errors:

open my $fh, '<', $file_path
    or die "Error opening $file_path - $!\n";

I like to bring the error handling down to a new line and indent it, but that's personal style.
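
If dying is too drastic, the false return value can also be tested directly. A minimal sketch, assuming a made-up path that does not exist on the machine running it:

```perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist on this machine.
my $file_path = '/no/such/directory/missing.txt';

my $opened = open( my $fh, '<', $file_path );
my $error  = "$!";   # copy $! right away; later system calls may overwrite it

if ( $opened ) {
    print "Opened $file_path\n";
    close $fh;
}
else {
    print "Could not open $file_path - $error\n";
}
```

Copying $! immediately matters because it is only meaningful right after the failed call.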

Closing handles

When you use global handles it is critical to carefully, explicitly close each and every handle when you are done with it. Failure to do so can lead to odd bugs and maintainability problems.

close FOO;

Lexical handles automatically close when the variable is destroyed (when the reference count drops to 0, usually when the variable goes out of scope).

When using lexical handles it is common to rely on the implicit closure of handles rather than explicitly closing them.
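
A small sketch of implicit closure, assuming a throwaway directory from File::Temp and a made-up file name. The write handle is never closed explicitly, yet the data is on disk once its block ends:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir( CLEANUP => 1 );   # scratch directory, removed at exit
my $path = "$dir/fruit.txt";          # hypothetical file name

{
    open my $out, '>', $path
        or die "Can't open '$path' for writing - $!\n";
    print {$out} "banana\n";
}   # $out goes out of scope here, so Perl flushes and closes the handle

open my $in, '<', $path
    or die "Can't open '$path' for reading - $!\n";
my $line = <$in>;
close $in;

print "Read back: $line";
```

An explicit close is still worth it when you want to check its return value, for example after writing important data.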

Diamonds are a Perl's best friend.

The diamond operator, <>, allows us to iterate over a file handle. Like open it has super powers. We'll ignore most of them for now. (Search for info on the input record separator, output record separator and the NULL file handle to learn about them, though.)

The important thing is that in scalar context (e.g. assigning to a scalar) it acts like a readline function. In list context (e.g. assigning to an array) it acts like a read_all_lines function.

Imagine you want to read a data file with 3 header lines (date, time and location) and a bunch of data lines:

open my $fh, '<', $file_path 
    or die "Ugh - $!\n";

my $date = <$fh>;
my $time = <$fh>;
my $loc  = <$fh>;

my @data = <$fh>;

It's common to hear people talk about slurping a file. This means to read the whole file into a variable at once.

 # Slurp into array
 my @slurp = <$fh>;

 # Slurp into a scalar - uses tricks outside the scope of this answer
 my $slurp; 
 { local $/ = undef; $slurp = <$fh>; }
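
To see the scalar, list and slurp behaviours side by side, here is a runnable sketch. The sample data is made up, and the handles read from an in-memory string (one of open's extra features) so no file is needed:

```perl
use strict;
use warnings;

# Made-up sample data standing in for a file.
my $text = "2010-01-15\n12:30\nKitchen\napple\nbanana\n";

open my $fh, '<', \$text
    or die "Can't open in-memory handle - $!\n";

my $date = <$fh>;   # scalar context: reads exactly one line
my @rest = <$fh>;   # list context: reads every remaining line
close $fh;

open my $fh2, '<', \$text
    or die "Can't open in-memory handle - $!\n";

# With $/ undefined inside the block, a single read returns the whole content.
my $slurp = do { local $/; <$fh2> };
close $fh2;

print "First line: $date";
print "Remaining lines: ", scalar(@rest), "\n";
print "Slurped length: ", length($slurp), "\n";
```

Because $/ is localized, the normal line-by-line behaviour is restored as soon as the do block ends.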

Putting it all together

open my $fh, '<', 'my_file'
    or die "Error opening file - $!\n";

my @before_banana;

while( my $line = <$fh> ) {
    last if $line =~ /^banana$/;

    push @before_banana, $line;
}

Putting it all together - special extra credit edition

my $fh = get_handle( 'my_file' );

my @banana = read_until( $fh, qr/^banana$/ );  # Get the lines before banana

read_until( $fh, qr/^no banana$/ );            # Skip some lines

my @potato = read_until( $fh, qr/^potato$/ );  # Get the lines before potato

sub get_handle {
    my $file_path = shift;

    open my $fh, '<', $file_path 
        or die "Can't open '$file_path' for reading - $!\n";

    return $fh;
}

sub read_until {
    my $fh    = shift;
    my $match = shift;

    my @lines;

    while( my $line = <$fh> ) {
        last if $line =~ /$match/;
        push @lines, $line;
    }

    return @lines;
}
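
To try this without creating my_file, the same idea can be run against an in-memory string. This is just a sketch: the data is made up, and read_until is repeated (with the lines collected in @lines) so the block runs by itself.

```perl
use strict;
use warnings;

# Read lines from $fh until one matches $match; return the lines before it.
sub read_until {
    my $fh    = shift;
    my $match = shift;

    my @lines;

    while ( my $line = <$fh> ) {
        last if $line =~ /$match/;
        push @lines, $line;
    }

    return @lines;
}

# Made-up sample data standing in for the file.
my $text = "apple\npear\nbanana\nskipped\nno banana\ncarrot\npotato\nleek\n";

open my $fh, '<', \$text
    or die "Can't open in-memory handle - $!\n";

my @banana = read_until( $fh, qr/^banana$/ );   # lines before banana
read_until( $fh, qr/^no banana$/ );             # skip some lines
my @potato = read_until( $fh, qr/^potato$/ );   # lines before potato

chomp @banana;
chomp @potato;

print "Before banana: @banana\n";
print "Before potato: @potato\n";
```

Each call picks up where the previous one stopped, because the handle remembers its position in the data.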

Why so many different ways? Why so many gotchas?

Perl is an old language; it has baggage dating all the way back to 1987. Over the years various design issues were found and fixes were made--but only rarely were fixes allowed to harm backwards compatibility.

Further, Perl is designed to give you the flexibility to do what you want to, when you want to. It is very permissive. The good thing about this is that you can reach down into the murky depths and do really cool magical stuff. The bad thing is that it is easy to shoot yourself in the foot if you forget to temper your exuberance and fail to focus on producing readable code.

Just because you've got more than enough rope, doesn't mean that you have to hang yourself.

daotoad
WOW, this is awesome. Thanks for such an in-depth response, I really appreciate it. This helps me understand the process a ton.
Befall