ansaurus

Question

How can I read all of the lines between two lines in a file, using Perl?

Answer 1

+2 A:

You could use this in 'sed':

sed -n '/nch/,/kary/p' "$file"

You could use 's2p' to convert this to Perl.

You could also write pure Perl:

while (<>)
{
    next unless /nch/;
    print;
    while (<>)
    {
        print;
        last if /kary/;
    }
}

Strictly, both of these solutions print every set of lines from 'nch' to 'kary'; if 'nch' appears more than once, they print more than one chunk of lines. That is easy to fix, especially in the pure Perl version (fixing the 'sed' solution is left as an exercise for the reader).

OUTER:
while (<>)
{
    next unless /nch/;
    print;
    while (<>)
    {
        print;
        last OUTER if /kary/;
    }
}

Also, the solutions look for 'nch' and 'kary' as part of the line - not for the whole line. If you need them to match the whole line, use '/^nch$/' etc as the regex.
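A minimal sketch of the whole-line variant, stopping after the first block. The sub name `first_block` and the sample data are assumptions made for illustration, not part of the answer above; it uses a state variable rather than the nested loop.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first block of lines running from a line
# that is exactly $start to the next line that is exactly $end (inclusive).
sub first_block {
    my ($fh, $start, $end) = @_;
    my @block;
    my $inside = 0;
    while (my $line = <$fh>) {
        # \Q...\E quotes regex metacharacters; $ matches before the trailing
        # newline, so the anchored match works without chomp.
        $inside = 1 if !$inside && $line =~ /^\Q$start\E$/;
        next unless $inside;
        push @block, $line;
        last if $line =~ /^\Q$end\E$/;
    }
    return @block;
}

# Usage: read from an in-memory file handle for demonstration.
my $data = "grinch\nnch\nalpha\nbeta\nkary\nnch\ngamma\nkary\n";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
print first_block($fh, 'nch', 'kary');
```

The 'grinch' line is skipped because the pattern is anchored at both ends, and the second 'nch' block is never reached because the loop stops at the first 'kary'.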

Jonathan Leffler 2009-11-21 05:22:26

using 2 while loops is not necessary, is it?

ghostdog74 2009-11-21 06:28:13

It depends on whether you want to use variables other than '$_'. If you use a variable (as in Hairy Jock's solution), then no, the double loop is not necessary. If you do it with no extra variables, the second loop takes the place of the state variable.

Jonathan Leffler 2009-11-21 06:32:48

There's a direct oneliner perl equivalent to your awk example: `perl -ne 'print if /foo/../bar/'`

hobbs 2009-11-21 08:37:52

IMO, using 2 while loops like that is not necessary and is bad design. What if I have many more variables to set? Explicitly declaring variables is more understandable and appropriate.

ghostdog74 2009-11-21 09:25:47

@hobbs: yes, there is a one-liner equivalent to the version without the label. @ghostdog74: if I have more complex stuff to deal with, I use variables - this isn't complex, so I wrote a solution that didn't use them. I didn't agonize over how to do it - it is an idiom I've used before, and will use again, when appropriate. And it is only appropriate in simple cases.

Jonathan Leffler 2009-11-21 14:44:51

Answer 2

+1 A:

Something like:

$filter = 0;
while (<>) {
  chomp;
  $filter = 1 if (! $filter && /^nch$/);
  $filter = 0 if ($filter && /^ban$/);
  print($_, "\n") if ($filter);
}

should work.

Hairy Jock 2009-11-21 05:27:12

The chomp is not strictly necessary, and not chomping simplifies the printing. Also, it is not clear why you chose 'ban' when the question is asking about 'kary'; that could be regarded as making unwarranted assumptions about the data layout.

Jonathan Leffler 2009-11-21 05:32:47

The "chomp" allows matching the entire line - /nch/ or /ban/ are simply too permissive when grinches or banshees arrive.

Hairy Jock 2009-11-21 06:08:19

The 'chomp' is not needed to make the '/^nch$/' matches work. And you do what the question asks with: `my $filter = 0; while (<>) { $filter = 1 if (! $filter && /^nch$/); print if ($filter); $filter = 0 if ($filter && /^kary$/); }`. Note the repositioning of the print and the test for 'kary'.

Jonathan Leffler 2009-11-21 06:29:35

"There's usually more than one way to skin a cat", to paraphrase Larry Wall.

Hairy Jock 2009-11-21 06:43:36

True, but some ways are better than others.

Jonathan Leffler 2009-11-21 07:00:49

Propriety please!

Kinopiko 2009-11-21 07:53:03

My plea too. I wouldn't have minded if the guy's code had either worked, or was well written. Apologies for offence you took hereby.

Hairy Jock 2009-11-21 09:01:11

@Hairy Jock: are you telling us that you must use chomp on some platform (I assume Windows since I work mainly on Unix and know that it is not necessary there) for a 'whole line' pattern match like '`/^nch$/`' to work? (I've checked on Win XP with ActivePerl 5.10.0 under Cygwin; the anchored matches work without chomp but you may have another environment in mind.) Which code are you asserting does not work? The code copied from my comment with code in it works correctly in '`perl -e '...pasted...'`' on MacOS X 10.5.8.

Jonathan Leffler 2009-11-21 15:24:00

You've got two loops in your code (reading the same file handle - why?) and your first attempt had /pch/ before you spotted and edited the bug, and you're pious enough to think you're some kinda better Monk? I normally split lines with /\r?\n\r?/ in Linux CGI processes, for info, and the \r causes me problems in the real, paying, world.

Hairy Jock 2009-11-21 16:34:08

Answer 3

A:

if you only want to read one block, in gawk

gawk '/kary/&&f{print;exit}/nch/{f=1}f' file

in Perl

perl -lne '$f && /kary/ && print && exit;$f=1 if/nch/; $f && print' file

or

while (<>) {
    chomp;
    if ($f && /kary/) {
        print $_."\n";
        last;
    }
    if (/nch/) { $f = 1; }
    print $_ ."\n" if $f;
}

ghostdog74 2009-11-21 05:35:10

Both the Perl and the Awk scripts print kary and terminate if it appears before the first nch - which is not part of the spec.

Jonathan Leffler 2009-11-21 06:36:15

fixed. thks for pointing out

ghostdog74 2009-11-21 09:41:17

Answer 4

+7 A:

If I understand your question correctly, this is pretty simple.

    #!perl -w
    use strict;
    use autodie;

    open my $in,'<',"File1.txt";
    open my $out,'>',"File2.txt";

    while (<$in>) {
        print $out $_ if /^nch/ .. /^kary/;
    }

Mike 2009-11-21 06:43:56

also : $ perl -lne 'print if /nch/../kary/' file >output

ghostdog74 2009-11-21 06:45:46

The '-l' option is not necessary, but the one-liner works nicely as a simplification of Mike's script. A program that builds file names in like that is not a general program, Mike.

Jonathan Leffler 2009-11-21 07:07:49

Still, +1 for the use of range notation.

Jonathan Leffler 2009-11-21 07:08:20

@Jonathan, I assume I'll have to stick to this use of file open format. I know the (<>) thing processes the command line parameters, but it never works for me, perhaps because of the way I run the Perl scripts on Windows. But thanks for the comment and the upvote:)

Mike 2009-11-21 07:50:22

@Mike +1 But the answer would be much better if it read from standard input and wrote to standard output. Is there a good reason why you're running your scripts in some non-standard way that doesn't allow you to use `<>`? I've heard you mention this problem before and was puzzled. Anyway, keep up the good work: you're definitely improving your Perl chops.

FM 2009-11-21 16:23:21

@FM, Thanks for the encouragement :) When I posted here my first question "How can I search multiple files for a string in Perl" about one month ago, @Jonathan provided a solution which involves the use of <>. But I couldn't get it to work. And @Jonathan kindly explained that I could list the files to be processed by typing the script name and the file names in command line and that worked. But actually I didn't know <> is the standard input, UNTIL this very moment. Thanks for that. There's one thing: on Windows XP system, I've never been used to running a program in the command line.

Mike 2009-11-22 01:44:13

@Mike Hey, you're a programmer: get your bad self on the command line! :) Seriously, running from the command line is the way to go, especially when you're learning. Also, I can't emphasize enough how helpful it is to build your programs (whenever possible) using this fundamental idea: read from standard input; write to standard output. Several years ago, I was in your shoes, and my programming improved a lot when I started following that approach, because it allows you to build tools that can be connected in flexible ways. See here for similar ideas: http://www.faqs.org/docs/artu/.

FM 2009-11-22 13:06:04

@Mike If you don't have time to read the whole thing, at least check out Ch 1: http://www.faqs.org/docs/artu/ch01s06.html. Very useful.

FM 2009-11-22 13:09:47

@FM, thanks for sharing the thoughts :) I really appreciate it! And I've just finished reading the chapter 1 but I think I need a little more time to think, or meditate, and honestly, I never seriously thought myself to be a programmer. I'm still too newbie. But I'll take it as an encouragement and I know you are probably right. Yes, I agree that "traditions exist for a good reason: to tame the learning."

Mike 2009-11-22 14:31:54

Answer 5

A:

molecules 2009-11-21 13:07:34

Thanks to Mike and brian for introducing me to `..` as a flip-flop operator. You can also use `^..^` to exclude the matching lines, `^..` to exclude the first matching line, or `..^` to exclude the last matching line. (I'm not sure where else to find this info, but Larry Wall mentioned it at http://www.nntp.perl.org/group/perl.perl6.language/2005/11/msg24098.html.) The flip-flop operators in Perl6 will be `ff` and `fff`, like `..` and `...` in Perl 5. Compare brian's and Mike's succinct answers using the flip-flop operator to my verbose, though correct, answer above.
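As far as I know, the `^`-decorated forms belong to the linked Perl 6 discussion rather than to Perl 5. In Perl 5, perlop describes another route to the same effect: the scalar flip-flop returns a sequence number, and the last number of each range has 'E0' appended. A minimal sketch under that assumption; the sub name `lines_inside` and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines strictly between a line matching
# $start and the next line matching $end, excluding both marker lines.
sub lines_inside {
    my ($start, $end, @lines) = @_;
    my @inside;
    for (@lines) {
        # In scalar context the flip-flop returns '' when false, or a sequence
        # number when true; the final number of a range carries an 'E0' suffix.
        my $seq = /$start/ .. /$end/;
        next unless $seq;          # outside any range
        next if $seq == 1;         # first line of the range (start marker)
        next if $seq =~ /E0$/;     # last line of the range (end marker)
        push @inside, $_;
    }
    return @inside;
}

print lines_inside(qr/^nch$/, qr/^kary$/,
                   "x\n", "nch\n", "a\n", "b\n", "kary\n", "y\n");
```

The flip-flop keeps its state between calls, so this sketch assumes every range opened in one call is also closed in that call.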

molecules 2009-11-23 15:43:16

Answer 6

+3 A:

From perlfaq6's answer to How can I pull out lines between two patterns that are themselves on different lines?

You can use Perl's somewhat exotic .. operator (documented in perlop):

perl -ne 'print if /START/ .. /END/' file1 file2 ...

If you wanted text and not lines, you would use

perl -0777 -ne 'print "$1\n" while /START(.*?)END/gs' file1 file2 ...

But if you want nested occurrences of START through END, you'll run up against the problem described in the question in this section on matching balanced text.
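The slurp-mode one-liner above can be written out as a sub to make the regex flags easier to see. This is only a sketch: the name `text_between` and the sample string are assumptions, and like the one-liner it does not handle nested START/END pairs.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every piece of text found between the literal
# markers START and END in a string that holds the whole input.
sub text_between {
    my ($text) = @_;
    my @chunks;
    # /s lets . match newlines, /g in a while loop walks successive matches,
    # and the non-greedy .*? stops at the nearest END.
    while ($text =~ /START(.*?)END/gs) {
        push @chunks, $1;
    }
    return @chunks;
}

my @found = text_between("a START one END b\nSTART two\nlines END c\n");
print "[$_]\n" for @found;
```

Because the match runs over the whole string, a chunk may span several lines, which is the difference from the line-based `..` approach.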

Here's another example of using ..:

while (<>) {
 $in_header =   1  .. /^$/;
 $in_body   = /^$/ .. eof;
# now choose between them
} continue {
 $. = 0 if eof; # fix $.
}

brian d foy 2009-11-22 22:57:39
