I've racked my brain trying to come up with a solution, but in vain. Any guidance would be appreciated.

_data_
mascot
friend
ocean
\n
parsimon
**QUERY**
apple
\n
jujube
\n
apricot
maple
**QUERY**
rose
mahonia
\n

....Given the search keyword is QUERY, it would output:

parsimon
**QUERY**
apple

apricot
maple
**QUERY**
rose
mahonia

I wrote some code that doesn't work the way I would like:

#!/usr/bin/perl

use strict; 
use warnings;

open my $fh, '<', 'FILE' or die "Cannot open: $!";
my @file = <$fh>;
close $fh;

for (0 .. $#file) {   # read from the first line to the last
  if($file[$_] =~ /QUERY/){  # if the contents of a particular line matches the query pattern
        my $start = $_-- until $file[$_--] =~ /^$/; #check the previous line for an empty line. continue until success. store the index of the empty line to $start.
        my $end = $_++ until $file[$_++] =~ /^$/; #check the next line for an empty line. continue until success. store the index of the empty line to $end.

print "\n @file[$start..$end]"; #print all lines between the stored indexes
}
}

I also tried something like this, but there was a syntax error:

if($file[$_] =~ /QUERY/){
        my $start = $_-4 if $file[$_-4] =~ /^$/;
      continue  my $start = $_-3 if $file[$_-3]=~/^$/;
  ------
my $end = $_+4 until $file[$_+4] =~ /^$/;
.....

print "\n @file[$start..$end]";
}
.....

The only thing I've managed so far is printing everything from the matching line to the next empty line, using the following code:

for (0 .. $#file) {
  if($file[$_+1] =~ /QUERY/) {
   print $file[$_] until $file[$_++]=~/^$/;

Can someone point me in the right direction? Thanks!

Mike

Edit

I think brian d foy's solution to my problem is the best, and by best I mean the most efficient. But Jeff's solution is the most helpful: I benefited a lot from his detailed line-by-line explanations, and, even better, with only a few tweaks to his code I can do something else, for example print all lines between the lines starting with a digit when a pattern is found. And Kinopiko's code is the very code I was hoping to be able to write.

+1  A: 

There are some "howlers" here.

  • You are modifying $_ inside a loop which uses $_ as its loop variable.
  • You are doubly decrementing and also doubly incrementing $_.
  • You don't seem to be dealing with the top and bottom of the file.
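
To make the second point concrete, here is a minimal sketch of what double stepping does. It is not Mike's exact statement, and the sample array is made up for illustration. When both the test and the loop body decrement the index, every other line is skipped, so a blank line at an odd distance from the match is never seen:

```perl
use strict;
use warnings;

# Hypothetical sample: the blank line sits at index 1, the match at index 4.
my @lines = ("a\n", "\n", "b\n", "c\n", "QUERY\n");

# Double stepping: the test and the body each decrement $i,
# so only every other index is inspected.
my @double;
my $i = 4;
while ($i >= 0) {
    push @double, $i;
    last if $lines[$i--] =~ /^$/;   # first decrement, inside the test
    $i--;                           # second decrement, in the body
}

# Single stepping: the test only looks, the body does the one decrement.
my @single;
my $j = 4;
while ($j >= 0) {
    push @single, $j;
    last if $lines[$j] =~ /^$/;
    $j--;
}

print "double stepping checked: @double\n";
print "single stepping checked: @single\n";
```

The double-stepping loop inspects indexes 4, 2 and 0 and walks straight past the blank line at index 1; the single-stepping loop stops there.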

This seems to work:

my @file = <DATA>;

for (0 .. $#file) {
    if ($file[$_] =~ /QUERY/) {
        # Walk back to the nearest blank line (or the top of the file)
        my $start = $_;
        while ($start >= 0 && $file[$start] !~ /^$/) {
            $start--;
        }
        # Walk forward to the nearest blank line (or the end of the file)
        my $end = $_;
        while ($end <= $#file && $file[$end] !~ /^$/) {
            $end++;
        }
        print "\n@file[$start+1..$end-1]";
    }
}
__DATA__
mascot
friend
ocean

parsimon
QUERY
apple

jujube

apricot
maple
QUERY
rose
mahonia
Kinopiko
The script prints "Found: 3 7<parsimon QUERY apple>Found: 9 15<apricot maple QUERY rose mahonia>"
Mike
Thanks, but can we get rid of "Found 3 7" and "Found: 9 15"?
Mike
And thanks for pointing out my silly mistakes. Hopefully I'm improving.
Mike
@Mike: If you realize that you aren't as wise today as you thought you were yesterday, you're wiser today! :)
Ether
It runs well! Thanks Kinopiko! This is the very code I intended to be able to write but I only had an empty idea and YOU are the doer. Thanks!
Mike
And Thanks Ether :)
Mike
My goal was to show Mike how to do what he was trying to do. Since there were some "howlers" in the original code, I thought it would be useful to show how to implement these ideas and avoid the howlers.
Kinopiko
+2  A: 

This results in the output you specified, assuming you mean "if QUERY is found, return everything from the previous blank line to the next". This will fail in the last case if there is no blank line at the end of the file, but you could simply check the last candidate after the while loop.

#!/usr/bin/perl

use strict;
use warnings;

# Open the file
open my $fh, '<', 'FILE' or die "Cannot open: $!";

# initialize the candidate to an empty string
# This is good form, so we know that we are starting with an empty candidate.
my $candidate = '';

# Go through each line of the file
# The contents of the line will be placed in $_
while(<$fh>) {

    # If the line is blank, check the candidate we have accumulated
    # This is shorthand for 'if($_ =~ /^$/) {'
    if(/^$/) {

        # If the candidate contains your QUERY, print it out
        # Alternately, you could add it to an array, etc
        if($candidate =~ /\nQUERY\n/) {
            print "$candidate\n";
        }

        # We are done with the previous candidate, so clear it out
        $candidate = '';
    } else {

        # If it is not a blank line, concatenate it to your
        # candidate.  I.e. build up all of the lines between blanks
        $candidate .= $_;
    }
}

# This handles the final lines, if there is not a blank line
# at the end of the file
if($candidate =~ /\nQUERY\n/) {
    print "$candidate\n";
}
Jeff B
+1 That's better than my answer. I tried to adapt the method used by Mike so that it worked, but you did it better and simpler.
Kinopiko
But this code stops running after the first match. It only prints "parsimon QUERY apple", no "apricotmaple**QUERY**rosemahonia"
Mike
If you read my explanation, it is because there is no blank line at the end of the file. Either add the blank line, or add the if($candidate)... statement after the while loop
Jeff B
@Mike: I added the extra 'if' statement to catch the final case if there is no blank line.
Jeff B
Works perfectly. Thanks Jeff:)
Mike
But if you'll indulge me, can you please provide more comments? I've just started learning Perl. Still in the process of familiarizing myself with basic stuff.
Mike
For example, what job does "my $candidate = ''" do?
Mike
Comments added.
Jeff B
Ah, that's great!! Really appreciate it, dude :)
Mike
This is clever code.
Mike
The solution is much simpler than this. You're working too hard and using too many clever tricks.
brian d foy
Thank you too brian. I was impressed by Jeff's solution first and now I'm even more impressed by yours. But both of you guys are great. I appreciate your help very much :)
Mike
Too many clever tricks? Working too hard? I am simply coding a solution that works. That's the second time you have made that comment about my code. I understand that you understand perl in great depth, but some more specific constructive criticism might be nice. Your solution is great, but it seems more "tricky" than mine, at least to a beginner. Setting $/ is not something that ever crossed my mind.
Jeff B
Well, you did a lot of work and I didn't. I didn't do anything fancy. Setting $/ is a common Perl idiom.
brian d foy
No harsh feelings, please. Honestly, my impression is: Jeff's code is clever but brian's fancy and by fancy, I mean something out of my expectation. But the point is both of your codes work great.
Mike
+6  A: 

Wow, you guys really like doing a lot of work in those answers. Remember, in text processing, Perl makes the easy things easy (and the hard things possible). If you're doing a lot of work for something that's easy to explain, you're probably missing the easy way. :)

Just redefine a line to be a paragraph and print the matching paragraphs as you read them. You can change Perl's idea of a line by setting the input record separator, $/, to be the line-ending that you want. When you use the line input operator, you'll get back everything up to and including what is in $/. See perlvar for the details on Perl special variables:

#!perl

{
    local $/ = "\n\n";

    while( my $group = <DATA> ) {
        print $group if $group =~ /\Q**QUERY**/;
    }
}


__DATA__
mascot
friend
ocean

parsimon
**QUERY**
apple

jujube

apricot
maple
**QUERY**
rose
mahonia

ghostdog74 posted his one-liner version, which I modified slightly:

perl -ne "BEGIN { $/ = qq(\n\n) } print if /\Q**QUERY**/" fileA fileB ...

Perl has a special command-line switch for this, though. You set the input record separator with -0, and the special value -00 turns on paragraph mode:

perl -00 -ne "print if /\Q**QUERY**/" fileA fileB ...

The perlrun documentation shows you all the nifty things you can do on the command line.
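
For completeness, the in-script counterpart of -00 is setting $/ to the empty string, which perlvar documents as paragraph mode. Unlike "\n\n", it treats a run of several blank lines as a single separator. A minimal sketch, assuming a made-up sample string and a hypothetical helper named matching_paragraphs:

```perl
use strict;
use warnings;

# Hypothetical sample text; note the two blank lines after "apple".
my $text = "mascot\nfriend\n\nparsimon\n**QUERY**\napple\n\n\n"
         . "jujube\n\napricot\n**QUERY**\nrose\n";

# Return the paragraphs of $text that contain $pattern literally.
# (matching_paragraphs is an illustrative name, not a built-in.)
sub matching_paragraphs {
    my ($text, $pattern) = @_;

    # Read from the string through an in-memory filehandle
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

    local $/ = '';    # paragraph mode, the in-script counterpart of -00

    my @hits;
    while (my $group = <$fh>) {
        # \Q...\E quotes the asterisks so they are matched literally
        push @hits, $group if $group =~ /\Q$pattern\E/;
    }
    close $fh;
    return @hits;
}

my @hits = matching_paragraphs($text, '**QUERY**');
print for @hits;
```

Because $/ is localized inside the sub, the rest of the program keeps reading line by line.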

brian d foy
Tested okay. Seems this is more clever code! Really impressed. And thanks a lot for opening my eyes.
Mike
Is it that the first line "local $/ = "\n\n";" puts everything between empty lines into a group?
Mike
Yep. The $/ is the input record separator. Whatever it is becomes the thing that separates lines. The trick is to not think about lines as you see them, but how a computer can see them. :)
brian d foy
brian, thanks for adding more explanation to the code and I fully agree that if you're doing a lot of work for something that's easy to explain, you're probably missing the easy way.
Mike
I've finally figured out this one-liner thingy: for Windows XP, single quotes have to be replaced with double quotes :) At last!
Mike
"you guys really like doing a lot of work in those answers" - no, my answer was meant to show the "howlers" in the original, not be the shortest possible solution.
Kinopiko
@Kinopiko: I didn't say anything about your solution. Feeling a bit guilty, perhaps :)
brian d foy
+2  A: 
# more file
mascot
friend
ocean

parsimon
**QUERY**
apple

jujube

apricot
maple
**QUERY**
rose
mahonia

# perl -ne '$/ = "\n\n";print $_ if /QUERY/' file
    parsimon
    **QUERY**
    apple

    apricot
    maple
    **QUERY**
    rose
    mahonia
ghostdog74