Assuming file.txt has just one sentence per line as follows:

John Depp is a great guy.  
He is very intelligent.  
He can do anything.  
Come and meet John Depp.

The Perl code is as follows:

open(my $fh, '<', 'file.txt') or die "can't open file: $!";
my @lines = <$fh>;    # reads the whole file into memory
close($fh);
my $string = "John Depp";
foreach my $line (@lines) {
    print $line if $line =~ /\Q$string\E/;
}

The output is the first and fourth lines.

I want it to work for a file with arbitrary line breaks rather than one English sentence per line. That is, it should also work for the following:

John Depp is a great guy. He is very intelligent. He can do anything. Come and meet John Depp.

The output should be the first and fourth sentences.

Any ideas please?

A: 

One way:

while (<>) {
    next unless /John Depp/i;    # skip lines that cannot contain a match
    # Split the line into sentences on periods (the period itself is dropped).
    foreach my $sentence (split /\s*\.\s*/) {
        print "$sentence\n" if $sentence =~ /John Depp/i;
    }
}

Output:

$ cat file
John Depp is a great guy.
He is very inteligent.
He can do anything.
Come and meet John Depp.
John Depp is a great guy. He is very inteligent. He can do anything. Come and meet John Depp.

$ perl perl.pl file
John Depp is a great guy
Come and meet John Depp
John Depp is a great guy
Come and meet John Depp
ghostdog74
What if you have `John\nDepp eats with scissors.` in your file?
daotoad
There are a lot of what-ifs, right? Don't ask me, ask the OP.
ghostdog74
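
For the case raised in the comments, where a line break falls inside the name or inside a sentence, one option is to collapse all whitespace first and only then split into sentences. This is a minimal sketch, assuming a sentence ends with `.`, `!` or `?` followed by whitespace. The sub name `matching_sentences` is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the sentences in $text that contain $name.
# Assumes a sentence ends with '.', '!' or '?' followed by whitespace.
sub matching_sentences {
    my ($text, $name) = @_;
    $text =~ s/\s+/ /g;      # line breaks become plain spaces
    $text =~ s/^ | $//g;     # trim a leading or trailing space
    my @sentences = split /(?<=[.!?]) /, $text;
    return grep { /\Q$name\E/i } @sentences;
}

my $text = "John Depp is a great guy. He is very\nintelligent. "
         . "John\nDepp eats with scissors.";
print "$_\n" for matching_sentences($text, 'John Depp');
```

To run this on a file, the whole text has to be available at once, for example with `my $text = do { local $/; <> };`. Abbreviations such as "Mr." would still be treated as sentence ends here.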
A: 

First, note that the name of the famous actor is Johnny Depp.

Second, figuring out what is a sentence and what is not is tricky. I am going to cheat and use Lingua::Sentence:

#!/usr/bin/perl

use strict; use warnings;

use Lingua::Sentence;

my $splitter = Lingua::Sentence->new('en');

while ( my $text = <DATA> ) {
    for my $sentence ( split /\n/, $splitter->split($text) ) {
        print $sentence, "\n" if $sentence =~ /John Depp/;
    }
}

__DATA__
John Depp is a great guy.
He is very intelligent.
He can do anything.
Come and meet John Depp.
John Depp is a great guy. He is very intelligent. He can do anything. Come and meet John Depp.

Output:

John Depp is a great guy.
Come and meet John Depp.
John Depp is a great guy.
Come and meet John Depp.
Sinan Ünür
A: 

Simpler: if you assume "sentences" are separated by periods, you can use the period as the input record separator:

 $/ = '.';
 while(<>) {
        print if (/John Depp/i);
 }
leonbloy
That works for the simple example but fails if one of the sentences refers to *Mr.* John Depp.
Sinan Ünür
Of course. But what do you expect the ideal Perl code to output given this input: "John Depp is a great guy. He can do anything. Come and meet Mr. John Depp. He is very intelligent."? Do you expect Perl to "know" that "Come and meet Mr. John Depp." is one sentence? How?
leonbloy
That is, I was assuming that the author was looking just for a regex solution (tagged 'regex'), in which case one must make some simple assumption about how sentences are delimited. If this is not enough (in real English it is not), one must resort to a library that knows the intricacies of written English, as you have pointed out (Lingua::Sentence).
leonbloy
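
If a regex-only approach is still wanted, the "simple assumption" can be stretched a little by refusing to split after a few known abbreviations. This is a rough sketch, not a general sentence splitter: the abbreviation list (Mr., Mrs., Dr.) and the sub name `sentences_with` are illustrative assumptions, and each abbreviation gets its own fixed-length lookbehind so the pattern also works on older Perls.

```perl
use strict;
use warnings;

# Split on whitespace that follows '.', '!' or '?', unless the period
# belongs to one of a few listed abbreviations (illustrative list only).
my $sentence_break = qr/(?<=[.!?])(?<!\bMr\.)(?<!\bMrs\.)(?<!\bDr\.)\s+/;

sub sentences_with {    # hypothetical helper name
    my ($text, $name) = @_;
    return grep { /\Q$name\E/ } split $sentence_break, $text;
}

my $text = "John Depp is a great guy. He can do anything. "
         . "Come and meet Mr. John Depp. He is very intelligent. ";
print "$_\n" for sentences_with($text, 'John Depp');
```

Any abbreviation missing from the list still breaks the sentence, which is why a library such as Lingua::Sentence is the safer choice for real English text.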
A: 

Default variables can get clobbered if one isn't careful. So naming everything is a good idea.

This should get you started:

#!/usr/bin/perl -w

use strict;

my $targetString = "John Depp";

while (my $line = <STDIN>) {
    chomp($line);
    my @elements = split("\\.", $line);
    foreach my $element (@elements) {
        if ($element =~ m/$targetString/is) {
            print trim($element).".\n";
        }
    }
}

sub trim {
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

Usage:

$ depp.pl < file
John Depp is a great guy.
Come and meet John Depp.
John Depp is a great guy.
Come and meet John Depp.
Alex Reynolds
A: 

This looks at your original code rather than specifically answering your question. It is generally a bad idea to read a whole file into memory unless you have to. You can process a file line by line instead:

open ( FILE, "file.txt" ) || die "can't open file!";
$string = "John Depp";
while (<FILE>) {
   if (/$string/) { print }
}
justintime
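
The same line-by-line idea can be written with a lexical filehandle, three-argument `open`, and `\Q...\E` so the search string is matched literally. This is a sketch only: the sub name `grep_lines` is hypothetical, and the demo reads from an in-memory filehandle so it runs without a `file.txt` on disk.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines read from $fh that contain $string.
sub grep_lines {
    my ($fh, $string) = @_;
    my @hits;
    while (my $line = <$fh>) {    # reads one line at a time
        push @hits, $line if $line =~ /\Q$string\E/;
    }
    return @hits;
}

# Demo on an in-memory filehandle; for a real file use
#   open(my $fh, '<', 'file.txt') or die "can't open file.txt: $!";
my $data = "John Depp is a great guy.\n"
         . "He can do anything.\n"
         . "Come and meet John Depp.\n";
open(my $fh, '<', \$data) or die "can't open in-memory file: $!";
print grep_lines($fh, 'John Depp');
close($fh);
```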