I have a Perl script that's supposed to match this string:

Sometimes, he says "hey fred, what's up?"

It reports whether it found fred at the beginning, end, or middle of a word, or whether it found just "fred". So it also matches Alfred and Frederich.

Well, in this string, it's supposed to say it found fred on its own, but it's saying it found it at the beginning of a word. Here is the regex for the beginning-of-word case (it's in an if-elsif ladder that checks beginning of word, end of word, just fred, then middle of word):

if(/.*\s+[fF][rR][eE][dD][^ \t\r\n,.:;'"].*/){
    print "found fred at beginning of a word:\n    $_\n";

I used [^ \t\r\n,.:;'"] instead of \S in case the word is followed by some punctuation. Obviously it's not an exhaustive list of punctuation, but it doesn't matter for this example since fred is followed by a comma.

This is in a foreach loop. If it means anything, this is exercise 7-1 in Learning Perl, 5th ed.

update

The exercise in the book is to write a Perl program to find "fred" in a list of words. Then it asks, does the script find fred in "Frederich" or "Alfred?" And then it says to write a text file that talks about Fred Flintstone and his friends, and use it as input to the script.

also

I figured it out, sort of. I must have changed something while writing the question that I forgot about: I tested it again and instead of matching the beginning of a word, it just said it found it anywhere. So the problem wasn't that it thought fred was at the beginning of a word, it was that it thought fred wasn't the only thing in the word. I added [,.:;'"]?\s+ to the code which matches "fred" as a whole word and it worked. I guess I should have thought about it a little more before asking :)

+2  A: 

Are you sure it doesn't work? It looks fine for your example case, and a slightly adjusted version of your code that I just ran gave the expected answer:

#!/usr/bin/perl

use strict; use warnings;

my $st = q{Sometimes, he says "hey fred, what's up?"};

foreach($st)
{
    if(/.*\s+[fF][rR][eE][dD][^ \t\r\n,.:;'"].*/){
        print "found fred at beginning of a word:\n    $_\n";
    }
    else
    {
        print "not found in $_\n";
    }
}

It reports the 'not found' part (as expected, since I'm not doing the 'just fred' check).

Cebjyre
+9  A: 

You can use \b for word boundaries and \w for word characters. Also, the /i modifier for case insensitivity is cleaner than using [fF] etc.

Something like:

if ($st =~ m{\b fred \w+ }xi) {
    print "Found fred at the beginning of a word";
} else {
    print "Not found";
}

If you need to look for 'fred' as a word itself, then use \b fred \b.
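As a rough sketch of how those two patterns could sit side by side, here they are applied to the string from the question. The classify() helper name is made up for this example, and the check order (whole word first) is an assumption about what the asker wants, not something the book prescribes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The sample string from the question.
my $st = q{Sometimes, he says "hey fred, what's up?"};

# classify() is a hypothetical helper, not part of the original answer.
# It tests the whole-word pattern first, then the start-of-word pattern.
sub classify {
    my ($text) = @_;
    return 'fred on its own'         if $text =~ m{ \b fred \b  }xi;
    return 'fred at start of a word' if $text =~ m{ \b fred \w+ }xi;
    return 'no match';
}

print classify($st), "\n";
print classify('I am Frederick'), "\n";
print classify('I am Alfred'), "\n";
```

Because \b sits between the "d" and the comma, the question's string falls into the whole-word case without any hand-written punctuation class. "Alfred" is reported as no match here only because this sketch has no end-of-word or middle-of-word branch.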

I'd recommend having a read of http://perldoc.perl.org/perlre.html

David Precious
fred's your uncle
ysth
It's probably worth pointing out to the original asker that the spaces in this regex are essentially ignored due to the x flag
Cebjyre
Good point, Cebjyre. The /x modifier causes whitespace within the regular expression to be ignored, meaning you can use whitespace to make it more readable. On long regexes, this can be very valuable, reducing the similarity with line noise. This is one of the many recommendations from Damian Conway's Perl Best Practices book which I agree with.
David Precious
+1  A: 

If you want to match Fred and frederick but not Alfred, then your regex is:

/\bfred\w*\b/i

That is to say: a word boundary followed by (case-insensitive) "fred" followed by zero or more word characters, followed by another word boundary. If you just want frederick, but plain Fred is out, then:

/\bfred\w+\b/i

i.e., word boundary, "fred", one or more word chars, word boundary.
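To make the difference between \w* and \w+ concrete, here is a small sketch that runs both patterns over three words. The word list is just an illustration picked for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative inputs: plain fred, fred plus a suffix, fred with a prefix.
my @words = qw(Fred frederick Alfred);

for my $word (@words) {
    # \w* allows zero trailing word characters, \w+ demands at least one.
    my $star = $word =~ /\bfred\w*\b/i ? 'match' : 'no match';
    my $plus = $word =~ /\bfred\w+\b/i ? 'match' : 'no match';
    printf "%-10s  \\w*: %-8s  \\w+: %s\n", $word, $star, $plus;
}
```

"Fred" matches only the \w* version, "frederick" matches both, and "Alfred" matches neither because there is no word boundary between the "l" and the "f".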

UPDATE: re-reading your question, it seems that you want:

perl -E '
use strict;
use warnings;
for( "nobody is here",
    "I am Frederick Flintsone",
    "she is alfredine",
    "I am Alfred Hitchcock",
    "fred has left the building" ) {
  say;
  if( ! /\b(\w*)fred(\w*)\b/i ) {
    say "no fred!"
  } elsif( ! length "$1$2" ) {
    say "fred by itself!"
  } elsif( ! length $2 ) {
    say "something-fred!"
  } elsif( ! length $1 ) {
    say "fred-something!"
  } else {
    say "something-fred-something!"
  }
}'

That outputs:

nobody is here
no fred!
I am Frederick Flintsone
fred-something!
she is alfredine
something-fred-something!
I am Alfred Hitchcock
something-fred!
fred has left the building
fred by itself!
Massa
Well, if you just wanted to match and not capture, \bfred\B works well enough.
Axeman
Yeah, one could use /\Bfred\B/i for "something-fred-something", and so on, but doing four matches would be more expensive than just one if (as I suspect) the OP wants to distinguish between the cases.
Massa
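For comparison, the four-pattern alternative discussed in the comments might look roughly like the sketch below, pairing \b (boundary) and \B (non-boundary) on each side of "fred". The describe() helper and the @cases table are names invented for this example, and the labels are borrowed from the capture-based answer above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One non-capturing pattern per case. For a single occurrence of "fred"
# exactly one of these can match, since each side is either \b or \B.
my @cases = (
    [ qr/\bfred\b/i, 'fred by itself!'           ],
    [ qr/\bfred\B/i, 'fred-something!'           ],
    [ qr/\Bfred\b/i, 'something-fred!'           ],
    [ qr/\Bfred\B/i, 'something-fred-something!' ],
);

# describe() is a hypothetical helper: it returns the label of the first
# pattern that matches, or 'no fred!' if none do.
sub describe {
    my ($text) = @_;
    for my $case (@cases) {
        my ($re, $label) = @$case;
        return $label if $text =~ $re;
    }
    return 'no fred!';
}

print describe($_), "\n" for
    'fred has left the building',
    'I am Frederick Flintsone',
    'I am Alfred Hitchcock',
    'she is alfredine',
    'nobody is here';
```

This gives the same labels as the single capturing regex for these inputs, at the cost of up to four match attempts per line. If a line contains several different fred-words, the order of @cases decides which label wins.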