I want to find a word which is only three letters long, starts with t, and ends with e. Is there any other way, apart from what I have done:

open (FH, "alice.txt");
@lines = <FH>;
close(FH);

foreach $words(@lines)
{
   if($words =~ m/ t.e /g)
   {
     print $words," ";
   }
}

I also want to find words which are more than three letters long. How can I achieve that? A word can contain anything apart from whitespace. It need not start with t or end with e; any word that is more than three letters long counts.

+2  A: 

Try using \bt\w+e\b as your regex. This finds all whole words which begin with the letter 't' and end with the letter 'e', and have at least one word character (letter, digit, or underscore) in between. Thus "the" and "tattle" will be matched, as will "t999e".
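A quick way to see what this pattern does and does not pick up is to run it over a sample string. This is only a sketch: the sample sentence and the helper name t_e_words are made up for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical helper: return every whole word that starts with 't',
# ends with 'e', and has at least one word character in between.
sub t_e_words {
    my ($text) = @_;
    return $text =~ /\b(t\w+e)\b/g;
}

my $sample = "the turtle told me a tale at tea time; t999e te";
print join(", ", t_e_words($sample)), "\n";
# prints: the, turtle, tale, time, t999e
```

Note that "te" is skipped because \w+ needs at least one character between the t and the e, and "tea" is skipped because it does not end in e.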

John Feminella
I want to find words which are more than three letters long. How can I do that?
AGeek
@RBA - the "+" takes care of that for you
annakata
If you meant "words which are more than three letters that begin with 't' and end with 'e'", that's what this does. If you meant simply any word with four or more letters, that's "\b\w{4,}\b".
John Feminella
+3  A: 

Your code is fine. You may want to change the literal space to \b (word boundary).

If you want to match more than one character between t and e, use \w+ instead of the single dot (.).
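To see why the word boundary matters, here is a small comparison of the two patterns on one line. The sample line and variable names are invented for illustration.

```perl
use strict;
use warnings;

my $line = "the cat saw the tie, then the toe";

# Literal spaces: a word needs a space on both sides to match.
my @with_spaces = $line =~ / (t.e) /g;

# Word boundaries: also match at line start/end and next to punctuation.
my @with_boundaries = $line =~ /\b(t\we)\b/g;

print "spaces:     @with_spaces\n";      # the the
print "boundaries: @with_boundaries\n";  # the the tie the toe
```

The space version misses the first "the" (no space before it), "tie" (followed by a comma), and "toe" (end of line).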

jrockway
I want to find words which are more than three letters long. How can I do that?
AGeek
Err... by using \w+ instead of . like jrockway said?
innaM
+5  A: 

Well, your regex is going to fail to find words at the beginning and end of lines. That is what the \b assertion is for:

#!/usr/bin/perl

use strict;
use warnings;

use Text::Wrap;

my $file = "alice.txt";

open my $fh, "<", $file
    or die "could not open $file: $!";

my @words;
while (<$fh>) {
    push @words, /\b(t\we)\b/g;
}
print "three letter words that start with t and end with e:\n",
    wrap "\t", "\t", "@words\n";

You can find words of four or more letters by looking for any run of more than three word characters. The \w character class matches word characters, and the quantifier {4,} says to match the previous pattern four or more times. Put them together with the word boundary assertion and you get /\b\w{4,}\b/:

#!/usr/bin/perl

use strict;
use warnings;

use Text::Wrap;

my $file = "alice.txt";

open my $fh, "<", $file
    or die "could not open $file: $!";

my @three;
my @four;
while (<$fh>) {
    push @three, /\b(t\we)\b/g;
    push @four, /\b(\w{4,})\b/g;
}
print "three letter words that start with t and end with e:\n",
    wrap("\t", "\t", "@three\n"),
    "words with four or more letters:\n",
    wrap "\t", "\t", "@four\n";

You may want to use [[:alpha:]] instead of \w if you don't want to match things like "t0e".
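As a rough illustration of that difference, the sketch below runs both character classes over the same made-up string. Note that the POSIX class has to sit inside a bracketed character class, hence the doubled brackets.

```perl
use strict;
use warnings;

my $text = "the t0e t_e toe";

# \w accepts letters, digits, and the underscore.
my @word_chars = $text =~ /\b(t\we)\b/g;

# [[:alpha:]] accepts letters only.
my @letters_only = $text =~ /\b(t[[:alpha:]]e)\b/g;

print "\\w:          @word_chars\n";    # the t0e t_e toe
print "[[:alpha:]]: @letters_only\n";  # the toe
```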

Chas. Owens
This is considering only the first 'the' in a line, not the other 'the's, but anyway, thanks, I got my answers. Thanks a lot.
AGeek
There was a bug in the earliest version of the post. I was missing the `/g` option. It should find all of the three letter words now.
Chas. Owens
Thanks a lot for that update. It really helped and worked.
AGeek
A: 

Finding three-character words that start with t and end with e can be done using:

/\b(t\Se)\b/

Finding longer words (assuming the definition is: word can contain any non-blank characters):

/\b(\S{4,})\b/
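
If a word really is any run of non-whitespace characters, one alternative to a regex with \b is to split each line on whitespace and filter by length, since \b is defined in terms of \w and may not behave as expected next to punctuation. This is a sketch of that swapped-in technique, not the pattern above; the helper name long_words and the sample sentence are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split on whitespace, keep tokens longer than 3 characters.
sub long_words {
    my ($line) = @_;
    return grep { length($_) > 3 } split ' ', $line;
}

print join(" ", long_words("Oh dear! said Alice, can't it wait?")), "\n";
# prints: dear! said Alice, can't wait?
```

Punctuation attached to a word counts toward its length here, which matches the "anything apart from whitespace" definition.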
depesz
+1  A: 

Although a single regex may be your solution for this particular problem, give up the idea that a single regex should do all of the checking. Sometimes it's easier to break up the conditions and handle them separately:

if( 3 == length( $word ) and $word =~ m/$regex/ ) { ... }

I think it's easier to see your intent when you write it like that. You see a constraint on the length, and a constraint on the content.

Depending on what I was doing, I might create a pipeline instead (sometimes because it's interesting to program pretending no one ever invented if()). I think this pipeline better represents how people think about the problem stepwise:

open my( $fh ), '<', 'alice.txt' or die ...;

my @matches = 
              grep { /$regex/ }     # select only matching words
              grep { 3 == length }  # select only three character lines
              map  { chomp; $_ }
              <$fh>;

The nice thing about this way of doing things is that it's easy to change the steps. You say you also want to try it with any word with more than three characters. I drop the regex filter and adjust the length filter:

my @matches = 
              grep { 3 < length }  # select only those with more than three characters
              map  { chomp; $_ }
              <$fh>;
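
The pipelines above treat each line as one word. If the file has several words per line, a possible variation is to add a step that splits each line on whitespace first. The following is a sketch under assumptions: the in-memory text stands in for alice.txt, and the value of $regex is a guess at a suitable pattern, not something given above.

```perl
use strict;
use warnings;

# Stand-in for alice.txt so the sketch runs on its own.
my $text = "the rabbit took the tie\nAlice saw tea time\n";
open my $fh, '<', \$text or die "could not open in-memory file: $!";

my $regex = qr/\At\we\z/;    # assumed pattern: t, one word character, e

my @matches =
    grep { /$regex/ }        # keep only words matching the pattern
    grep { 3 == length }     # keep only three-character words
    map  { split ' ', $_ }   # break each line into words
    <$fh>;

print "@matches\n";          # the the tie
```

Splitting on ' ' also discards the trailing newline, so the chomp step is no longer needed.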
brian d foy