Thanks for the answers. I am trying out the different possibilities from all of them. One thing I did not make clear in my question: I am applying this regex to my local script (similar to Tibetan script), not to English words.

foreach my $word (@list)
{
    if (grep(/$word/, $dict))    # the whole dictionary is held in one scalar, $dict
    {
        print "Matched and Found\n";
    }
    else
    {
        print "Not Matched\n";
    }
}

The goal is to match only a single, exact word. I tried /\b$word\b/, but that does not seem to work in our script, where a word is made up of multiple syllables and each syllable is separated by a dot (.), the Tibetan tsheg.
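In Perl, `\b` matches between a `\w` character and a non-`\w` character. The tsheg is punctuation, not a word character, so `\b` also matches at every syllable boundary, and a one-syllable entry will match inside any longer word that contains it (if the text is read as undecoded UTF-8 bytes, the Tibetan letters may not count as `\w` either). A minimal sketch of one way around this, assuming the dictionary string holds one word per line and using '.' as a stand-in for the tsheg as in the question: anchor the pattern to line boundaries with `/m` and quote the word with `\Q...\E`. The sample words and the `in_dict` helper are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical dictionary string: one word per line, syllables joined
# by '.', which stands in for the Tibetan tsheg as in the question.
my $dict = join "\n", qw(a a.b b.a.b);

# True only if $word is a complete line of $dict.
# \Q...\E makes '.' (and any other metacharacter) match literally;
# /m lets ^ and $ match at the start and end of every line.
sub in_dict {
    my ($word, $dict) = @_;
    return $dict =~ /^\Q$word\E$/m ? 1 : 0;
}

for my $word (qw(a b a.b b.a)) {
    print "$word: ", (in_dict($word, $dict) ? 'Matched and Found' : 'Not Matched'), "\n";
}

# For contrast, \b treats every '.' as a word boundary, so the
# single syllable 'b' is "found" inside the longer word 'b.a.b'.
print "\\b version finds 'b': ", ($dict =~ /\bb\b/ ? 'yes' : 'no'), "\n";
```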

Additional information:

For the beginner the most challenging feature of the Tibetan sentence is the lack of separation between words. ... Since there is no space after a word, the reader must figure out each word based on context and location in the sentence. Looking up these two letters in a dictionary might lead you to think that this sentence is starting with a reference to the surface of earth. However, the rest of the sentence, its context, and the lack of an agentive case connector, indicates that these two letters are not words by themselves, but rather the word "yesterday". From this you can see it's good to first evaluate a sentence as a whole, by identifying it's various elements, rather than translate it word by word.

Emphasis added. See http://www.learntibetan.net/grammar/sentence.htm

A: 

This one is easy: nothing is wrong. I can run that code just fine in Perl, and it works as intended. The problem must be somewhere else. Are you using "use strict;" at the top of the file?

reinierpost
+1  A: 

I'm fond of

grep { $_ =~ /blah/} @foo

That makes it easier to modify the condition later than a straight

grep(/blah/, @foo)

But I don't see anything wrong with your syntax.

Paul Nathan
Why do you think that lets you edit the condition more easily? I prefer the block form, but it's only easier when you want to have multiple statements.
brian d foy
I can do something like change it to an `eq` comparison, or call a complex match-type function instead. It's simpler if I just go with the general block form.
Paul Nathan
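A small illustration of the two forms, with hypothetical sample data in `@foo`: the block form accepts any Perl code as the condition, so switching from a regex to an exact `eq` comparison is a local edit.

```perl
use strict;
use warnings;

my @foo = qw(blah blahblah other);   # hypothetical sample data

# Expression form: the first argument is evaluated for each element.
my @by_regex = grep(/blah/, @foo);          # substring match

# Block form: same result, with the condition written as a block.
my @by_block = grep { $_ =~ /blah/ } @foo;

# Changing the block to an exact comparison touches only the block.
my @exact = grep { $_ eq 'blah' } @foo;

print "regex: @by_regex\nblock: @by_block\nexact: @exact\n";
```

The expression form also accepts an arbitrary expression, e.g. `grep($_ eq 'blah', @foo)`, so, as brian d foy notes, the practical difference shows up mainly once the condition needs several statements.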
A: 

There is nothing wrong with your syntax. It's just not very Perlish. In fact, your code says "Hi there, I have a C background!". Thus, for a start, I'd get rid of the parens after grep.

But what really needs some more thought is your regular expression. What if @list contained 'sex', but @dict contained 'Essex'? I'd change that regular expression to:

m/^$word$/i
innaM
better just `$_ eq $word`.
hobbs
That would be the next logical step.
innaM
Yes, that's exactly my problem: I want only 'sex', but it shows all the words that contain 'sex'. I have a list of words, and each of those words is searched for in $wordlist. Thanks for the answers.
Cthar
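The problem described above can be reproduced with a hypothetical three-word dictionary. An unanchored pattern keeps every entry that merely contains the word, while an anchored pattern or a plain `eq` keeps only the exact entry. The `\Q...\E` is an addition not in innaM's pattern; it makes a '.' syllable separator in `$word` match literally instead of matching any character.

```perl
use strict;
use warnings;

my @dict = qw(Essex sex sextant);   # hypothetical dictionary entries
my $word = 'sex';

# Unanchored: keeps every entry that contains "sex" anywhere.
my @substring = grep { /$word/ } @dict;

# Anchored (innaM's suggestion, plus \Q...\E to quote metacharacters).
my @anchored = grep { /^\Q$word\E$/i } @dict;

# Plain string equality (hobbs' suggestion): exact and case-sensitive.
my @equal = grep { $_ eq $word } @dict;

print "substring: @substring\n";   # substring: Essex sex sextant
print "anchored:  @anchored\n";    # anchored:  sex
print "equal:     @equal\n";       # equal:     sex
```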
+2  A: 

Instead of writing your own code to compare every element of @list against every element of @dict, use a module that already does the job for you, like List::Compare:

use strict;
use warnings;
use List::Compare;

my @dict = qw(apple banana orange grape pomegranate);
my @list = qw(banana giraffe pomegranate apple);

my $lc = List::Compare->new(\@dict, \@list);
my @intersection = $lc->get_intersection;

print "words found in the dictionary: " . join(', ', @intersection) . "\n";
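If installing List::Compare is not an option, a similar intersection can be computed in core Perl with a hash. This is a sketch, not how List::Compare works internally, and it may differ in ordering and in how duplicates are handled: here the result keeps the order of `@list`.

```perl
use strict;
use warnings;

my @dict = qw(apple banana orange grape pomegranate);
my @list = qw(banana giraffe pomegranate apple);

# Build a lookup table from the dictionary, then keep the words
# from @list that have an entry in it (order of @list preserved).
my %in_dict = map { $_ => 1 } @dict;
my @intersection = grep { $in_dict{$_} } @list;

print "words found in the dictionary: " . join(', ', @intersection) . "\n";
```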
Ether
Why the downvote?
Ether
+1  A: 

I would use List::Util::first for that. It stops processing the list after the first match; grep won't do that.

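use List::Util qw(first);

# first() returns the first element for which the block is true (undef if none)
if ( defined( first { /$word/ } @list ) ) {
    print "Matched and Found\n";
}
else {
    print "Not Matched\n";
}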
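To make the short-circuit point concrete, a sketch with a hypothetical dictionary of romanized syllables and a counter inside each block. It compares with `eq` rather than `/$word/` so that only the exact entry counts as a match.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @dict = qw(ka kha ga nga ca);   # hypothetical dictionary
my $word = 'kha';

# Count how many elements each function inspects.
my ($grep_checks, $first_checks) = (0, 0);

my @matches = grep  { $grep_checks++;  $_ eq $word } @dict;
my $found   = first { $first_checks++; $_ eq $word } @dict;

print "grep inspected $grep_checks elements\n";    # grep inspected 5 elements
print "first inspected $first_checks elements\n";  # first inspected 2 elements
```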
Axeman
+1  A: 

Your grep syntax is fine.

I feel compelled to comment on your algorithm, though. It is very wasteful.

You iterate over @dict once for every word in @list.

It would be faster to assign one array into the keys of a hash and do lookups on the hash:

# Build the lookup table from the dictionary, then test each word from @list
my %lut;
@lut{@dict} = ();

for my $word ( @list ) {
    print exists $lut{$word} ? "Matched and Found\n" : "Not Matched\n";
}

Hash lookups take constant time on average, so instead of a nested loop you have a single flat loop. As your word lists grow, the speed difference should become quite apparent.

daotoad
A: 

In Perl 5.10, we have smart matching!

use 5.010;   # enables say (and the ~~ operator)

foreach my $word (@list) {
  say $word ~~ @dict ? 'Matched and Found' : 'Not Matched';
}
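Smart matching was marked experimental in Perl 5.18 and has since been deprecated, so newer code may prefer an explicit test. A sketch of the same loop using `any` from List::Util (available since List::Util 1.33), with hypothetical sample data and a hypothetical `lookup` helper; for plain string entries it gives the same messages as the `~~` version.

```perl
use strict;
use warnings;
use feature 'say';
use List::Util qw(any);   # any() requires List::Util 1.33 or later

my @dict = qw(a a.b b.a.b);   # hypothetical dictionary
my @list = qw(a b a.b);

# Explicit string comparison instead of ~~; any() stops at the first hit.
sub lookup {
    my ($word, @entries) = @_;
    my $found = any { $_ eq $word } @entries;
    return $found ? 'Matched and Found' : 'Not Matched';
}

foreach my $word (@list) {
    say lookup($word, @dict);
}
```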
oylenshpeegul
+2  A: 

Keeping the dictionary in a string and using grep to search it is going to be very slow for a dictionary of any size. Have you considered using a hash for the dictionary? For example:

my $dict = { word1 => 1, word2 => 1 };   # ...and so on, for example

for my $word (@list) 
{ 
   if ($dict->{$word})
   {
      print "Matched\n";
   }
   else
   {
      print "Not matched\n";
   }
}

Note that I don't advocate creating the hash in this manner, this is just an example to show using a hash as a dictionary, with the keys being the words and the values a constant 'true' value. If the matching has to be case-insensitive you would lowercase the dictionary words before inserting them into the hash, and lowercase $word before doing the lookup.
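The case-insensitive variant described above can be sketched as follows, with a hypothetical word list; `in_dict_ci` is an illustrative helper name. The same `lc` is applied when building the keys and when looking up. For Tibetan text this step is unnecessary, since the script has no letter case; `fc` (Perl 5.16+) is the more thorough choice for other Unicode text.

```perl
use strict;
use warnings;

# Hypothetical word list; keys are lowercased once when the hash is built.
my @words = qw(Apple banana Cherry);
my %dict  = map { lc($_) => 1 } @words;

# Lowercase the query the same way before looking it up.
sub in_dict_ci {
    my ($word) = @_;
    return exists $dict{ lc $word } ? 1 : 0;
}

for my $word (qw(apple APPLE cherry durian)) {
    print "$word: ", (in_dict_ci($word) ? "Matched" : "Not matched"), "\n";
}
```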

EDIT: Here's some code to load the dictionary from a file with one word per line

open(FH, '<', 'dictionary.txt') or die "Cannot open dictionary.txt: $!";
my $dict = { map { chomp; ($_, 1) } <FH> };
close(FH);

Explanation:

  1. <FH> in list context reads the entire file
  2. The map function evaluates the block ( the stuff in braces) for each line
  3. The block removes the newline and returns a two-element list containing the word and '1'
  4. The entire returned list is used to initialize a hash
  5. A reference to the hash is stored in $dict
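Because the question's dictionary is Tibetan text, the file most likely needs to be decoded as it is read; otherwise the keys are raw bytes and lookups with decoded words will fail. A sketch assuming the file is UTF-8 with one word per line; to stay self-contained it reads from an in-memory string standing in for `dictionary.txt`, and `load_dict` is a hypothetical helper name. Words looked up in the hash must be decoded the same way (for example with `use utf8;` for literals in the script).

```perl
use strict;
use warnings;
use Encode qw(encode);

# Stand-in for dictionary.txt: UTF-8 bytes, one word per line, with a
# blank line. The words are illustrative; "\x{0F0B}" is the tsheg.
my $bytes = encode('UTF-8', "\x{0F40}\x{0F0B}\x{0F41}\n\n\x{0F42}\n");

# Reads one word per line through a UTF-8 decoding layer and returns a
# hash reference whose keys are the decoded words.
sub load_dict {
    my ($source) = @_;   # a filename, or a reference to a scalar in memory
    open(my $fh, '<:encoding(UTF-8)', $source)
        or die "Cannot open dictionary: $!";
    my %dict = map { chomp; ($_ => 1) } grep { /\S/ } <$fh>;   # skip blank lines
    close($fh);
    return \%dict;
}

my $dict = load_dict(\$bytes);
print scalar(keys %$dict), " words loaded\n";   # 2 words loaded
```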
Jim Garrison
Any reason you're using `$dict` as a hash reference instead of `%dict` as a hash? I'm not sure it matters (other than saving you the trouble of typing `->`). Just curious.
Nathan Fellman
Just force of habit. My background is Java, where everything is a reference, so I'm more comfortable looking at things that way :-)
Jim Garrison
I have loaded the dictionary (word list) from an external file and slurped it into $dict. How do I assign the contents of an externally loaded file to a hash? Sorry if the question seems silly; I'm a Perl newbie. Thanks for the answer.
Cthar
@Cthar How can we figure out how to load a file into a hash if we have no idea how the file is formatted?
Sinan Ünür
A: 

I know nothing about Tibetan script. The example below assumes your dictionary consists of words followed by an equal sign and the definition of the word on each line.

It uses File::Slurp to efficiently slurp the file as a list of lines, chomps each line, and splits it to get the word as the key and the definition as the value in the %dict hash.

It assumes that @words already contains individual words, and that words do not need to be identified from arbitrary text such as "a.a.b.a.b.b.a.a.b.a" (see my remark pointing out that words are not separated in Tibetan, only syllables are).

To modify the code to read the dictionary from an external file, replace \*DATA with the name of the file.

#!/usr/bin/perl

use strict;
use warnings;

use File::Slurp;

my @words = qw( a b a.b b.a a.a b.a.b);

my %dict = map { chomp; split /\s*=\s*/ } read_file \*DATA;

for my $word ( @words ) {
    if ( defined(my $defn = $dict{$word}) ) {
        print "'$word' means $defn\n";
    }
    else {
        print "'$word' not found\n";
    }
}

__DATA__
a = Letter 1
b = Letter 2
a.b = Letter 1 and Letter 2
b.a = Letter 2 and Letter 1
a.b.a = Letter 1 and Letter 2 and Letter 1
b.a.b = Letter 2 and Letter 1 and Letter 2

Output:

'a' means Letter 1
'b' means Letter 2
'a.b' means Letter 1 and Letter 2
'b.a' means Letter 2 and Letter 1
'a.a' not found
'b.a.b' means Letter 2 and Letter 1 and Letter 2
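For the harder case mentioned above, where words must first be identified in a run of syllables such as "a.a.b.a.b.b.a.a.b.a", one simple heuristic is greedy longest matching against the dictionary. A sketch using the same a/b notation and a hypothetical `segment` helper; '.' again stands in for the tsheg.

```perl
use strict;
use warnings;

# Hypothetical dictionary in the same a/b notation as above.
my %dict = map { $_ => 1 } qw(a b a.b b.a a.b.a b.a.b);

# Greedy longest-match segmentation: at each position take the longest
# run of syllables that is a dictionary word; fall back to one syllable.
sub segment {
    my ($text, $dict) = @_;
    my @syl = split /\./, $text;
    my @words;
    my $i = 0;
    while ($i < @syl) {
        # Try the longest remaining run first, shrinking until a word is found.
        my $len = @syl - $i;
        while ($len > 1) {
            last if $dict->{ join('.', @syl[$i .. $i + $len - 1]) };
            $len--;
        }
        push @words, join('.', @syl[$i .. $i + $len - 1]);
        $i += $len;
    }
    return @words;
}

print join(' | ', segment('a.a.b.a.b.b.a.a.b.a', \%dict)), "\n";
# prints: a | a.b.a | b | b.a | a.b.a
```

Greedy matching is only a heuristic: as the learntibetan.net passage quoted in the question points out, the right division into words can depend on context, and the longest available match is not always the correct one.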
Sinan Ünür