Edit:
My original title has been changed somewhat, and I suspect the current one no longer reflects my original purpose: to have Perl automatically use the contents of one file as search keywords, search a second file for them, and write the matches to a third file. Without this kind of automation, I would have to type the query terms listed in FILE1 by hand, one by one, and fetch the matches from FILE2 one at a time by writing something like "while(<FILE2>){if (/query terms/){print FILE3 $_}}".
To be more specific, FILE1 should look something like this:
azure
Byzantine
cystitis
dyspeptic
eyrie
fuzz
FILE2 might (or might not) look something like this:
azalea n. flowering shrub of the rhododendron family
azure adj. bright blue, as of the sky
byte n. fixed number of binary digits, often representing a single character
Byzantine adj. of Byzantium or the E Roman Empire
cystitis n. inflammation of the bladder
Czech adj. of the Czech Republic or Bohemia
dyslexic adj. suffering from dyslexia
dyspeptic adj. suffering from dyspepsia
eyelet n. small hole in cloth, in a sail, etc for a rope, etc to go through;
eyrie n. eagle's nest
fuzz n. mass of soft light particle
fuzzy adj. like fuzz
FILE3 should look something like this, given the FILE2 above:
azure adj. bright blue, as of the sky
Byzantine adj. of Byzantium or the E Roman Empire
cystitis n. inflammation of the bladder
dyspeptic adj. suffering from dyspepsia
eyrie n. eagle's nest
fuzz n. mass of soft light particle
It took me hours of trial and error to arrive at a solution that seems to work, but my code is probably buggy, not to mention inefficient. I hope you guys can put me on the right track if I'm wrong, offer me some guidance, and share some different approaches to the problem (there must be some). As suggested by daotoad, I've tried to comment on what each line of code does. Please correct me if I've misunderstood something. Thanks :)
#!perl #for Windows, simply perl suffices. I'm reading *Learning Perl*.
use warnings;  # very annoying: I keep getting floods of warning messages
use strict;    # I often have to hunt around because of my carelessness

# The 3-argument form of open helps avoid possible confusion. I don't know why, but when I replace
# FILE2.txt with $dic in the die message, I get a "requires explicit package name" error. Any ideas?
open my $dic, '<', 'c:/FILE2.txt' or die "Cannot open FILE2.txt: $!";
open my $filter, '<', 'c:/FILE1.txt' or die "Cannot open FILE1.txt: $!";
my @filter = <$filter>;  # store the entire contents of FILE1 in @filter
close $filter;           # FILE1 is no longer needed, so close the connection between FILE1 and perl
open my $learn, '>', 'c:/FILE3.txt' or die "Cannot open FILE3.txt: $!";  # this file is where I output matching lines
my $candidate = "";      # initialize the candidate to the empty string; it will store matching lines. Learnt this from Jeff.
while (<$dic>) {                             # let perl read the contents of FILE2 line by line
    for (my $n = 0; $n <= $#filter; $n++) {  # let perl go through each line of FILE1 too
        my $entry = $filter[$n];
        chomp($entry);                       # figured out this line must be added after many fruitless attempts
        if (/^$entry\s/) {                   # compare each line of FILE2 with every line of FILE1
            $candidate .= $_;                # every time a match is found, append the line to $candidate
        }
    }
}
print $learn $candidate;  # output the results to FILE3
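On the `$dic` question in the comment above: a variable declared with `my` only becomes visible in the statement after the one that declares it, so inside the same `open ... or die` statement the name `$dic` refers to an undeclared global, which `use strict` rejects. `$dic` would also hold the filehandle rather than the file name. A minimal sketch of one way around this, keeping the path in its own variable; the helper name `open_for_reading` and the path are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): opens a file for
# reading and names the path, not the filehandle, if that fails.
sub open_for_reading {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    return $fh;
}

# Demonstration with a path that is assumed not to exist.
my $fh  = eval { open_for_reading('no/such/dir/FILE2.txt') };
my $err = $@;
print "open failed: $err" unless $fh;
```

Because the path lives in `$path`, the same variable can be used both for `open` and in the `die` message.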
UPGRADE1:
Thank you very much for the guidance! I truly appreciate it :)
I believe I'm now heading in a somewhat different direction than I originally intended. Hashes were beyond my Perl knowledge at the time I posted. Having finished the hashes chapter of *Learning Perl*, I'm now thinking: although hashes may efficiently solve the example problem I posted above, things might get complicated if the headwords (not the whole entries) in the definition file (FILE2) have duplicates. On the other hand, I can see that hashes are very important in Perl programming. So this morning I tried to implement @mobrule's idea: load the contents of FILE1 into a hash and then check whether the first word of each line in FILE2 is in the hash. But then I decided I should load FILE2 into a hash instead of FILE1, because FILE2 contains dictionary entries and it makes sense to treat headwords as keys and definitions as values. I came up with the following code. It seems close to success.
#!perl
open my $learn, '>', 'c:/file3.txt' or die "Cannot open Study Note: $!";
open my $dic, '<', 'c:/file2.txt' or die "Cannot open Dictionary: $!";
# I did some googling on how to load a file into a hash and found that this works, but I don't quite understand why.
# I figured the pattern out by myself: /\t+/ seems to work because the headwords and the main entries in FILE2 are separated by tabs.
my %hash = map { split /\t+/ } <$dic>;
open my $filter, '<', 'c:/file1.txt' or die "Cannot open Glossary: $!";
while (my $line = <$filter>) {
    chomp($line);
    if (exists $hash{$line}) {
        # This line is buggy. First, it won't output to FILE3. Second, it only prints the values of the hash, but I want to include the keys.
        print "$learn $hash{$line}";
    }
}
The code printed the following on screen:
GLOB(0x285ef8) adj. bright blue, as of the sky
GLOB(0x285ef8) adj. of Byzantium or the E Roman Empire
GLOB(0x285ef8) n. inflammation of the bladder
GLOB(0x285ef8) adj. suffering from dyspepsia
GLOB(0x285ef8) n. eagle's nest
GLOB(0x285ef8) n. mass of soft light particle
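For comparison with the FILE2-as-hash version above, here is a rough sketch of the idea as @mobrule described it: put the FILE1 keywords in a hash and test the first word of each FILE2 line against it. Since FILE2 is only scanned and never used as hash keys, duplicate headwords are not a problem. The arrays stand in for the two files, and the second "azure" entry is invented purely to show the duplicate case.

```perl
use strict;
use warnings;

# Sample data standing in for FILE1 and FILE2 (assumed for illustration).
my @keywords   = qw(azure fuzz);
my @dictionary = (
    "azalea n. flowering shrub of the rhododendron family\n",
    "azure adj. bright blue, as of the sky\n",
    "azure n. a bright blue colour\n",    # duplicate headword, invented for the demo
    "fuzz n. mass of soft light particle\n",
    "fuzzy adj. like fuzz\n",
);

# Load the keywords into a hash so each lookup is a single hash access.
my %wanted = map { $_ => 1 } @keywords;

# Keep every dictionary line whose first word is a wanted keyword.
my @matches;
for my $line (@dictionary) {
    my ($headword) = split ' ', $line, 2;
    push @matches, $line if defined $headword && $wanted{$headword};
}

print @matches;
```

Both "azure" lines are kept, while "azalea" and "fuzzy" are skipped because the whole first word has to equal a keyword.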
UPGRADE2:
One problem solved. I can now print both keys and values after a minor modification to the last line:
print "$learn $line: $hash{$line}";
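A note on where the `GLOB(0x285ef8)` text comes from: when `$learn` sits inside the double quotes it is interpolated into the string like any other scalar, so `print` writes the stringified handle to the screen. When the handle comes first, outside the quotes and with no comma after it, `print` writes to that handle instead. A small sketch, using an in-memory file so that it needs nothing on disk (the variable names are only for illustration):

```perl
use strict;
use warnings;

# Write to an in-memory "file" so the sketch needs no files on disk.
my $buffer = '';
open my $out, '>', \$buffer or die "Cannot open in-memory file: $!";

my ($word, $definition) = ('azure', "adj. bright blue, as of the sky\n");

# Filehandle first, no comma after it, then the list to print.
# The braces are optional here but make the handle stand out.
print {$out} "$word: $definition";
close $out or die "Cannot close in-memory file: $!";

# Show what ended up in the "file".
print $buffer;
```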
UPGRADE3:
Haha, I made it! I made it :) I modified the code again, and now it writes the output to FILE3!
#!perl
open my $learn, '>', 'c:/file3.txt' or die $!;
open my $dic, '<', 'c:/file2.txt' or die $!;
# The /\t+/ pattern works because the headword and the definition of each entry in my FILE2 are separated by two tabs.
my %hash = map { split /\t+/ } <$dic>;
open my $filter, '<', 'c:/file1.txt' or die $!;
while (my $line = <$filter>) {
    chomp($line);
    if (exists $hash{$line}) {
        print $learn "$line: $hash{$line}";
    }
}
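As for why the `map {split/\t+/} <$dic>` line loads the file into a hash: `split` turns each line into a short list (headword, definition), `map` strings those lists together into one flat list, and assigning a flat list to a hash pairs the elements up as key, value, key, value. A minimal sketch with two sample lines, assuming the two-tab layout described above:

```perl
use strict;
use warnings;

# Two sample lines, headword and definition separated by tabs (assumed layout).
my @lines = (
    "azure\t\tadj. bright blue, as of the sky\n",
    "fuzz\t\tn. mass of soft light particle\n",
);

# split turns each line into (headword, definition); map concatenates
# those short lists into one flat list: key, value, key, value, ...
my @flat = map { split /\t+/ } @lines;

# Assigning a flat list to a hash pairs the elements up as key => value.
my %hash = @flat;

print scalar(@flat), " elements in the flat list\n";
print "azure => $hash{azure}";    # the value still ends with its newline
```

This only stays in step if every line splits into exactly two fields. A line with no tab, or with a further tab inside the definition, shifts the pairing for everything after it.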
UPGRADE4:
I'm wondering: if my FILE2 had totally different contents, say, sentences that contain the query words in FILE1, it would be difficult, if not impossible, to use the hash approach, right?
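One possible way to handle that case, sketched here under the assumption that a keyword may appear anywhere in a line: join the keywords into a single alternation pattern and match each line against it once. This is only one option among several, and the sentences below are invented for illustration.

```perl
use strict;
use warnings;

# Sample data (invented for illustration): keywords and free-text sentences.
my @keywords  = qw(azure fuzz);
my @sentences = (
    "The sea was a deep azure that morning.\n",
    "Her memory of the day was fuzzy.\n",
    "A layer of fuzz covered the peach.\n",
    "Nothing relevant here.\n",
);

# Join the keywords into one alternation; quotemeta escapes any regex
# metacharacters, and \b keeps "fuzz" from matching inside "fuzzy".
my $alternation = join '|', map { quotemeta($_) } @keywords;
my $pattern     = qr/\b(?:$alternation)\b/;

# Keep every sentence that contains at least one keyword as a whole word.
my @matches = grep { /$pattern/ } @sentences;
print @matches;
```

The pattern is compiled once with `qr//`, so FILE2 is read a single time no matter how many keywords there are.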
UPGRADE5:
Having carefully read the perlfunc entry for the split operator, I now know how to improve my code :)
#!perl
open my $learn, '>', 'c:/file3.txt' or die $!;
open my $dic, '<', 'c:/file2.txt' or die $!;
my %hash = map { split /\s+/, $_, 2 } <$dic>;  # limit the number of fields per line to 2
open my $filter, '<', 'c:/file1.txt' or die $!;
while (my $line = <$filter>) {
    chomp($line);
    if (exists $hash{$line}) {
        print $learn "$line: $hash{$line}";
    }
}
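To see what the limit argument changes, here is a short sketch that splits one sample entry both ways. Without the limit, every run of whitespace separates fields, so a definition would be cut into single words and the key/value pairing in the hash would fall apart. With a limit of 2, everything after the first run of whitespace stays together.

```perl
use strict;
use warnings;

my $entry = "azure adj. bright blue, as of the sky\n";

# Without a limit, every run of whitespace is a separator and
# trailing empty fields are dropped.
my @all_fields = split /\s+/, $entry;

# With a limit of 2, splitting stops after the first separator, so the
# second field is the whole remainder of the line (newline included).
my ($headword, $definition) = split /\s+/, $entry, 2;

print scalar(@all_fields), " fields without a limit\n";
print "headword:   $headword\n";
print "definition: $definition";
```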