OK, I'm without access at the moment so this is not guaranteed to be 100% or even compile but should provide enough guidance:
Solution 1: (word order does not matter)
#!/usr/bin/perl -w
use strict;
use File::Slurp;
my @B_lines = File::Slurp::read_file("B") || die "Error reading B: $!";
my %B_words = ();
foreach my $line (@B_lines) {
map { $B_words{$_} = 1 } split(/\s+/, $line);
}
my @A_lines = File::Slurp::read_file("A") || die "Error reading A: $!";
my @new_lines = ();
foreach my $line (@A_lines) {
my @B_words_only = grep { $B_words{$_} } split(/\s+/, $line);
push @new_lines, join(" ", @B_words_only) . "\n";
}
File::Slurp::write_file("A_new", @new_lines) || die "Error writing A_new: $!";
This should create a new file "A_new" that only contains A's words that are in in B.
This has a slight bug - it will replace any multiple-whitespace in file A with a single space, so
word1 word2 word3
will become
word1 word2 word3
It can be fixed but would be really annoying to do so, so I didn't bother unless you will absolutely require that whitespace be preserved 100% correctly
Solution 2: (word order matters BUT you can print words from file A out with no regards for preserving whitespace at all)
#!/usr/bin/perl -w
use strict;
use File::Slurp;
my @A_words = split(/\s+/gs, File::Slurp::read_file("A") || die "Error reading A:$!");
my @B_words = split(/\s+/gs, File::Slurp::read_file("B") || die "Error reading B:$!");
my $B_counter = 0;
for (my $A_counter = 0; $A_counter < scalar(@A_words); ++$A_counter) {
while ($B_counter < scalar(@B_words)
&& $B_words[$B_counter] ne $A_words[$A_counter]) {++$B_counter;}
last if $B_counter == scalar(@B_words);
print "$A_words[$A_counter]";
}
Solution 3 (why do we need Perl again? :) )
You can do this trivially in shell without Perl (or via system() call or backticks in parent Perl script)
comm -12 A B | tr "\012" " "
To call this from Perl:
my $new_text = `comm -12 A B | tr "\012" " " `;
But see my last comment why this may be considered "bad Perl"... at least if you do this in a loop with very many files being iterated and care about performance.