[Original answer as of 2008-11-27 down to "Since the question"; the analysis from there on is new as of 2008-11-29.]
Quickest - not sure. This works, though it is not pretty:
#!/bin/perl -w
use strict;
my @mylist1;
push(@mylist1,"A");
push(@mylist1,"B");
push(@mylist1,"C");
my @mylist2;
push(@mylist2,"A");
push(@mylist2,"D");
push(@mylist2,"E");
sub value_in
{
my($value, @array) = @_;
foreach my $element (@array)
{
return 1 if $value eq $element;
}
return 0;
}
@mylist2 = (@mylist2, grep { ! value_in($_, @mylist2) } @mylist1);
print sort @mylist2, "\n";
This avoids converting the arrays into hashes - but for large arrays, the value_in
sub may be slow.
Since the question was "what is the quickest method", I did some benchmarking. To my none-too-vast surprise, my method was slowest. Somewhat to my surprise, the fastest method was not from List::MoreUtils. Here's the test code and the results - using a modified version of my original proposal.
#!/bin/perl -w
use strict;
use List::MoreUtils qw(uniq);
use Benchmark::Timer;
my @mylist1;
push(@mylist1,"A");
push(@mylist1,"B");
push(@mylist1,"C");
my @mylist2;
push(@mylist2,"A");
push(@mylist2,"D");
push(@mylist2,"E");
sub value_in
{
my($value) = shift @_;
return grep { $value eq $_ } @_;
}
my @mylist3;
my @mylist4;
my @mylist5;
my @mylist6;
my $t = Benchmark::Timer->new(skip=>1);
my $iterations = 10000;
for my $i (1..$iterations)
{
$t->start('JLv2');
@mylist3 = (@mylist2, grep { ! value_in($_, @mylist2) } @mylist1);
$t->stop('JLv2');
}
print $t->report('JLv2');
for my $i (1..$iterations)
{
$t->start('LMU');
@mylist4 = uniq( @mylist1, @mylist2 );
$t->stop('LMU');
}
print $t->report('LMU');
for my $i (1..$iterations)
{
@mylist5 = @mylist2;
$t->start('HV1');
my %k;
map { $k{$_} = 1 } @mylist5;
push(@mylist5, grep { !exists $k{$_} } @mylist1);
$t->stop('HV1');
}
print $t->report('HV1');
for my $i (1..$iterations)
{
$t->start('HV2');
my %k;
map { $k{$_} = 1 } @mylist1;
map { $k{$_} = 1 } @mylist2;
@mylist6 = keys %k;
$t->stop('HV2');
}
print $t->report('HV2');
print sort(@mylist3), "\n";
print sort(@mylist4), "\n";
print sort(@mylist5), "\n";
print sort(@mylist6), "\n";
Black JL: perl xxx.pl
9999 trials of JLv2 (1.298s total), 129us/trial
9999 trials of LMU (968.176ms total), 96us/trial
9999 trials of HV1 (516.799ms total), 51us/trial
9999 trials of HV2 (768.073ms total), 76us/trial
ABCDE
ABCDE
ABCDE
ABCDE
Black JL:
This is Perl 5.10.0 compiled for 32-bit SPARC with multiplicity on an antique Sun E450 running Solaris 10.
I believe that the test setups are fair; they all generate their answer into a new array, separate from mylist1 and mylist2 (so mylist1 and mylist2 can be reused for the next test). The answer designated HV1 (hash values 1) has the timing start after the assignment to @mylist5, which I think is correct. However, when I did the timing with the start before the assignment, it was still quickest:
Black JL: perl xxx.pl
9999 trials of JLv2 (1.293s total), 129us/trial
9999 trials of LMU (938.504ms total), 93us/trial
9999 trials of HV1 (505.998ms total), 50us/trial
9999 trials of HV2 (756.722ms total), 75us/trial
ABCDE
ABCDE
ABCDE
ABCDE
9999 trials of HV1A (655.582ms total), 65us/trial
Black JL: