tags:

views:

1804

answers:

5

Given:

my @mylist1;
push(@mylist1,"A");
push(@mylist1,"B");
push(@mylist1,"C");

my @mylist2;
push(@mylist2,"A");
push(@mylist2,"D");
push(@mylist2,"E");

What's the quickest way in Perl to insert in mylist2 all elements that are in mylist1 and not already in mylist2 (ABCDE).

+12  A: 
my %k;
map { $k{$_} = 1 } @mylist1;
map { $k{$_} = 1 } @mylist2;
@mylist2 = keys %k;

Alternatively:

my %k;
map { $k{$_} = 1 } @mylist2;
push(@mylist2, grep { !exists $k{$_} } @mylist1);

Actually - these might be wrong because they don't account for whether duplicates might exist in either of the original lists.

You didn't say in your question whether the lists are supposed to represent sets (which can't contain duplicates) or just plain lists. That you effectively want @mylist2 = @mylist1 U @mylist2 suggests that you are treating them as sets.

EDIT: changed increment to assign - saves a read of the hash value

Alnitak
This is okay if you don't need to keep the original order.
Brad Gilbert
The second option is the fastest according to my measurements - and faster than the uniq method in List::MoreUtils.
Jonathan Leffler
+1  A: 

[Original answer as of 2008-11-27 down to "Since the question"; the analysis from there on is new as of 2008-11-29.]

Quickest - not sure. This works, though it is not pretty:

#!/bin/perl -w
use strict;

my @mylist1;
push(@mylist1,"A");
push(@mylist1,"B");
push(@mylist1,"C");

my @mylist2;
push(@mylist2,"A");
push(@mylist2,"D");
push(@mylist2,"E");

sub value_in
{
    my($value, @array) = @_;
    foreach my $element (@array)
    {
        return 1 if $value eq $element;
    }
    return 0;
}

@mylist2 = (@mylist2, grep { ! value_in($_, @mylist2) } @mylist1);

print sort @mylist2, "\n";

This avoids converting the arrays into hashes - but for large arrays, the value_in sub may be slow.

Since the question was "what is the quickest method", I did some benchmarking. To my none-too-vast surprise, my method was slowest. Somewhat to my surprise, the fastest method was not from List::MoreUtils. Here's the test code and the results - using a modified version of my original proposal.

#!/bin/perl -w
use strict;
use List::MoreUtils  qw(uniq);
use Benchmark::Timer;

my @mylist1;
push(@mylist1,"A");
push(@mylist1,"B");
push(@mylist1,"C");

my @mylist2;
push(@mylist2,"A");
push(@mylist2,"D");
push(@mylist2,"E");

sub value_in
{
    my($value) = shift @_;
    return grep { $value eq $_ } @_;
}

my @mylist3;
my @mylist4;
my @mylist5;
my @mylist6;

my $t = Benchmark::Timer->new(skip=>1);
my $iterations = 10000;

for my $i (1..$iterations)
{
    $t->start('JLv2');
    @mylist3 = (@mylist2, grep { ! value_in($_, @mylist2) } @mylist1);
    $t->stop('JLv2');
}
print $t->report('JLv2');

for my $i (1..$iterations)
{
    $t->start('LMU');
    @mylist4 = uniq( @mylist1, @mylist2 );
    $t->stop('LMU');
}
print $t->report('LMU');

for my $i (1..$iterations)
{
    @mylist5 = @mylist2;
    $t->start('HV1');
    my %k;
    map { $k{$_} = 1 } @mylist5;
    push(@mylist5, grep { !exists $k{$_} } @mylist1);
    $t->stop('HV1');
}
print $t->report('HV1');

for my $i (1..$iterations)
{
    $t->start('HV2');
    my %k;
    map { $k{$_} = 1 } @mylist1;
    map { $k{$_} = 1 } @mylist2;
    @mylist6 = keys %k;
    $t->stop('HV2');
}
print $t->report('HV2');


print sort(@mylist3), "\n";
print sort(@mylist4), "\n";
print sort(@mylist5), "\n";
print sort(@mylist6), "\n";

Black JL: perl xxx.pl
9999 trials of JLv2 (1.298s total), 129us/trial
9999 trials of LMU (968.176ms total), 96us/trial
9999 trials of HV1 (516.799ms total), 51us/trial
9999 trials of HV2 (768.073ms total), 76us/trial
ABCDE
ABCDE
ABCDE
ABCDE
Black JL:

This is Perl 5.10.0 compiled for 32-bit SPARC with multiplicity on an antique Sun E450 running Solaris 10.

I believe that the test setups are fair; they all generate their answer into a new array, separate from mylist1 and mylist2 (so mylist1 and mylist2 can be reused for the next test). The answer designated HV1 (hash values 1) has the timing start after the assignment to @mylist5, which I think is correct. However, when I did the timing with the start before the assignment, it was still quickest:

Black JL: perl xxx.pl
9999 trials of JLv2 (1.293s total), 129us/trial
9999 trials of LMU (938.504ms total), 93us/trial
9999 trials of HV1 (505.998ms total), 50us/trial
9999 trials of HV2 (756.722ms total), 75us/trial
ABCDE
ABCDE
ABCDE
ABCDE
9999 trials of HV1A (655.582ms total), 65us/trial
Black JL:
Jonathan Leffler
I cannot understand why this analysis got downvoted.
RET
@RET: the answer without the analysis got down-voted. I temporarily deleted the answer; then I did the analysis and decided to resuscitate the down-voted answer and amend it. Thanks for the up-vote.
Jonathan Leffler
+1  A: 

Because of your "(ABCDE)" comment, I'm assuming you actually meant push onto mylist1 those elements in mylist2 that aren't in mylist1. If this assumption is incorrect, you need to say something about what order you want things to end up in.

First, store which elements are in mylist1 in a hash, then push all those in mylist2 not found in the hash onto mylist1.

my %in_mylist1;
@in_mylist1{@mylist1} = ();
push @mylist1, grep ! exists $in_mylist1{$_}, @mylist2;
ysth
+20  A: 

You could just use the List::MoreUtils module's uniq:

use List::MoreUtils qw(uniq);

my @mylist1;
push( @mylist1, "A" );
push( @mylist1, "B" );
push( @mylist1, "C" );

my @mylist2;
push( @mylist2, "A" );
push( @mylist2, "D" );
push( @mylist2, "E" );

@mylist2 = uniq( @mylist1, @mylist2 );

printf "%s\n", ( join ',', @mylist2 );    # A,B,C,D,E
oeuftete
+1, "Use a module."
Tanktalus
ok, that'll work, but it's no way to learn Perl...
Alnitak
Learning to identify and use modules is a pretty important part of learning perl.
oeuftete
Sure, but you've still gotta know the fundamentals
Alnitak
I wish more people used List::MoreUtils instead of writing the same little bits of code over and over again.
Rob Van Dam
A: 
my(%work);
@work{@mylist1, @mylist2} = undef;
@mylist2 = sort keys %work;
If duplicates are allowed in mylist2 (and I see no reason they wouldn't be), then this solution deletes them.
noswonky