ansaurus

Question

How do I add elements from one Perl array that aren't already in the other array?

Answer 1

+12 A:

my %k;
map { $k{$_} = 1 } @mylist1;
map { $k{$_} = 1 } @mylist2;
@mylist2 = keys %k;

Alternatively:

my %k;
map { $k{$_} = 1 } @mylist2;
push(@mylist2, grep { !exists $k{$_} } @mylist1);

Actually - these might be wrong because they don't account for whether duplicates might exist in either of the original lists.

You didn't say in your question whether the lists are supposed to represent sets (which can't contain duplicates) or just plain lists. That you effectively want @mylist2 = @mylist1 U @mylist2 suggests that you are treating them as sets.

EDIT: changed increment to assign - saves a read of the hash value

Alnitak 2008-11-27 19:28:40

This is okay if you don't need to keep the original order.

Brad Gilbert 2008-11-29 16:49:16

The second option is the fastest according to my measurements - and faster than the uniq method in List::MoreUtils.

Jonathan Leffler 2008-11-30 07:04:52

Answer 2

+1 A:

[Original answer as of 2008-11-27 down to "Since the question"; the analysis from there on is new as of 2008-11-29.]

Quickest - not sure. This works, though it is not pretty:

#!/bin/perl -w
use strict;

my @mylist1;
push(@mylist1,"A");
push(@mylist1,"B");
push(@mylist1,"C");

my @mylist2;
push(@mylist2,"A");
push(@mylist2,"D");
push(@mylist2,"E");

sub value_in
{
    my($value, @array) = @_;
    foreach my $element (@array)
    {
        return 1 if $value eq $element;
    }
    return 0;
}

@mylist2 = (@mylist2, grep { ! value_in($_, @mylist2) } @mylist1);

print sort @mylist2, "\n";

This avoids converting the arrays into hashes - but for large arrays, the value_in sub may be slow.

Since the question was "what is the quickest method", I did some benchmarking. To my none-too-vast surprise, my method was slowest. Somewhat to my surprise, the fastest method was not from List::MoreUtils. Here's the test code and the results - using a modified version of my original proposal.

#!/bin/perl -w
use strict;
use List::MoreUtils  qw(uniq);
use Benchmark::Timer;

my @mylist1;
push(@mylist1,"A");
push(@mylist1,"B");
push(@mylist1,"C");

my @mylist2;
push(@mylist2,"A");
push(@mylist2,"D");
push(@mylist2,"E");

sub value_in
{
    my($value) = shift @_;
    return grep { $value eq $_ } @_;
}

my @mylist3;
my @mylist4;
my @mylist5;
my @mylist6;

my $t = Benchmark::Timer->new(skip=>1);
my $iterations = 10000;

for my $i (1..$iterations)
{
    $t->start('JLv2');
    @mylist3 = (@mylist2, grep { ! value_in($_, @mylist2) } @mylist1);
    $t->stop('JLv2');
}
print $t->report('JLv2');

for my $i (1..$iterations)
{
    $t->start('LMU');
    @mylist4 = uniq( @mylist1, @mylist2 );
    $t->stop('LMU');
}
print $t->report('LMU');

for my $i (1..$iterations)
{
    @mylist5 = @mylist2;
    $t->start('HV1');
    my %k;
    map { $k{$_} = 1 } @mylist5;
    push(@mylist5, grep { !exists $k{$_} } @mylist1);
    $t->stop('HV1');
}
print $t->report('HV1');

for my $i (1..$iterations)
{
    $t->start('HV2');
    my %k;
    map { $k{$_} = 1 } @mylist1;
    map { $k{$_} = 1 } @mylist2;
    @mylist6 = keys %k;
    $t->stop('HV2');
}
print $t->report('HV2');


print sort(@mylist3), "\n";
print sort(@mylist4), "\n";
print sort(@mylist5), "\n";
print sort(@mylist6), "\n";

Black JL: perl xxx.pl
9999 trials of JLv2 (1.298s total), 129us/trial
9999 trials of LMU (968.176ms total), 96us/trial
9999 trials of HV1 (516.799ms total), 51us/trial
9999 trials of HV2 (768.073ms total), 76us/trial
ABCDE
ABCDE
ABCDE
ABCDE
Black JL:

This is Perl 5.10.0 compiled for 32-bit SPARC with multiplicity on an antique Sun E450 running Solaris 10.

I believe that the test setups are fair; they all generate their answer into a new array, separate from mylist1 and mylist2 (so mylist1 and mylist2 can be reused for the next test). The answer designated HV1 (hash values 1) has the timing start after the assignment to @mylist5, which I think is correct. However, when I did the timing with the start before the assignment, it was still quickest:

Black JL: perl xxx.pl
9999 trials of JLv2 (1.293s total), 129us/trial
9999 trials of LMU (938.504ms total), 93us/trial
9999 trials of HV1 (505.998ms total), 50us/trial
9999 trials of HV2 (756.722ms total), 75us/trial
ABCDE
ABCDE
ABCDE
ABCDE
9999 trials of HV1A (655.582ms total), 65us/trial
Black JL:

Jonathan Leffler 2008-11-27 19:37:26

I cannot understand why this analysis got downvoted.

RET 2008-11-30 23:19:54

@RET: the answer without the analysis got down-voted. I temporarily deleted the answer; then I did the analysis and decided to resuscitate the down-voted answer and amend it. Thanks for the up-vote.

Jonathan Leffler 2008-12-01 01:37:52

Answer 3

+1 A:

Because of your "(ABCDE)" comment, I'm assuming you actually meant push onto mylist1 those elements in mylist2 that aren't in mylist1. If this assumption is incorrect, you need to say something about what order you want things to end up in.

First, store which elements are in mylist1 in a hash, then push all those in mylist2 not found in the hash onto mylist1.

my %in_mylist1;
@in_mylist1{@mylist1} = ();
push @mylist1, grep ! exists $in_mylist1{$_}, @mylist2;

ysth 2008-11-27 19:45:08

Answer 4

+20 A:

You could just use the List::MoreUtils module's uniq:

use List::MoreUtils qw(uniq);

my @mylist1;
push( @mylist1, "A" );
push( @mylist1, "B" );
push( @mylist1, "C" );

my @mylist2;
push( @mylist2, "A" );
push( @mylist2, "D" );
push( @mylist2, "E" );

@mylist2 = uniq( @mylist1, @mylist2 );

printf "%s\n", ( join ',', @mylist2 );    # A,B,C,D,E

oeuftete 2008-11-27 19:58:10

+1, "Use a module."

Tanktalus 2008-11-27 20:43:46

ok, that'll work, but it's no way to learn Perl...

Alnitak 2008-11-27 20:44:32

Learning to identify and use modules is a pretty important part of learning perl.

oeuftete 2008-11-27 20:55:32

Sure, but you've still gotta know the fundamentals

Alnitak 2008-11-27 22:08:05

I wish more people used List::MoreUtils instead of writing the same little bits of code over and over again.

Rob Van Dam 2010-01-14 05:49:41

Answer 5

A:

my(%work);
@work{@mylist1, @mylist2} = undef;
@mylist2 = sort keys %work;

2008-11-30 00:44:08

If duplicates are allowed in mylist2 (and I see no reason they wouldn't be), then this solution deletes them.

noswonky 2010-03-16 05:37:51

ansaurus

tags:

views:

answers:

How do I add elements from one Perl array that aren't already in the other array?

related questions