A Perl idiom for removing duplicate values from an array:

@uniq = keys %{{map{$_=>1}@list}} 

Would it use less memory to store undef instead of 1, as in this version:

@uniq = keys %{{map{$_=>undef}@list}}

I tested it with these one-liners, and it seems to be true on some versions of Perl:

perl -e 'my %x; $x{$_} = 1 for 0..1000_000; system "ps -ovsz $$"' 
perl -e 'my %x; $x{$_} = undef for 0..1000_000; system "ps -ovsz $$"'
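For comparison, here is a minimal sketch that runs both forms of the idiom side by side, plus a hash-slice variant that also stores undef for every key. The sample `@list` and the variable names are assumptions made for illustration; the sketch only shows that all three yield the same set of keys and says nothing about memory use.

```perl
use strict;
use warnings;

# Sample data (hypothetical); duplicates are intentional.
my @list = qw(a b a c b a);

# The two forms from the question: only the stored value differs.
my @uniq_one   = keys %{{ map { $_ => 1 } @list }};
my @uniq_undef = keys %{{ map { $_ => undef } @list }};

# Hash slice: assigns undef to every key without building a key/value list.
my %seen;
@seen{@list} = ();
my @uniq_slice = keys %seen;

# keys returns elements in no particular order, so sort before printing.
print join( ',', sort @$_ ), "\n" for \@uniq_one, \@uniq_undef, \@uniq_slice;
```

None of these preserves the original order of `@list`, since `keys` returns the hash's internal order.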
+7  A: 

Well, undef is supposed to be a flyweight value, meaning that all references to it point to the same datum. You don't get that for other literals. You still need the overhead of the slot that references it, though. However, I'm not seeing it save any memory for me on Perl 5.10 or 5.11 on Mac OS X. Perl may not need more memory in the undef case, but I bet it anticipates using more and grabs it anyway. That said, I'm not keen on investigating memory use in the internals right now.

Devel::Peek is pretty handy for showing these sorts of things:

#!perl

use Devel::Peek;

my $a = undef;
my $b = undef;

Dump( $a );
Dump( $b );


my %hash = map { $_, undef } 1 .. 3;
$hash{4} = 'Hello';
Dump( \%hash );

The output looks a bit scary at first, but you see that the undef values are NULL(0x0) instead of individual string values (PV):

SV = NULL(0x0) at 0x100208708
  REFCNT = 1
  FLAGS = (PADMY)
SV = NULL(0x0) at 0x100208738
  REFCNT = 1
  FLAGS = (PADMY)
SV = RV(0x100805018) at 0x100805008
  REFCNT = 1
  FLAGS = (TEMP,ROK)
  RV = 0x100208780
  SV = PVHV(0x100809ed8) at 0x100208780
    REFCNT = 2
    FLAGS = (PADMY,SHAREKEYS)
    ARRAY = 0x100202200  (0:5, 1:2, 2:1)
    hash quality = 91.7%
    KEYS = 4
    FILL = 3
    MAX = 7
    RITER = -1
    EITER = 0x0
    Elt "4" HASH = 0xb803eff9
    SV = PV(0x100801c78) at 0x100804ed0
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0x100202a30 "Hello"\0
      CUR = 5
      LEN = 8
    Elt "1" HASH = 0x806b80c9
    SV = NULL(0x0) at 0x100820db0
      REFCNT = 1
      FLAGS = ()
    Elt "3" HASH = 0xa400c7f3
    SV = NULL(0x0) at 0x100820df8
      REFCNT = 1
      FLAGS = ()
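The same distinction can be checked programmatically rather than by reading the dump. This is a rough sketch, assuming the core B module is available; it uses `B::svref_2object` to report the internal type of each hash value, and the `%class` hash is a name introduced here for illustration. On the perls where the dump above applies, I would expect the undef values to report as `B::NULL` and the string as `B::PV`, but treat that as an assumption to verify on your own build.

```perl
use strict;
use warnings;
use B ();

# Same hash as in the Devel::Peek example above.
my %hash = map { $_, undef } 1 .. 3;
$hash{4} = 'Hello';

# svref_2object wraps the SV behind a reference in a B:: object;
# ref() of that object names the SV's internal type.
my %class = map { $_ => ref B::svref_2object( \$hash{$_} ) } sort keys %hash;

printf "%s => %s\n", $_, $class{$_} for sort keys %class;

# Each undef value still has its own SV head, so the addresses differ,
# matching the distinct addresses shown in the dump.
my $distinct = ( \$hash{1} != \$hash{3} ) ? 1 : 0;
print "distinct heads: $distinct\n";
```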
brian d foy
5.10 gives me identical results too. Maybe Perl started caching small ints, like Python? :-)
eugene y
I tried it with the same value and different values for each key and got the same results. I think that Perl is probably grabbing a big chunk of memory in anticipation of filling it with values later.
brian d foy
5.10 and above are much better at using minimal memory for things that are just an int or just a reference. Before, an SV had two parts, the first holding the refcnt, flags, and a pointer to the second; only an undef would end up having no second part. Now the initial struct has an extra field that used to be stored in the second struct (which field it is depends on the type).
ysth