Putting a precompiled regex inside two different hashes referenced in a list:

my @list = ();

my $regex = qr/ABC/;

push @list, { 'one' => $regex };
push @list, { 'two' => $regex };

use Data::Dumper;
print Dumper(\@list);

I'd expect:

$VAR1 = [
      {
        'one' => qr/(?-xism:ABC)/
      },
      {
        'two' => qr/(?-xism:ABC)/
      }
    ];

But instead we get a circular reference:

$VAR1 = [
      {
        'one' => qr/(?-xism:ABC)/
      },
      {
        'two' => $VAR1->[0]{'one'}
      }
    ];

The same thing happens no matter how deeply the hash references are nested, and with shallow copies of $regex.

I'm assuming the basic reason is that precompiled regexes are actually references, and that references inside the same list structure are compacted as an optimization (\$scalar behaves the same way). I don't entirely see the utility of doing this (presumably a reference to a reference has the same memory footprint), but maybe there's a reason based on the internal representation.
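A minimal sketch of the two observations in that paragraph: `ref` reports that `qr//` yields a reference blessed into `Regexp`, and a plain `\$scalar` stored twice is dumped the same way. The names `$text`, `$shared` and `@plain` are made up for illustration.

```perl
use strict;
use warnings;
use Data::Dumper;

my $regex = qr/ABC/;
my $kind  = ref $regex;    # 'Regexp': qr// returns a blessed reference

# Store one ordinary scalar reference in two different hashes
my $text   = 'ABC';
my $shared = \$text;

my @plain;
push @plain, { 'one' => $shared };
push @plain, { 'two' => $shared };

# The second occurrence is printed as $VAR1->[0]{'one'}, as with the regex
my $plain_dump = Dumper(\@plain);

print "ref(\$regex) is $kind\n";
print $plain_dump;
```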

Is this the correct behavior? Can I stop it from happening? Aside from probably making GC more difficult, these circular structures create pretty serious headaches. For example, iterating over a list of queries that may sometimes contain the same regular expression will crash the MongoDB driver with a nasty segfault (see https://rt.cpan.org/Public/Bug/Display.html?id=58500).

+8  A: 

This is the expected behavior.

Your reference isn't really circular; you have two separate items that point to the same thing. Data::Dumper is printing a human-readable, Perl-parsable representation of your data structures in memory, and what it really means is that both $list[0]->{one} and $list[1]->{two} point to the same thing.

Perl uses reference-counting garbage collection, and while it can get into trouble with circular data structures, this data structure presents no particular problem.
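One way to check the difference between "shared" and "circular" is to compare addresses with `refaddr` from Scalar::Util. The sketch below rebuilds the list from the question and, for contrast, a deliberately self-referential hash; `$node` is a made-up name used only for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my @list;
my $regex = qr/ABC/;
push @list, { 'one' => $regex };
push @list, { 'two' => $regex };

# Shared: both hash values hold the same address, but nothing points back
# at the list or at the hashes
my $shared = refaddr($list[0]{one}) == refaddr($list[1]{two});

# Circular, for contrast: a hash that contains a reference to itself
my $node = {};
$node->{self} = $node;
my $circular = refaddr($node->{self}) == refaddr($node);

print "list values shared: ", ($shared   ? "yes" : "no"), "\n";
print "node is circular:   ", ($circular ? "yes" : "no"), "\n";

# Break the cycle so reference counting can free $node
delete $node->{self};
```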

Commodore Jaeger
Looking at the Data::Dumper page you are correct, `\$list[1]->{two}` would have been a circular reference but `$list[1]->{two}` is not. This makes the MongoDB crash behavior more mysterious (though there is an independent bug where an explicit circular reference in a query segfaults perl, see the above rt link).
Arkadiy Kukarkin
On closer inspection, the malloc()/double-free() error I saw with mongo was probably from inadvertently sharing a connection object across several forked processes. Thanks for pointing me in the right direction, the (unrelated) circular reference issue really tripped me up.
Arkadiy Kukarkin
+6  A: 

Nothing funny is happening here.

  1. You stored the same reference twice in the same data structure.
  2. Then you asked Data::Dumper to print a representation of that structure.
  3. Data::Dumper wants to roundtrip the data you give it as faithfully as possible, which means that it needs to output Perl code that will generate a data structure that contains the same reference at $list[0]{one} as it does at $list[1]{two}.
  4. It does this by outputting a data structure where one member contains a reference to another member of the same structure.
  5. But it's not actually a circular reference.
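If the cross-reference in the printed output is the only concern, Data::Dumper's documented `$Data::Dumper::Deepcopy` setting asks for each occurrence to be written out in full, with cross-referencing kept only where it is needed to break real cycles. A sketch, assuming the same list as in the question:

```perl
use strict;
use warnings;
use Data::Dumper;

my @list;
my $regex = qr/ABC/;
push @list, { 'one' => $regex };
push @list, { 'two' => $regex };

# Default: the second occurrence is written as a reference to the first
my $default_dump = Dumper(\@list);

# Deepcopy: each occurrence is written out separately
my $deep_dump = do {
    local $Data::Dumper::Deepcopy = 1;
    Dumper(\@list);
};

print $default_dump;
print $deep_dump;
```

This changes only what is printed. The in-memory structure still holds one regex object in both places, and evaluating the Deepcopy output would build two separate objects instead of one shared one.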
hobbs