ansaurus

Question

How do I delete a [sub]hash based off of the keys/values of another hash?

Answer 1

+5 A:

You were close. Remember that $hash2{major}{$test} is a scalar, not a hash reference.

#! /usr/bin/perl

use strict;
use warnings;

my %hash1 = ( 
  test1 => { inner1 => { more => "alpha", evenmore => "beta" } },
  test2 => { inner2 => { more => "charlie", somethingelse => "delta" } },
  test3 => { inner9999 => { ohlookmore => "golf", somethingelse => "foxtrot" } }
);

my %hash2 = (
  major => { test2 => "inner2",
             test3 => "inner3" }
);

foreach my $k (keys %hash1) {
  my $delete = 1;
  foreach my $inner (keys %{ $hash1{$k} }) {
    $delete = 0, last if exists $hash2{major}{$k} &&
                                $hash2{major}{$k} eq $inner;
  }
  delete $hash1{$k} if $delete;
}

use Data::Dumper;
$Data::Dumper::Indent = 1;
print Dumper \%hash1;

The line beginning with $delete = 0, ... is a bit cutesy. It's equivalent to $delete = 0; last; within another conditional, but it was already nested twice. Not wanting to build a matryoshka doll, I used a statement modifier, but as the name suggests, it modifies a single statement.

That's where Perl's comma operator comes in:

Binary , is the comma operator. In scalar context it evaluates its left argument, throws that value away, then evaluates its right argument and returns that value. This is just like C's comma operator.

In this case, the left argument is the expression $delete = 0, and the right argument is last.

The conditional might seem needlessly fussy, but

... if $hash2{major}{$k} eq $inner;

produces undefined-value warnings when probing for tests not mentioned in %hash2 (test1/inner1, for example). Using

.. if $hash2{major}{$k} && $hash2{major}{$k} eq $inner;

would incorrectly delete a test mentioned in %hash2 if its "inner name" were a false value such as the string "0". Yes, using exists here may be needlessly fussy, but not knowing your actual hash keys, I chose the conservative route.

Output:

$VAR1 = {
  'test2' => {
    'inner2' => {
      'somethingelse' => 'delta',
      'more' => 'charlie'
    }
  }
};

Although you don't violate it, be aware of the following caveat related to using each:

If you add or delete elements of a hash while you're iterating over it, you may get entries skipped or duplicated, so don't. Exception: It is always safe to delete the item most recently returned by each, which means that the following code will work:
    while (($key, $value) = each %hash) {
      print $key, "\n";
      delete $hash{$key};   # This is safe
    }

Update: Searching hashes as though they were arrays (impress your CS nerd friends by saying “… linearly rather than logarithmically”) is a red flag, and the code above does just that. A better approach, which turns out to be similar to Penfold's answer, is

%hash1 = map +($_ => $hash1{$_}),
         grep exists $hash2{major}{$_} &&
              exists $hash1{$_}{ $hash2{major}{$_} },
         keys %hash1;

In nice declarative style, it describes the desired contents of %hash1, namely

first-level keys of %hash1 should be mentioned in $hash2{major}, and
the value in $hash2{major} corresponding to each first-level key should itself be a subkey of that key back in %hash1

(Wow, dizzying. We need multiple placeholder variables in English!)

The unary plus in +($_ => $hash1{$_}) disambiguates for the poor parser so it knows we want the expression treated as a “pair.” See the end of the perlfunc documentation on map for other cases when this may be necessary.

Greg Bacon 2010-04-02 21:03:26

Is it possible to get an explination of this line: $delete = 0, last if exists $hash2{major}{$k} I sort of understand it, but am really thrown off with the ',' and last usage.

Zack 2010-04-03 21:38:13

@Zack Thanks for the checkmark! Explanation provided in the updated answer plus an extra bonus.

Greg Bacon 2010-04-03 23:42:05

Answer 2

+1 A:

This is the way I would do it: (Third try's the charm)

foreach ( map { [ $_ => $hash2{major}{$_} ] } keys %hash1 ) { 
    my ( $key, $value ) = @$_;
    if ( defined $value and my $new_value = $hash1{$key}{$value} ) { 
        $hash1{$key} = $new_value;
    }
    else { 
        delete $hash1{$key};
    }
}

Axeman 2010-04-02 21:05:46

Answer 3

+3 A:

You can do it as a one-liner, all because delete() will take an array of keys. It's not quite as easy as I first though, but now I've read the problem properly...

delete @hash1{ 
        grep(
            !(
              exists($hash2{major}->{$_}) 
                && 
              exists( $hash1{$_}->{ $hash2{major}->{$_} } )
            ),
            keys %hash1
        )
    };

Penfold 2010-04-02 21:29:23

Answer 4

+1 A:

# This is the actual hash we want to iterate over.
my $keepers = $hash2{major};

%hash1 = map { $_ => $hash1{$_} }  # existing key and hash contents in %hash1
             grep { exists $keepers->{$_} and               # key there?
                    exists $hash1{$_}->{ $keepers->{$_} } } # key in hash there?
             (keys %hash1);        # All the keys we might care about

This works because we essentially work out the lists of things we want/don't want in three independent stages:

The keys call gets all the keys that are in hash1 in one step.
The grep generates (as one step) the list of keys that match our criterion.
The map generates (as one step) a set of keys and values that are the ones we want.

This way we never alter the primary hash until we're ready to do so. If %hash1 contains many keys, we're going to use a lot of memory. If you're worried about that, you'd do something like this:

# Initialization as before ...

use File::Temp qw(tempfile);

my ($fh, $file) = tempfile();
my $keepers = $hash2{major};

print $fh "$_\n" for (keys %hash1);
close $fh;
open $fh, "<", $file or die "can't reopen tempfile $file: $!\n";
while ( defined ($_ = <$fh>) ) {
  chomp;
  delete $hash1{$_} 
    unless exists $keepers->{$_} and 
           exists $hash1{$_}->{ $keepers->{$_} }; 
}

This one works because we're not iterating over the hash, but over a stored copy of its keys.

Joe McMahon 2010-04-02 21:53:25

Why write to a file when you could just go my @keys = keys %hash1;?

Penfold 2010-04-03 12:37:32

That doubles the memory footprint, since you just made a copy of all the keys.

Joe McMahon 2010-04-13 20:45:05

ansaurus

tags:

views:

answers:

How do I delete a [sub]hash based off of the keys/values of another hash?

related questions