views:

95

answers:

4

Lets assume I have two hashes. One of them contains a set of data that only needs to keep things that show up in the other hash.

e.g.

my %hash1 = ( 
        test1 => { inner1 => { more => "alpha", evenmore => "beta" } },
        test2 => { inner2 => { more => "charlie", somethingelse => "delta" } },
        test3 => { inner9999 => { ohlookmore => "golf", somethingelse => "foxtrot" } }
    );

my %hash2 = (
        major=> { test2 => "inner2",
              test3 => "inner3" }  );

What I would like to do, is to delete the whole subhash in hash1 if it does not exist as a key/value in hash2{major}, preferably without modules. The information contained in "innerX" does not matter, it merely must be left alone (unless the subhash is to be deleted then it can go away).

In the example above after this operation is preformed hash1 would look like:

my %hash1 = ( 
        test2 => { inner2 => { more => "charlie", somethingelse => "delta" } },
        );

It deletes hash1{test1} and hash1{test3} because they don't match anything in hash2.

Here's what I've currently tried, but it doesn't work. Nor is it probably the safest thing to do since I'm looping over the hash while trying to delete from it. However I'm deleting at the each which should be okay?

This was my attempt at doing this, however perl complains about:

Can't use string ("inner1") as a HASH ref while "strict refs" in use at

while(my ($test, $inner) = each %hash1)
{
    if(exists $hash2{major}{$test}{$inner})
    {
        print "$test($inner) is in exists.\n";
    }
    else
    {
        print "Looks like $test($inner) does not exist, REMOVING.\n";
       #not to sure if $inner is needed to remove the whole entry
         delete ($hash1{$test}{$inner});
    } 
}
+5  A: 

You were close. Remember that $hash2{major}{$test} is a scalar, not a hash reference.

#! /usr/bin/perl

use strict;
use warnings;

my %hash1 = ( 
  test1 => { inner1 => { more => "alpha", evenmore => "beta" } },
  test2 => { inner2 => { more => "charlie", somethingelse => "delta" } },
  test3 => { inner9999 => { ohlookmore => "golf", somethingelse => "foxtrot" } }
);

my %hash2 = (
  major => { test2 => "inner2",
             test3 => "inner3" }
);

foreach my $k (keys %hash1) {
  my $delete = 1;
  foreach my $inner (keys %{ $hash1{$k} }) {
    $delete = 0, last if exists $hash2{major}{$k} &&
                                $hash2{major}{$k} eq $inner;
  }
  delete $hash1{$k} if $delete;
}

use Data::Dumper;
$Data::Dumper::Indent = 1;
print Dumper \%hash1;

The line beginning with $delete = 0, ... is a bit cutesy. It's equivalent to $delete = 0; last; within another conditional, but it was already nested twice. Not wanting to build a matryoshka doll, I used a statement modifier, but as the name suggests, it modifies a single statement.

That's where Perl's comma operator comes in:

Binary , is the comma operator. In scalar context it evaluates its left argument, throws that value away, then evaluates its right argument and returns that value. This is just like C's comma operator.

In this case, the left argument is the expression $delete = 0, and the right argument is last.

The conditional might seem needlessly fussy, but

... if $hash2{major}{$k} eq $inner;

produces undefined-value warnings when probing for tests not mentioned in %hash2 (test1/inner1, for example). Using

.. if $hash2{major}{$k} && $hash2{major}{$k} eq $inner;

would incorrectly delete a test mentioned in %hash2 if its "inner name" were a false value such as the string "0". Yes, using exists here may be needlessly fussy, but not knowing your actual hash keys, I chose the conservative route.

Output:

$VAR1 = {
  'test2' => {
    'inner2' => {
      'somethingelse' => 'delta',
      'more' => 'charlie'
    }
  }
};

Although you don't violate it, be aware of the following caveat related to using each:

If you add or delete elements of a hash while you're iterating over it, you may get entries skipped or duplicated, so don't. Exception: It is always safe to delete the item most recently returned by each, which means that the following code will work:

    while (($key, $value) = each %hash) {
      print $key, "\n";
      delete $hash{$key};   # This is safe
    }

Update: Searching hashes as though they were arrays (impress your CS nerd friends by saying “… linearly rather than logarithmically”) is a red flag, and the code above does just that. A better approach, which turns out to be similar to Penfold's answer, is

%hash1 = map +($_ => $hash1{$_}),
         grep exists $hash2{major}{$_} &&
              exists $hash1{$_}{ $hash2{major}{$_} },
         keys %hash1;

In nice declarative style, it describes the desired contents of %hash1, namely

  1. first-level keys of %hash1 should be mentioned in $hash2{major}, and
  2. the value in $hash2{major} corresponding to each first-level key should itself be a subkey of that key back in %hash1

(Wow, dizzying. We need multiple placeholder variables in English!)

The unary plus in +($_ => $hash1{$_}) disambiguates for the poor parser so it knows we want the expression treated as a “pair.” See the end of the perlfunc documentation on map for other cases when this may be necessary.

Greg Bacon
Is it possible to get an explination of this line: $delete = 0, last if exists $hash2{major}{$k} I sort of understand it, but am really thrown off with the ',' and last usage.
Zack
@Zack Thanks for the checkmark! Explanation provided in the updated answer plus an extra bonus.
Greg Bacon
+1  A: 

This is the way I would do it: (Third try's the charm)

foreach ( map { [ $_ => $hash2{major}{$_} ] } keys %hash1 ) { 
    my ( $key, $value ) = @$_;
    if ( defined $value and my $new_value = $hash1{$key}{$value} ) { 
        $hash1{$key} = $new_value;
    }
    else { 
        delete $hash1{$key};
    }
}
Axeman
+3  A: 

You can do it as a one-liner, all because delete() will take an array of keys. It's not quite as easy as I first though, but now I've read the problem properly...

delete @hash1{ 
        grep(
            !(
              exists($hash2{major}->{$_}) 
                && 
              exists( $hash1{$_}->{ $hash2{major}->{$_} } )
            ),
            keys %hash1
        )
    };
Penfold
+1  A: 
# This is the actual hash we want to iterate over.
my $keepers = $hash2{major};

%hash1 = map { $_ => $hash1{$_} }  # existing key and hash contents in %hash1
             grep { exists $keepers->{$_} and               # key there?
                    exists $hash1{$_}->{ $keepers->{$_} } } # key in hash there?
             (keys %hash1);        # All the keys we might care about

This works because we essentially work out the lists of things we want/don't want in three independent stages:

  1. The keys call gets all the keys that are in hash1 in one step.
  2. The grep generates (as one step) the list of keys that match our criterion.
  3. The map generates (as one step) a set of keys and values that are the ones we want.

This way we never alter the primary hash until we're ready to do so. If %hash1 contains many keys, we're going to use a lot of memory. If you're worried about that, you'd do something like this:

# Initialization as before ...

use File::Temp qw(tempfile);

my ($fh, $file) = tempfile();
my $keepers = $hash2{major};

print $fh "$_\n" for (keys %hash1);
close $fh;
open $fh, "<", $file or die "can't reopen tempfile $file: $!\n";
while ( defined ($_ = <$fh>) ) {
  chomp;
  delete $hash1{$_} 
    unless exists $keepers->{$_} and 
           exists $hash1{$_}->{ $keepers->{$_} }; 
}

This one works because we're not iterating over the hash, but over a stored copy of its keys.

Joe McMahon
Why write to a file when you could just go my @keys = keys %hash1;?
Penfold
That doubles the memory footprint, since you just made a copy of all the keys.
Joe McMahon