tags:

views:

233

answers:

4

Assume a nested hash structure %old_hash ..

my %old_hash;
$old_hash{"foo"}{"bar"}{"zonk"} = "hello";

.. which we want to "flatten" (sorry if that's the wrong terminology!) to a non-nested hash using the sub &flatten(...) so that ..

my %h = &flatten(\%old_hash);
die unless($h{"zonk"} eq "hello");

The following definition of &flatten(...) does the trick:

sub flatten {
  my $hashref = shift;
  my %hash;
  my %i = %{$hashref};
  foreach my $ii (keys(%i)) {
    my %j = %{$i{$ii}};
    foreach my $jj (keys(%j)) {
      my %k = %{$j{$jj}};
      foreach my $kk (keys(%k)) {
        my $value = $k{$kk};
        $hash{$kk} = $value;
      }
    }
  }
  return %hash;
}

While the code given works it is not very readable or clean.

My question is two-fold:

  • In what ways does the given code not correspond to modern Perl best practices? Be harsh! :-)
  • How would you clean it up?
+3  A: 

First, I would use perl -c to make sure it compiles cleanly, which it does not. So, I'd add a trailing } to make it compile.

Then, I'd run it through perltidy to improve the code layout (indentation, etc.).

Then, I'd run perlcritic (in "harsh" mode) to automatically tell me what it thinks are bad practices. It complains that:

Subroutine does not end with "return"

Update: the OP essentially changed every line of code after I posted my Answer above, but I believe it still applies. It's not easy shooting at a moving target :)

toolic
+9  A: 

Your method is not best practices because it doesn't scale. What if the nested hash is six, ten levels deep? The repetition should tell you that a recursive routine is probably what you need.

sub flatten {
    my ($in, $out) = @_;
    for my $key (keys %$in) {
        my $value = $in->{$key};
        if ( defined $value && ref $value eq 'HASH' ) {
            flatten($value, $out);
        }
        else {
            $out->{$key} = $value;
        }
    }
}

Alternatively, good modern Perl style is to use CPAN wherever possible. Data::Traverse would do what you need:

use Data::Traverse;
sub flatten {
    my %hash = @_;
    my %flattened;
    traverse { $flattened{$a} = $b } \%hash;
    return %flattened;
}

As a final note, it is usually more efficient to pass hashes by reference to avoid them being expanded out into lists and then turned into hashes again.

rjh
I'm a big fan of Data::Traverse.
friedo
No suprise here, you are its principal author ;)
willert
-1 your first block of code contains a number of typos, `$values` is never declared and you have mixed hash and reference to hash syntax
Eric Strom
Data::Traverse was exactly what I was looking for. Thanks a lot!
knorv
@Eric - Thanks, I've fixed those. Sometimes I rush writing code that explains the general idea of something, as I don't expect it to be used wholesale without some analysis first. But I was definitely sloppy, thanks for pointing it out.
rjh
+1 for the general idea behind the second example. But this won't work: `my %hash = shift`. I assume the intent was to have the function receive a hash ref rather than a hash.
FM
@FM should be `my %hash = @_;`. Obviously, my brain was not functioning that evening.
rjh
+1  A: 

Is it your intent to end up with a copy of the original hash or just a reordered result?

Your code starts with one hash (the original hash that is used by reference) and makes two copies %i and %hash.

The statement my %i=%{hashref} is not necessary. You are copying the entire hash to a new hash. In either case (whether you want a copy of not) you can use references to the original hash.

You are also losing data if your hash in the hash has the same value as the parent hash. Is this intended?

drewk
+1  A: 

There are a few problems with your approach that you need to figure out. First off, what happens in the event that there are two leaf nodes with the same key? Does the second clobber the first, is the second ignored, should the output contain a list of them? Here is one approach. First we construct a flat list of key value pairs using a recursive function to deal with other hash depths:

my %data = (
    foo  => {bar  => {baz  => 'hello'}},
    fizz => {buzz => {bing => 'world'}},
    fad  => {bad  => {baz  => 'clobber'}},
);


sub flatten {
    my $hash = shift;
    map {
        my  $value = $$hash{$_};
        ref $value eq 'HASH' 
            ? flatten($value) 
            : ($_ =>  $value)
    } keys %$hash
}

print join( ", " => flatten \%data), "\n";
# baz, clobber, bing, world, baz, hello

my %flat = flatten \%data;

print join( ", " => %flat ), "\n";
# baz, hello, bing, world          # lost (baz => clobber)

A fix could be something like this, which will create a hash of array refs containing all the values:

sub merge {
    my %out;
    while (@_) {
        my ($key, $value) = splice @_, 0, 2;
        push @{ $out{$key} }, $value
    }
    %out
}

my %better_flat = merge flatten \%data;

In production code, it would be faster to pass references between the functions, but I have omitted that here for clarity.

Eric Strom