I have a pipe-delimited text file containing, among other things, a date and a number indicating the line's sequence elsewhere in the program. From that file I'd like to create a hash with the year as the key and the maximum sequence for that year as the value (I essentially need to implement an auto-incremented key per year), e.g. from

2000|1
2003|9
2000|5
2000|21
2003|4

I would end with a hash like:

%hash = {
    2000 => 21,
    2003 => 9
}

I've managed to split the file into the year and sequence parts (not very well I think) like so:

my @dates = map {
    my @temp = split /\|/;
    join "|", (split /\//, $temp[1])[-1], $temp[4] || 0; #0 because some records
                                                         #mightn't have a sequence
} @info;

Is there something succinct I could do to create a hash using that data?

Thanks

+2  A: 

Here's how you could write that. I'm not sure why you want or need to use map (please explain).

#!/usr/bin/perl
use strict;
use warnings;

my %hash;

while (<DATA>) {
    chomp;
    my ($year, $sequence) = split /\|/;

    # Treat a missing sequence as 0.
    $sequence = 0 unless defined $sequence;

    # Skip records that don't beat the maximum seen so far for this year.
    next if exists $hash{$year} and $sequence < $hash{$year};

    $hash{$year} = $sequence;
}


__DATA__
2000|1
2003|9
2000|5
2000|21
2003|4


I added the $sequence = 0 unless defined ($sequence); line because of the comment in your snippet about records that might not have a sequence. I think I understand your intent there (though either the input format is valid and consistent, or it is not).

lexu
+1  A: 

map { BLOCK } LIST usually (unless BLOCK sometimes evaluates to an empty list) returns a list that is at least as large as LIST, and may not be the way to go if you simply want to overwrite duplicate keys with the latest data. Something like:

    my %hash;
    for (@info) {
        my @temp = split /\|/;
        my $key = (split /\//, $temp[1])[-1];
        my $value = $temp[4] || 0;
        $hash{$key} = $value unless defined $hash{$key} && $hash{$key}>=$value;
    }

will work. The last line conditionally updates the hash table, which is something you can't do (or at least can't do very conveniently) inside a map statement.

mobrule
`map BLOCK LIST` and `map EXPR, LIST` both return zero or more elements per element in the list: `map { $_ == 5 ? qw(a b c) : () } 1..5` gives `a, b, c` as its result. If you want nothing from an element in the list, return an empty list.
daotoad
@daotoad: I had a feeling someone was going to prove me wrong about that.
mobrule
+2  A: 

map operates on each item in a list and builds a list of results to pass on. So you can't really do the sort of checks you want (keep the maximum sequence value) as you go unless you build a scratch hash that winds up containing exactly the data you are trying to build as the return value of the `map`.

my %results = map {

    my( $y, $s ) = split '[|]', $_;

    seq_is_gt_year_seq( $y, $s ) 
       ? ( $y, $s )
       : ();
} @year_pipe_seq;

To implement seq_is_gt_year_seq() we wind up having to build a temporary hash that stores each year and its max sequence value for lookup.
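A minimal sketch of what that helper might look like, assuming a file-scoped scratch hash (called %max_seq here, a name made up for illustration) and the sample data from the question. Note how the scratch hash ends up holding the very result we were trying to build:

```perl
use strict;
use warnings;

# Hypothetical scratch hash: year => highest sequence seen so far.
my %max_seq;

# Returns true (and records the value) only when $seq beats the
# current maximum stored for $year.
sub seq_is_gt_year_seq {
    my ( $year, $seq ) = @_;
    return 0 if defined $max_seq{$year} && $max_seq{$year} >= $seq;
    $max_seq{$year} = $seq;
    return 1;
}

# Sample input taken from the question.
my @year_pipe_seq = qw( 2000|1 2003|9 2000|5 2000|21 2003|4 );

my %results = map {
    my ( $y, $s ) = split /[|]/, $_;
    seq_is_gt_year_seq( $y, $s ) ? ( $y, $s ) : ();
} @year_pipe_seq;

# At this point %max_seq and %results hold the same pairs,
# which is why the map adds nothing over a plain loop.
```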

You should use an approach that builds the lookup incrementally, like a for or while loop.

daotoad
+3  A: 

If I understand you, you were almost there. All you needed to do was return the key and value from map and sort by sequence instead of joining them.

my %hash = 
    map @$_,
    sort { $a->[1] <=> $b->[1] }
    map {
        my @temp = split /\|/;
        my $date = (split /\//, $temp[1])[-1];
        my $seq = $temp[4] || 0; #0 because some records mightn't have a sequence
        [ $date, $seq ]
    } @info;

But just iterating through with for and setting hash only if the current sequence is higher than the previous maximum for that date is probably a better idea.
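Such a loop could look something like the sketch below. It swaps the explicit comparison for `max` from the core List::Util module, which is one possible choice rather than the only way to write it. The sample records are invented (only the field positions, date in field 1 and sequence in field 4, follow the question's snippet), and the `//` operator assumes Perl 5.10 or later.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical records: the date is field 1 and the sequence field 4,
# matching the indices used in the question's snippet.
my @info = (
    'a|01/02/2000|x|y|1',
    'b|03/04/2003|x|y|9',
    'c|05/06/2000|x|y|21',
    'd|07/08/2003|x|y',      # no sequence, treated as 0
);

my %hash;
for my $record (@info) {
    my @temp = split /\|/, $record;
    my $year = ( split /\//, $temp[1] )[-1];
    my $seq  = $temp[4] || 0;

    # Keep whichever is larger: the stored maximum or this sequence.
    $hash{$year} = max( $hash{$year} // 0, $seq );
}
```

This makes a single pass over the data and never sorts, so the work grows linearly with the number of records.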

Be careful with those {}; where you said

%hash = {
    2000 => 21,
    2003 => 9
}

you meant () instead (or meant to assign to a scalar reference, $hash), since the {} there creates an anonymous hash and returns a reference to it.
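A short sketch of the two forms side by side, using the values from the question ($hash_ref is just an illustrative name):

```perl
use strict;
use warnings;

# Parentheses build a list, which is what a %hash assignment expects.
my %hash = (
    2000 => 21,
    2003 => 9,
);

# Braces build an anonymous hash and return a reference to it,
# so the result belongs in a scalar.
my $hash_ref = {
    2000 => 21,
    2003 => 9,
};

print "$hash{2000}\n";        # direct access, prints 21
print "$hash_ref->{2000}\n";  # access through the reference, prints 21
```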

ysth
Your `map sort map` is clever and does the job. As you know, this method comes at a greater computational cost than a `while` or `for` loop; `sort` doesn't come cheap. ++ for solving the question asked, ++ for demonstrating a chained `map` solution, ++ for mentioning that this is a bad way to solve the problem, but -- for not explaining why this approach is a bad idea.
daotoad
A: 

If there's any chance you can perform this processing as the file is read, then I'd do it. Something like this:

my %year_count;
while (my $line = <$fh>){
    chomp $line;
    my ($year, $num) = split /\|/, $line;
    if (!defined $year_count{$year} || $num > $year_count{$year}) {
        $year_count{$year} = $num;
    }
}

If you want to use an array, map isn't really the best choice (since you're not transforming the list, you're reducing it to something different). To be honest, the most sensible array processing would probably be the same as the above, but in a foreach instead:

my %year_count;
foreach my $line (@info){
    my ($year, $num) = split /\|/, $line;
    if (!defined $year_count{$year} || $num > $year_count{$year}) {
        $year_count{$year} = $num;
    }
}
Dan