I'm writing a script that parses the output of the "pure-ftpwho -s" command to get a list of the current transfers. But when a user disconnects from the FTP server, reconnects, and resumes a transfer, the file shows up twice. I want to remove the ghosted entry with Perl. After parsing, here is what the arrayref looks like (dumped with Data::Dumper):

$VAR1 = [
      {
        'status' => 'DL',
        'percent' => '20',
        'speed' => '10',
        'file' => 'somefile.txt',
        'user' => 'user1',
        'size' => '14648'
      },
      {
        'status' => 'DL',
        'percent' => '63',
        'speed' => '11',
        'file' => 'somefile.txt',
        'user' => 'user1',
        'size' => '14648'
      },
      {
        'status' => 'DL',
        'percent' => '16',
        'speed' => '60',
        'file' => 'somefile.txt',
        'user' => 'user2',
        'size' => '14648'
      }
    ];

Here user1 and user2 are downloading the same file, but user1 appears twice because the first entry is a "ghost". What's the best way to check for and remove the elements I don't need (in this case, the first element of the arrayref)? The condition to check is: if both the "file" key and the "user" key match, delete the hashref with the smaller "percent" value (and if the percentages are equal, keep just one).

+4  A: 

If order in the original arrayref doesn't matter, this should work:

my %users;
my @result;

for my $data (@$arrayref) {
    # group transfers by user + file; the separator avoids key collisions
    push @{ $users{ join '|', $data->{user}, $data->{file} } }, $data;
}

for my $value (values %users) {
    my @data = sort { $a->{percent} <=> $b->{percent} } @$value;
    push @result, $data[-1];
}

This can definitely be improved for efficiency.
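As a quick check, here is a self-contained version of the above run against the data from the question (abbreviated to the relevant keys; variable names as in the snippet):

```perl
use strict;
use warnings;

# Sample data from the question, reduced to the keys that matter here
my $arrayref = [
    { user => 'user1', file => 'somefile.txt', percent => 20 },
    { user => 'user1', file => 'somefile.txt', percent => 63 },
    { user => 'user2', file => 'somefile.txt', percent => 16 },
];

my %users;
my @result;

# Group transfers by user + file
for my $data (@$arrayref) {
    push @{ $users{ join '|', $data->{user}, $data->{file} } }, $data;
}

# Keep only the most advanced transfer in each group
for my $value (values %users) {
    my @data = sort { $a->{percent} <=> $b->{percent} } @$value;
    push @result, $data[-1];
}

# @result now holds two entries: user1 at 63% and user2 at 16%
```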

eugene y
Thanks, that worked.
somebody
+3  A: 

For what it's worth, here's my (slightly) alternative approach. Again, it doesn't preserve the original order:

my %most_progress;

for my $transfer ( sort { $b->{percent} <=> $a->{percent} } @$data ) {
    next if exists $most_progress{$transfer->{user}.$transfer->{file}};
    $most_progress{$transfer->{user}.$transfer->{file}} = $transfer;
}

my @clean_data = values %most_progress;
tjmw
A: 
my %check; 

for (my $i = 0; $i <= $#{$arrayref}; $i++) { 
  my $transfer = $arrayref->[$i]; 

  # check the transfer for user and file 
  my $key = $transfer->{user} . $transfer->{file};  
  # start with an impossibly low percentage so the first real transfer wins
  $check{$key} = { percent => -1 } if !exists $check{$key};

  if ( $transfer->{percent} <= $check{$key}->{percent} ) {
    # undefine this less advanced transfer 
    $arrayref->[$i] = undef; 

  } else { 
    # remove the other transfer 
    $arrayref->[$check{$key}->{index}] = undef if exists $check{$key}->{index}; 

    # set the new standard 
    $check{$key} = { index => $i, percent => $transfer->{percent} } 
  } 
}  

# remove all undefined transfers     
$arrayref = [ grep { defined $_ } @$arrayref ];
Hartog
A: 

A variation on the theme, using the CPAN module Perl6::Gather:

use Perl6::Gather;

my @cleaned = gather {
    my %seen;
    for (sort { $b->{percent} <=> $a->{percent} } @$data) {
        take unless $seen{ $_->{user} . $_->{file} }++;
    }
};

/I3az/

draegtun
+2  A: 

This will preserve order:

use strict;
use warnings;

my $data = [ ... ]; # As posted.

my %pct;
for my $i ( 0 .. $#{$data} ){
    my $r = $data->[$i];
    my $k = join '|', $r->{file}, $r->{user};
    next if exists $pct{$k} and $pct{$k}[1] >= $r->{percent};
    $pct{$k} = [$i, $r->{percent}];
}

@$data = @$data[ sort { $a <=> $b } map $_->[0], values %pct ];
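With the question's data this keeps the surviving entries in their original relative order; a runnable check (numeric sort on the saved indices, since a string sort would misorder indices past 9):

```perl
use strict;
use warnings;

my $data = [
    { user => 'user1', file => 'somefile.txt', percent => 20 },
    { user => 'user1', file => 'somefile.txt', percent => 63 },
    { user => 'user2', file => 'somefile.txt', percent => 16 },
];

# Remember, per (file, user) key, the index and percent of the best entry
my %pct;
for my $i ( 0 .. $#{$data} ){
    my $r = $data->[$i];
    my $k = join '|', $r->{file}, $r->{user};
    next if exists $pct{$k} and $pct{$k}[1] >= $r->{percent};
    $pct{$k} = [$i, $r->{percent}];
}

# Slice the winners back out in ascending index order
@$data = @$data[ sort { $a <=> $b } map $_->[0], values %pct ];

# $data is now [ user1 at 63%, user2 at 16% ], original order preserved
```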
FM
+4  A: 

The correct solution in this case would have been to use a hash when parsing the log file. Put all information into a hash, say %log, keyed by user and file:

$log{$user}->{$file} = {
    'status' => 'DL',
    'percent' => '20',
    'speed' => '10',
    'size' => '14648'
};

etc. Later entries in the log file would overwrite earlier ones. Alternatively, you can overwrite an entry only when the new one has a higher completion percentage.

Using a hash would get rid of a lot of completely superfluous code working around the choice of the wrong data structure.
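A minimal sketch of that idea (the records below stand in for whatever your parser extracts per line of "pure-ftpwho -s" output; the percent-based overwrite is the second variant described above):

```perl
use strict;
use warnings;

# Records as they would come out of the parser, one hashref per line
my @records = (
    { user => 'user1', file => 'somefile.txt', percent => 20, speed => 10, size => 14648, status => 'DL' },
    { user => 'user1', file => 'somefile.txt', percent => 63, speed => 11, size => 14648, status => 'DL' },
    { user => 'user2', file => 'somefile.txt', percent => 16, speed => 60, size => 14648, status => 'DL' },
);

my %log;
for my $r (@records) {
    my ( $user, $file ) = @{$r}{qw(user file)};

    # Keep the entry with the higher completion percentage;
    # ghost (stale duplicate) entries are overwritten automatically.
    if ( !exists $log{$user}{$file}
         or $log{$user}{$file}{percent} < $r->{percent} ) {
        $log{$user}{$file} = $r;
    }
}

# %log now holds exactly one entry per (user, file) pair:
# user1/somefile.txt at 63%, user2/somefile.txt at 16%
```

No dedup pass is needed afterwards: the hash key enforces uniqueness as the log is read.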

Sinan Ünür