ansaurus

Question

How can I remove an element from a Perl array after I've processed it?

Answer 1

+4 A:

It won't actually make the processing faster, as removing from the middle of an array is an expensive operation.

Better options:

Do everything in one pass
When you build the array of IDs, include pointers (indexes, really) into the main array so that you can access its elements quickly for a given ID
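
A minimal sketch of the second option, assuming each entry in the main array is a hash with an id field. The record layout and the names @records and %index_of are hypothetical; the question does not show the real data.

```perl
use strict;
use warnings;

# Hypothetical main array; the real record layout is not shown in the question.
my @records = (
    { id => 'BA1CE38965', status => 'sent' },
    { id => '62D8438973', status => 'deferred' },
    { id => '7F3A0C1122', status => 'sent' },
);

# Build the lookup once: ID => position in @records.
my %index_of;
$index_of{ $records[$_]{id} } = $_ for 0 .. $#records;

# Later, reach a record directly instead of scanning or splicing.
my $wanted = '62D8438973';
my $record = $records[ $index_of{$wanted} ];
print "$wanted: $record->{status}\n";
```

Nothing is removed from @records here, so the stored indexes stay valid for the whole run.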

Eli Bendersky 2010-02-03 19:31:35

Answer 2

A:

In Perl you can use the splice function to remove elements from an array.

As usual, be careful when deleting from an array while you are looping over it, because the indexes of the remaining elements will change.
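
One common way around that problem is to walk the indexes from last to first. This is a sketch with made-up data (the @queue array and its 'drop' marker are assumptions for illustration):

```perl
use strict;
use warnings;

my @queue = qw(keep drop keep drop drop keep);

# Walk from the last index to the first so that removing an element
# never shifts the position of elements still to be visited.
for my $i ( reverse 0 .. $#queue ) {
    splice @queue, $i, 1 if $queue[$i] eq 'drop';
}

print "@queue\n";    # keep keep keep
```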

Ken Aspeslagh 2010-02-03 19:32:50

Answer 3

A:

Assuming you have the index at hand, use splice:

splice(@array, $indextoremove, 1)

But be careful: once you remove an element, every element after it shifts down by one, so any indexes you saved for those elements are no longer valid.
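
A small illustration of the shift, using placeholder data. When several saved indexes have to go, removing them from highest to lowest keeps the remaining ones valid (the @to_remove list is an assumption for the example):

```perl
use strict;
use warnings;

my @array = qw(a b c d e);

splice @array, 1, 1;        # removes 'b'
print "$array[1]\n";        # c ('c' moved from index 2 to index 1)

# Removing several saved indexes: highest first, so the lower ones
# still point at the elements they were recorded for.
my @letters   = qw(a b c d e);
my @to_remove = (1, 3);
splice @letters, $_, 1 for sort { $b <=> $a } @to_remove;

print "@letters\n";         # a c e
```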

Vivin Paliath 2010-02-03 19:34:29

Answer 4

A:

Common methods for manipulating the contents of an array:

# start over with this list for each example:
my @list = qw(a b c d);

splice:

splice @list, 2, 1, qw(e);
# @list now contains: qw(a b e d)

pop and shift:

pop @list;
# @list now contains: qw(a b c)

shift @list;
# @list now contains: qw(b c d)

map:

@list = map { $_ eq 'b' ? () : $_ } @list;
# @list now contains: qw(a c d)

array slices:

@list[3..4] = qw(e f);
# @list now contains: qw(a b c e f)

for and foreach loops:

foreach (@list)
{
    # $_ is aliased to each element of the list in turn;
    # assignments will be propagated back to the original structure
    $_ = uc if m/[a-c]/;
}
# @list now contains: qw(A B C d)

Read about all these functions at perldoc perlfunc, slices in perldoc perldata, and for loops in perldoc perlsyn.

Ether 2010-02-03 19:55:03

Answer 5

+1 A:

Why not do this:

my @extracted = map  extract_data($_), 
                grep msg_rcpt_to( $rcpt, $_ ), @log_data;

When you are done, you'll have an array of extracted data in the same order it appeared in the log.
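
The answer does not show extract_data or msg_rcpt_to, so the versions below are hypothetical stand-ins, and the log lines and addresses are placeholders. With those assumptions, the pipeline could be exercised like this:

```perl
use strict;
use warnings;

# Hypothetical stand-in: true if the line records delivery to $rcpt.
sub msg_rcpt_to {
    my ($rcpt, $line) = @_;
    return $line =~ /\bto=<\Q$rcpt\E>/;
}

# Hypothetical stand-in: pull the queue ID out of a delivery line.
sub extract_data {
    my ($line) = @_;
    my ($queue_id) = $line =~ /: (\w+): to=/;
    return $queue_id;
}

my $rcpt     = 'alice@example.com';
my @log_data = (
    'postfix/smtp[1]: AAA111: to=<alice@example.com>, status=sent',
    'postfix/smtp[2]: BBB222: to=<bob@example.com>, status=sent',
    'postfix/smtp[3]: CCC333: to=<alice@example.com>, status=sent',
);

# grep selects the matching lines; map transforms each survivor.
my @extracted = map  { extract_data($_) }
                grep { msg_rcpt_to( $rcpt, $_ ) } @log_data;

print "@extracted\n";    # AAA111 CCC333
```

The block form of map and grep used here behaves the same as the expression form in the answer.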

daotoad 2010-02-03 20:00:18

Answer 6

+5 A:

Do it in a single pass:

#! /usr/bin/perl

use warnings;
use strict;

# for demo only
*ARGV = *DATA;

my %msg;
while (<>) {
  if (s!^.*postfix/\w+\[.+?\]: (\w+):\s*!!) {
    my $key = $1;
    push @{ $msg{$key}{$1} } => $2
      while /\b(to|from|client)=(.+?)(?:,|$)/g;
  }
}

use Data::Dumper;
$Data::Dumper::Indent = 1;
print Dumper \%msg;
__DATA__
Apr  8 14:22:02 MailSecure03 postfix/smtpd[32388]: BA1CE38965: client=mail.example.com[x.x.x.x]
Apr  8 14:22:03 MailSecure03 postfix/cleanup[32070]: BA1CE38965: message-id=<[email protected]>
Apr  8 14:22:03 MailSecure03 postfix/qmgr[19685]: BA1CE38965: from=<[email protected]>, size=1087, nrcpt=2 (queue active)
Apr  8 14:22:04 MailSecure03 postfix/smtp[32608]: BA1CE38965: to=<[email protected]>, relay=127.0.0.1[127.0.0.1]:10025, delay=1.7, delays=1/0/0/0.68, dsn=2.0.0, status=sent (250 OK, sent 49DC509B_360_15637_162D8438973)
Apr  8 14:22:04 MailSecure03 postfix/smtp[32608]: BA1CE38965: to=<[email protected]>, relay=127.0.0.1[127.0.0.1]:10025, delay=1.7, delays=1/0/0/0.68, dsn=2.0.0, status=sent (250 OK, sent 49DC509B_360_15637_162D8438973)
Apr  8 14:22:04 MailSecure03 postfix/qmgr[19685]: BA1CE38965: removed
Apr  8 14:22:04 MailSecure03 postfix/smtpd[32589]: 62D8438973: client=localhost.localdomain[127.0.0.1]
Apr  8 14:22:04 MailSecure03 postfix/cleanup[32080]: 62D8438973: message-id=<[email protected]>
Apr  8 14:22:04 MailSecure03 postfix/qmgr[19685]: 62D8438973: from=<[email protected]>, size=1636, nrcpt=2 (queue active)
Apr  8 14:22:04 MailSecure03 postfix/smtp[32417]: 62D8438973: to=<[email protected]>, relay=y.y.y.y[y.y.y.y]:25, delay=0.19, delays=0.04/0/0.04/0.1, dsn=2.6.0, status=sent (250 2.6.0  <[email protected]> Queued mail for delivery)
Apr  8 14:22:04 MailSecure03 postfix/smtp[32417]: 62D8438973: to=<[email protected]>, relay=y.y.y.y[y.y.y.y]:25, delay=0.19, delays=0.04/0/0.04/0.1, dsn=2.6.0, status=sent (250 2.6.0  <[email protected]> Queued mail for delivery)
Apr  8 14:22:04 MailSecure03 postfix/qmgr[19685]: 62D8438973: removed

The code works by first looking for a queue ID (e.g., BA1CE38965 and 62D8438973 above), which we store in $key.

Next, we find all matches on the current line (thanks to the /g switch) that look like to=<...>, client=mail.example.com, and so on—with and without the separating comma.

Of note in the pattern are

\b - matches on a word boundary only (prevents matching xxxto=<...>)
(to|from|client) - match to or from or client
(.+?) - matches the field's value with a non-greedy quantifier
(?:,|$) - matches either a comma or at end of string without capturing into $3

The non-greedy (.+?) forces the match to stop at the first comma it encounters rather than the last. Otherwise, on a line with

to=<[email protected]>, other=123

you'd get <[email protected]>, other=123 as the recipient!
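
The difference is easy to check in isolation. This sketch uses a placeholder address rather than one from the log, and compares the pattern above with a greedy variant:

```perl
use strict;
use warnings;

# Sample line with a placeholder address (not taken from the log above).
my $line = 'to=<user@example.com>, other=123';

# Same pattern as in the script, reduced to the "to" field.
my ($lazy)   = $line =~ /\bto=(.+?)(?:,|$)/;

# Greedy variant: .+ runs to the end of the line, where $ matches.
my ($greedy) = $line =~ /\bto=(.+)(?:,|$)/;

print "non-greedy: $lazy\n";      # <user@example.com>
print "greedy:     $greedy\n";    # <user@example.com>, other=123
```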

Then for each field matched, we push it onto the end of an array (because there may be multiple recipients, for example) connected to both the queue ID and field name. Take a look at the result:

$VAR1 = {
  '62D8438973' => {
    'client' => [
      'localhost.localdomain[127.0.0.1]'
    ],
    'to' => [
      '<[email protected]>',
      '<[email protected]>'
    ],
    'from' => [
      '<[email protected]>'
    ]
  },
  'BA1CE38965' => {
    'client' => [
      'mail.example.com[x.x.x.x]'
    ],
    'to' => [
      '<[email protected]>',
      '<[email protected]>'
    ],
    'from' => [
      '<[email protected]>'
    ]
  }
};

Now say you want to print all the recipients of the message whose queue ID is BA1CE38965:

my $queueid = "BA1CE38965";
foreach my $recip (@{ $msg{$queueid}{to} }) {
  print $recip, "\n";
}

Maybe you want to know only how many recipients:

print scalar @{ $msg{$queueid}{to} }, "\n";

If you're willing to assume each message has exactly one client, access it with

print $msg{$queueid}{client}[0], "\n";
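
The same structure can also be filtered. As a rough sketch, assuming you want the queue IDs that have at least one recipient at a particular domain (the hand-built %msg, the addresses, and $domain below are placeholders, not data from the log):

```perl
use strict;
use warnings;

# Hand-built stand-in for %msg with placeholder addresses.
my %msg = (
    'BA1CE38965' => { to => [ '<alice@example.com>', '<bob@example.org>' ] },
    '62D8438973' => { to => [ '<carol@example.org>' ] },
);

my $domain = 'example.com';    # hypothetical domain of interest

# Keep the queue IDs that have at least one recipient in $domain.
# The inner grep runs in scalar context and yields a match count.
my @matching = grep {
    my $id = $_;
    grep { /\@\Q$domain\E>$/ } @{ $msg{$id}{to} || [] };
} sort keys %msg;

print "$_\n" for @matching;    # BA1CE38965
```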

Greg Bacon 2010-02-03 20:00:50

This is fantastic, thank you... I was focused on pulling out only the messages I'm interested in (ones that match [0-9-]@ACertainDomain.com) and didn't think about just loading all the pertinent info from the file into a hash and then pulling messages out of that. I'm going to use your code as a foundation and see if I can build up from there. I'm sure I'll have more questions (I'm still trying to parse that 'while' regexp, I'm so rusty at this).

Justin 2010-02-03 21:04:06

@Justin You're welcome! See updated explanation.

Greg Bacon 2010-02-03 21:33:55

Thanks again. My parse now takes about 3 minutes per file as opposed to 3 hours. This community is awesome.

Justin 2010-02-03 23:59:03

@Justin Tell your friends!

Greg Bacon 2010-02-04 16:29:34
