ansaurus

Question

How can I remove duplicates and sort at the same time in Perl?

Answer 1

+1 A:

To remove duplicates, the best way is to use uniq from List::MoreUtils:

use List::MoreUtils 'uniq';
my @unique_list = uniq @list;

or without CPAN (although this is rarely necessary):

my %values;
@values{@list} = ();
my @unique_list = keys %values;

You can sort any list using the built-in function sort -- see perldoc -f sort and perldoc -q 'How do I sort an array'.
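Putting the two steps together, a minimal sketch using only core Perl might look like the following. The sample @list is made up for illustration, and a numeric sort is assumed; for strings, use the default sort (or cmp) instead.

```perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my @list = (3, 1, 2, 3, 1, 10);

# Step 1: remove duplicates with a hash slice (the keys are the unique values).
my %seen;
@seen{@list} = ();

# Step 2: sort the unique values numerically. keys() returns them in no
# particular order, so the sort is what fixes the final ordering.
my @sorted_unique = sort { $a <=> $b } keys %seen;

print "@sorted_unique\n";   # prints "1 2 3 10"
```

Note that the hash approach does not preserve the original order of the list, which does not matter here because the result is sorted anyway.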

Incidentally, the data you have quoted does not match the behaviour you are describing. If you declare an array as

@uniqarr = qw(error 0 goodrecordno:6123, error 0 goodrecordno:6143, error 1 goodrecordno:10245, error 1 goodrecordno:10678, error 1 goodrecordno:10698, error 2 goodrecordno:16245, error 2 goodrecordno:16123);

...then it will contain:

(
  'error',
  '0',
  'goodrecordno:6123,',
  'error',
  '0',
  'goodrecordno:6143,',
  'error',
  '1',
  'goodrecordno:10245,',
  'error',
  '1',
  'goodrecordno:10678,',
  'error',
  '1',
  'goodrecordno:10698,',
  'error',
  '2',
  'goodrecordno:16245,',
  'error',
  '2',
  'goodrecordno:16123'
);

What you need to do is read the data into a hash, then process it according to your criteria. I cannot go further, as it is not at all clear what you are looking for. Please read perldoc perldata and perldoc perldsc to learn more about Perl data structures.

Ether 2010-09-08 20:24:20

If you only care about the keys, `@values{@list} = ();` is faster and uses less memory.

Chas. Owens 2010-09-08 20:48:01

@Chas: nice; I didn't realize hash slices would work without a RHS list of the same length.

Ether 2010-09-08 20:50:57
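The hash-slice behaviour discussed in the two comments above can be checked with a short sketch. The sample @list is hypothetical; the point is only that assigning an empty list to a hash slice still creates every key, with undef as each value.

```perl
use strict;
use warnings;

my @list = qw(b a b c a);   # hypothetical sample data

my %values;
@values{@list} = ();        # every key is created; every value is undef

my $key_count     = scalar keys %values;
my $defined_count = grep { defined } values %values;

print "keys: $key_count, defined values: $defined_count\n";
# prints "keys: 3, defined values: 0"
```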

You don't want to sort. That's too much work for this problem. I don't see how any of this answer actually addresses the problem. Unique elements aren't any help here.

brian d foy 2010-09-08 20:52:52

@brian d foy The question is "Removing Duplicates and Sorting at the same time." You don't see how making it unique and sorting the result addresses the question?

Chas. Owens 2010-09-08 20:59:45

I see that's what he said in the title, but not what he asked in the question. People who don't know basic things often ask for something other than what they actually want because they are fixated on a solution (the XY problem). I explained in my answer how to do it with neither sorting nor unique-ing. You have to think about the real question, not the literal one. However, you can submit your solution to show how you need to sort and unique the list to accomplish this.

brian d foy 2010-09-08 21:26:09

Answer 2

+3 A:

This is the basic minimum-maximum problem that you'll find in beginning Perl books. You make one pass through all the elements and remember which one was the lowest as you go along. This is much better than sorting, which is designed for you to put all the elements in order, which is not what you are after.

use strict;
use warnings;

# I'll assume those commas were a mistake. You don't need to separate
# items with commas in a quotewords list
# If I'm wrong, the process is the same although the data massaging
# will be a little different
my @elements = qw(
    error 0 goodrecordno:6123
    error 0 goodrecordno:6143
    error 1 goodrecordno:10245 
    error 1 goodrecordno:10678 
    error 1 goodrecordno:10698 
    error 2 goodrecordno:16245 
    error 2 goodrecordno:16123
    );

my %lowest;
while( my( $error, $number, $goodrecno ) = splice @elements, 0, 3, () )
    {
    my( $recno ) = $goodrecno =~ /(\d+)/;

    # This hash remembers the lowest $recno. If you find a
    # lower number, you replace the previous value.
    $lowest{$number} = $recno if( 
        ! exists $lowest{$number} 
            ||
        $recno < $lowest{$number}
        );
    }

Once you've created the hash that has the lowest elements, you just print it:

foreach my $number ( sort { $a <=> $b } keys %lowest ) {
    print "error $number goodrecordno:$lowest{$number}\n";
    }

This gives you the output that you were looking for:

error 0 goodrecordno:6123
error 1 goodrecordno:10245
error 2 goodrecordno:16123

This is a basic template for these sorts of problems. Step 1: scan the data to remember what you want, using a hash to key those data. Step 2: output the contents of the hash.
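The same two-step template can be sketched for the case where the commas in the question were intentional and each record arrives as one string. The @records array and the regular expression are assumptions about the input format, not something the question confirms.

```perl
use strict;
use warnings;

# Assumed input: one record per string, as the comma-separated data suggests.
my @records = (
    'error 0 goodrecordno:6123',
    'error 0 goodrecordno:6143',
    'error 1 goodrecordno:10245',
    'error 1 goodrecordno:10678',
    'error 2 goodrecordno:16245',
    'error 2 goodrecordno:16123',
);

# Step 1: scan the data, keeping the lowest record number per error number.
my %lowest;
for my $record (@records) {
    my ( $number, $recno ) = $record =~ /^error\s+(\d+)\s+goodrecordno:(\d+)/
        or next;    # skip anything that does not look like a record
    $lowest{$number} = $recno
        if !exists $lowest{$number} || $recno < $lowest{$number};
}

# Step 2: output the contents of the hash, ordered by error number.
my @output = map { "error $_ goodrecordno:$lowest{$_}" }
             sort { $a <=> $b } keys %lowest;
print "$_\n" for @output;
```

Only step 1 changes with the input format; the hash and the output loop stay the same.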

brian d foy 2010-09-08 20:49:57

Answer 3

A:

use strict;
use warnings;

my @raw_data = qw(
    error 0 goodrecordno:6123
    error 0 goodrecordno:6143
    error 1 goodrecordno:10245 
    error 1 goodrecordno:10678 
    error 1 goodrecordno:10698 
    error 2 goodrecordno:16245 
    error 2 goodrecordno:16123
);

# Create a better data structure.
#    $errors{NUMBER} = [LIST OF RECNO VALUES]
my %errors;
while( my($e, $n, $recno) = splice @raw_data, 0, 3 ){
    push @{$errors{$n}}, $recno =~ /(\d+)$/;
}

# Now you can easily compute the minimums, and you
# are in better shape to perform other tasks in your program.
for my $n ( sort {$a <=> $b} keys %errors ){
    my $min = 9e99;
    for my $recno ( @{$errors{$n}} ){
        $min = $recno if $recno < $min;
    }

    print $n, ' ', $min, "\n";
}

FM 2010-09-08 23:22:52

@FM what if my array is of this kind

Sunny 2010-09-09 17:32:10

@Sunny Sorry, but I don't follow your question.

FM 2010-09-09 18:23:55

Answer 4

A:

As everyone else has already pointed out, your first problem is that qw() is inappropriate for building this array.

There are multiple ways of doing it correctly. I'm going to use an array of hashes here, which is the more verbose option, but it's fairly easy to adapt the technique to whatever structure you choose.


my @uniqarr = (
  { error => 0, goodrecordno => 6123, },
  { error => 0, goodrecordno => 6143, },
  { error => 1, goodrecordno => 10245, },
  { error => 1, goodrecordno => 10678, },
  { error => 1, goodrecordno => 10698, },
  { error => 2, goodrecordno => 16245, },
  { error => 2, goodrecordno => 16123, },
);

Then to extract each error instance with the lowest goodrecordno we can do the following.

First we import min from List::Util. This module is core Perl and doesn't require CPAN.

Then we restructure the input @uniqarr. For what we want, it is much easier to work with the data grouped by error value, so %by_error is a hash of arrays: the key of the hash is the error value, and the array contains all the goodrecordno values for that error.

Finally we produce the desired output. Looping through the hash means we are iterating over each error value, sorted to provide the correct output ordering. Then we extract the minimum goodrecordno value, which just leaves printing the output.


use List::Util qw(min); # In core Perl, not CPAN

# Restructure input
my %by_error; # Hash with error as key, array of goodrecordno as value.
foreach (@uniqarr) {
  push @{$by_error{$_->{error}}}, $_->{goodrecordno};
}

# Output as desired
foreach my $error (sort { $a <=> $b } keys %by_error) { # numeric sort, so 10 comes after 2
  my $min_no = min @{$by_error{$error}};
  print "error $error goodrecordno:$min_no\n";
}

lod 2010-09-09 01:00:26
