ansaurus

Question

Sorting arrays of paired numbers and removing duplicates or overlaps?

Answer 1

+1 A:

use Set::IntSpan;

my @starts = (100,100,200,300,400,500,525);
my @ends = (150,125,250,350,450,550,550);
my @spec = map { "$starts[$_]-$ends[$_]" } 0..$#starts;
my $p = Set::IntSpan->new(@spec);
print "$p\n";

runrig 2010-08-10 14:53:33

Doesn't work if there is a matching pair in the second list, from what I can tell?

Steve 2010-08-10 15:23:38

E.g. (104643,104837)(104891,104950)(104912,105195)(104964,105195)(106635,107245)(112029,112166)(113174,113177)(114546,114688)(114773,114940)(115057,115165)

Steve 2010-08-10 15:24:12

Prints both out for the ones ending 105195!

Steve 2010-08-10 15:24:32

@Steve: right. Missed part of the spec. Updating my answer completely.

runrig 2010-08-10 15:30:02

Thanks runrig! Set::IntSpan sounds interesting. Will have a read :)

Steve 2010-08-10 15:39:18

Updated answer.

runrig 2010-08-10 16:34:05

+1 Good idea to use Set::IntSpan. You could create `@spec` a bit more easily with `map()`, along these lines: `my @spec = map "$starts[$_]-$ends[$_]", 0 .. $#starts`.

FM 2010-08-10 16:47:16

@FM: yes, nice. Updated.

runrig 2010-08-10 17:42:38

Answer 2

A:

Use some list transformations:

my @starts = (100,100,200,300,400,500,525);
my @ends   = (150,125,250,350,450,550,550);

my (%starts_seen, %ends_seen);
my @ar = sort { $a->[0] <=> $b->[0] }   # result in ascending sort order of @starts
         grep ! $starts_seen{$_->[0]}++,
         sort { $b->[0] <=> $a->[0] }   # descending sort b -> a
         grep ! $ends_seen{$_->[1]}++,
         sort { $b->[1] <=> $b->[1] }   # descending sort b -> a
         map  [ $starts[$_],$ends[$_] ],
         0 .. $#starts;

print "($_->[0],$_->[1]) " for @ar;

this results in:

(100,150) (200,250) (300,350) (400,450) (500,550)

Regards

rbo

Edit: modified code to reflect sort order of sorts

rubber boots 2010-08-10 14:55:15

How about if I wanted to get 100,150 as the first pair, so the pairs are always the largest ones?

Steve 2010-08-10 15:46:30

Steve, you use in sort: {$a->[0] <=> $b->[0]} for ascending and $b->[0] <=> $a->[0]} for descending sort (I'll add a remark)

rubber boots 2010-08-10 16:02:57

It is slow and overcomplicated. Why you sort it three times? Compare poor man's solution in http://stackoverflow.com/questions/3449983/sorting-arrays-of-paired-numbers-and-removing-duplicates-or-overlaps/3451206#3451206

Hynek -Pichi- Vychodil 2010-08-10 16:33:50

What if you had starts (500, 550) and ends (525, 560)...do we want to end up with just (500, 560)? This does not do that.

runrig 2010-08-10 16:38:09

runrig, right, this will **not** correctly handle int spans. This was not originally stated in the question - I read that (the span request) later in an additional post of the o.p. -

rubber boots 2010-08-10 17:20:52

Answer 3

+1 A:

Using Set::IntSpan:

use Set::IntSpan;

my @starts = (100,100,200,300,400,500,525);
my @ends   = (150,125,250,350,450,550,550);

my (@uniq_starts, @unique_ends);

for my $s (Set::IntSpan->new([map [$starts[$_], $ends[$_]], 0 .. $#starts])->spans) {
  push @uniq_starts, $s->[0];
  push @uniq_ends, $s->[1];
}

print join(",", @uniq_starts), "\n";
print join(",", @uniq_ends), "\n";

Or poor man's solution:

sub spans {
  my @s = sort {$a->[0] <=> $b->[0] or $a->[1] <=> $b->[1]} @_;
  my @res;
  while (@s > 1) {
    if ($s[0][1] >= $s[1][0]) {
      splice @s, 0, 2, [$s[0][0], $s[1][1]];
    } else {
      push @res, shift @s;
    }
  }
  push @res, @s;
  return @res;
}

my @starts = (100,100,200,300,400,500,525);
my @ends   = (150,125,250,350,450,550,550);

my (@uniq_starts, @unique_ends);

for my $s (spans(map [$starts[$_], $ends[$_]], 0 .. $#starts)) {
  push @uniq_starts, $s->[0];
  push @uniq_ends, $s->[1];
}

print join(",", @uniq_starts), "\n";
print join(",", @uniq_ends), "\n";

You can check that it works flawless.

More functional spans version:

sub spans {
  return spans_(sort {$a->[0] <=> $b->[0] or $a->[1] <=> $b->[1]} @_);
}

sub spans_ {
  if (@_ > 1 and $_[0][1] >= $_[1][0]) {
    splice @_, 0, 2, [$_[0][0], $_[1][1]];
    goto &spans_;
  } elsif (@_) {
    return shift, spans_(@_);
  } else {
    return;
  }
}

P.S.: If somebody thinks that perl is concise language, compare same algorithm spans function in erlang. I don't even know how it would look in APL or J:

spans(L) -> spans_(lists:sort(L)).

spans_([{A, B}, {C, D}|T]) when B >= C ->
  spans_([{A, D}|T]);
spans_([H|T]) -> [H|spans_(T)];
spans_([]) -> [].

Hynek -Pichi- Vychodil 2010-08-10 16:29:10

ansaurus

tags:

views:

answers:

Sorting arrays of paired numbers and removing duplicates or overlaps?

related questions