views:

77

answers:

3

How can I sort two arrays of coordinates in numerical order by the start coordinates e.g.

my @starts = (100,100,200,300,400,500,525);
my @ends   = (150,125,250,350,450,550,550);

but choose the biggest difference if there are two matching in either the starts or ends list? E.g.

my @uniq_starts = (100,200,300,400,500);
my @unique_ends = (150,250,350,450,550);

Any help greatly appreciated!

Also, how about if the lists are like this?

my @starts = (100,125,200,300,400,500,525);
my @ends   = (150,175,250,350,450,550,550);

This would give me the following for the in between values:

-25, 25, 50, 50, 50, -25

I would need the following output:

my @uniq_starts = (100,200,300,400,500);
my @unique_ends = (175,250,350,450,550);

So my in between values are:

25, 50, 50, 50

I can get around this by just removing and ignoring any negative values, as I can imagine this would make things much more complicated.

+1  A: 

How about using Set::IntSpan?

use Set::IntSpan;

my @starts = (100,100,200,300,400,500,525);
my @ends = (150,125,250,350,450,550,550);
my @spec = map { "$starts[$_]-$ends[$_]" } 0..$#starts;
my $p = Set::IntSpan->new(@spec);
print "$p\n";
runrig
Doesn't work if there is a matching pair in the second list, from what I can tell?
Steve
E.g. (104643,104837)(104891,104950)(104912,105195)(104964,105195)(106635,107245)(112029,112166)(113174,113177)(114546,114688)(114773,114940)(115057,115165)
Steve
Prints both out for the ones ending 105195!
Steve
@Steve: right. Missed part of the spec. Updating my answer completely.
runrig
Thanks runrig! Set::IntSpan sounds interesting. Will have a read :)
Steve
Updated answer.
runrig
+1 Good idea to use Set::IntSpan. You could create `@spec` a bit more easily with `map()`, along these lines: `my @spec = map "$starts[$_]-$ends[$_]", 0 .. $#starts`.
FM
@FM: yes, nice. Updated.
runrig
A: 

Use some list transformations:

my @starts = (100,100,200,300,400,500,525);
my @ends   = (150,125,250,350,450,550,550);

my (%starts_seen, %ends_seen);
my @ar = sort { $a->[0] <=> $b->[0] }   # result in ascending sort order of @starts
         grep ! $starts_seen{$_->[0]}++,
         sort { $b->[0] <=> $a->[0] }   # descending sort b -> a
         grep ! $ends_seen{$_->[1]}++,
         sort { $b->[1] <=> $b->[1] }   # descending sort b -> a
         map  [ $starts[$_],$ends[$_] ],
         0 .. $#starts;

print "($_->[0],$_->[1]) " for @ar;

this results in:

(100,150) (200,250) (300,350) (400,450) (500,550) 

Regards

rbo

Edit: modified code to reflect sort order of sorts

rubber boots
How about if I wanted to get 100,150 as the first pair, so the pairs are always the largest ones?
Steve
Steve, you use in sort: {$a->[0] <=> $b->[0]} for ascending and $b->[0] <=> $a->[0]} for descending sort (I'll add a remark)
rubber boots
It is slow and overcomplicated. Why you sort it three times? Compare poor man's solution in http://stackoverflow.com/questions/3449983/sorting-arrays-of-paired-numbers-and-removing-duplicates-or-overlaps/3451206#3451206
Hynek -Pichi- Vychodil
What if you had starts (500, 550) and ends (525, 560)...do we want to end up with just (500, 560)? This does not do that.
runrig
runrig, right, this will **not** correctly handle int spans. This was not originally stated in the question - I read that (the span request) later in an additional post of the o.p. -
rubber boots
+1  A: 

Using Set::IntSpan:

use Set::IntSpan;

my @starts = (100,100,200,300,400,500,525);
my @ends   = (150,125,250,350,450,550,550);

my (@uniq_starts, @unique_ends);

for my $s (Set::IntSpan->new([map [$starts[$_], $ends[$_]], 0 .. $#starts])->spans) {
  push @uniq_starts, $s->[0];
  push @uniq_ends, $s->[1];
}

print join(",", @uniq_starts), "\n";
print join(",", @uniq_ends), "\n";

Or poor man's solution:

sub spans {
  my @s = sort {$a->[0] <=> $b->[0] or $a->[1] <=> $b->[1]} @_;
  my @res;
  while (@s > 1) {
    if ($s[0][1] >= $s[1][0]) {
      splice @s, 0, 2, [$s[0][0], $s[1][1]];
    } else {
      push @res, shift @s;
    }
  }
  push @res, @s;
  return @res;
}

my @starts = (100,100,200,300,400,500,525);
my @ends   = (150,125,250,350,450,550,550);

my (@uniq_starts, @unique_ends);

for my $s (spans(map [$starts[$_], $ends[$_]], 0 .. $#starts)) {
  push @uniq_starts, $s->[0];
  push @uniq_ends, $s->[1];
}

print join(",", @uniq_starts), "\n";
print join(",", @uniq_ends), "\n";

You can check that it works flawless.

More functional spans version:

sub spans {
  return spans_(sort {$a->[0] <=> $b->[0] or $a->[1] <=> $b->[1]} @_);
}

sub spans_ {
  if (@_ > 1 and $_[0][1] >= $_[1][0]) {
    splice @_, 0, 2, [$_[0][0], $_[1][1]];
    goto &spans_;
  } elsif (@_) {
    return shift, spans_(@_);
  } else {
    return;
  }
}

P.S.: If somebody thinks that perl is concise language, compare same algorithm spans function in erlang. I don't even know how it would look in APL or J:

spans(L) -> spans_(lists:sort(L)).

spans_([{A, B}, {C, D}|T]) when B >= C ->
  spans_([{A, D}|T]);
spans_([H|T]) -> [H|spans_(T)];
spans_([]) -> [].
Hynek -Pichi- Vychodil