views:

132

answers:

4

I have a function, binary_range_search, that is called like so:

my $brs_iterator = binary_range_search(
    target => $range,                   # eg. [1, 200]
    search => $ranges                   # eg. [ {start => 1,   end => 1000},
);                                      #       {start => 500, end => 1500} ]

brs_iterator->() will iterate over all @$ranges on which $range overlaps.

I would like to extend binary_range_search to be able to call it with multiple ranges as its target, eg:

target => $target_ranges # eg. [ [1, 200], [50, 300], ... ]
search => $search_ranges # as above

So, when the search on $range->[0] is exhausted, it should move on to $range->[1], and so on. Here is the function in question, in its original form:

sub binary_range_search {
    my %options = @_;
    my $range    = $options{target}  || return;
    my $ranges   = $options{search}  || return;

    my ( $low, $high ) = ( 0, @{$ranges} - 1 );

    while ( $low <= $high ) {

        my $try = int( ( $low + $high ) / 2 );

        $low  = $try + 1, next if $ranges->[$try]{end}   < $range->[0];
        $high = $try - 1, next if $ranges->[$try]{start} > $range->[1];

        my ( $down, $up ) = ($try) x 2;

        my %seen = ();

        my $brs_iterator = sub {

            if (    $ranges->[ $up + 1 ]{end}       >= $range->[0]
                    and $ranges->[ $up + 1 ]{start} <= $range->[1]
                    and !exists $seen{ $up + 1 } )
            {
                $seen{ $up + 1 } = undef;
                return $ranges->[ ++$up ];
            }
            elsif ( $ranges->[ $down - 1 ]{end}       >= $range->[0]
                    and $ranges->[ $down + 1 ]{start} <= $range->[1]
                    and !exists $seen{ $down - 1 }
                    and $down > 0 )
            {
                $seen{ $down - 1 } = undef;
                return $ranges->[ --$down ];
            }
            elsif ( !exists $seen{$try} ) {
                $seen{$try} = undef;
              return $ranges->[$try];
            }
            else {
                return;
            }

        };
        return $brs_iterator;
    }
    return sub { };
}

It's a standard binary search strategy, until it finds an overlapping range. It then moves on the right, exhausts it, moves on the left, exhausts it, and finally gives up. Ideally, it should then maybe shift the next target range, and redo the search, I suppose (perhaps via recursion?). My problem is, I am not sure how to make that work with the iterator construction.

A: 

Split it into two functions, an outer function that loops over the ranges and calls an inner function that implements a conventional binary chop.

Marcelo Cantos
That would work for a straight binary search problem, but would it work when you're not actually returning values, but a reference to a function? The iterator needs to iterate over *all* values, not just find the first one.
Pedro Silva
A: 

Warning: a very c++ biased answer:

what you have to do is define a new type of iterator, which is a pair of a usual iterator, and a segmemt iterrator (if you don't have a segment iterator, it's a pair of a const pointer / ref to the segments, and an index pointing to the correct segment). You have to define all the concepts of a random access iterator (difference, addition of integer, etc). bear in mind, that at least in c++ lingo this is not a true random iterator, since addition of an integer is not really constant time; such is life.

David Lehavi
+2  A: 

If you're wanting to iterate over all values that overlap the search ranges, you don't need binary search.

First the customary front matter:

use warnings;
use strict;

use Carp;

First off, check that we have target and search parameters and that for each range, the starting point is no greater than its ending point. Otherwise, we refuse to proceed.

sub binary_range_search {
  my %arg = @_;

  my @errors;
  my $target = $arg{target} || push @errors => "no target";
  my $search = $arg{search} || push @errors => "no search";

  for (@$target) {
    my($start,$end) = @$_;
    push @errors => "Target start ($start) is greater than end ($end)"
      if $start > $end;
  }

  for (@$search) {
    my($start,$end) = @{$_}{qw/ start end /};
    push @errors => "Search start ($start) is greater than end ($end)"
      if $start > $end;
  }

  croak "Invalid use of binary_range_search:\n",
        map "  - $_\n", @errors
    if @errors;

The iterator itself is a closure that maintains the following state:

  my $i;
  my($ta,$tb);
  my($sa,$sb);
  my $si = 0;

where

  • $i if defined is the next value from the current overlapping range
  • $ta and $tb are the starting and ending points for the current target range
  • $sa and $sb are like the above but for the current search range
  • $si is an index into @$search and defines the current search range

We will be assigning and returning the iterator $it. The declaration and initialization are separate so the iterator may call itself when necessary.

  my $it;
  $it = sub {

We are done if no more target ranges remain or if there were no search ranges to begin with.

    return unless @$target && @$search;

When $i is defined, it means we have found an overlap and iterate by incrementing $i until it is greater than the ending point of either the current target range or the current search range.

    if (defined $i) {
      # iterating within a target range

      if ($i > $tb || $i > $sb) {
        ++$si;
        undef $i;
        return $it->();
      }
      else {
        return $i++;
      }
    }

Otherwise, we need to determine whether the next target range overlaps any search range. However, if $i is undefined and we've already considered all the search ranges, we discard the current target range and start again.

    else {
      # does the next target range overlap?

      if ($si >= @$search) {
        shift @$target;
        $si = 0;
        return $it->();
      }

Here we pull out the starting and ending points of both the current target range (always at the front of @$target) and the current search range (indexed by $si).

      ($ta,$tb) = @{ $target->[0] };
      ($sa,$sb) = @{ $search->[$si] }{qw/ start end /};

Now testing for overlap is straightforward. For disjoint search ranges, we ignore and move on. Otherwise, we find the leftmost point in the overlap and iterate from there.

      if ($sb < $ta || $sa > $tb) {
        # disjoint
        ++$si;
        undef $i;
        return $it->();
      }
      elsif ($sa >= $ta) {
        $i = $sa;
        return $i++;
      }
      elsif ($ta >= $sa) {
        $i = $ta;
        return $i++;
      }
    }
  };

Finally, we return the iterator:

  $it;
}

For an example similar to the one in your question

my $it = binary_range_search(
  target => [ [1, 200], [50, 300] ],
  search => [ { start =>   1, end => 1000 },
              { start => 500, end => 1500 },
              { start =>  40, end =>   60 },
              { start => 250, end =>  260 } ],
);

while (defined(my $value = $it->())) {
  print "got $value\n";
}

Its output with internal points elided is

got 1
[...]
got 200
got 40
[...]
got 60
got 50
[...]
got 300
got 50
[...]
got 60
got 250
[...]
got 260
Greg Bacon
+1 for the effort and completeness. One thought: my previous iteration (no pun intended) of the above program actually did something like this: sort of a linear search while keeping tabs on which ranges had been seen. I suppose I can profile your version and my brs one, but I'm curious: which do you think has less complexity? It seems to me that binary search, even on multiple targets, will still be O(log n). Yours seems more complex, computationally and to my brain on not enough coffee. In any case, thanks for your help!
Pedro Silva
+2  A: 

I just wrapped your iterator generation in a for loop, and built up an array of iterator functions.

Depending on context, I either return a master iterator or a list of iterator functions. I wasn't sure what you wanted.

use strict;
use warnings;


my $t = [ [1,200], [400,900] ];
my @r = (
    { start =>   1, end =>  100 },
    { start =>   2, end =>  500 },
    { start => 204, end =>  500 },
    { start => 208, end =>  500 },
    { start => 215, end => 1000 },
    { start => 150, end => 1000 },
    { start => 500, end => 1100 },
);

# Get a master iterator that will process each iterator in turn.
my $brs_iterator = binary_range_search(
    targets => $t,  
    search => \@r,
);

# Get an array of iterators
my @brs_iterator = binary_range_search(
    targets => $t,  
    search => \@r,
);



sub binary_range_search {
    my %options = @_;
    my $targets = $options{targets}  || return;
    my $ranges  = $options{search}  || return;


    my @iterators;

    TARGET:
    for my $target ( @$targets ) {

        my ( $low, $high ) = ( 0, $#{$ranges} );

        RANGE_CHECK:
        while ( $low <= $high ) {

            my $try = int( ( $low + $high ) / 2 );

            # Remove non-overlapping ranges
            $low  = $try + 1, next RANGE_CHECK 
                if $ranges->[$try]{end}   < $target->[0];

            $high = $try - 1, next RANGE_CHECK 
                if $ranges->[$try]{start} > $target->[1];

            my ( $down, $up ) = ($try) x 2;

            my %seen = ();

            my $brs_iterator = sub {

                if (    exists $ranges->[$up + 1]
                        and $ranges->[ $up + 1 ]{end}   >= $target->[0]
                        and $ranges->[ $up + 1 ]{start} <= $target->[1]
                        and !exists $seen{ $up + 1 } )
                {
                    $seen{ $up + 1 } = undef;
                    return $ranges->[ ++$up ];
                }
                elsif ( $ranges->[ $down - 1 ]{end}       >= $target->[0]
                        and $ranges->[ $down + 1 ]{start} <= $target->[1]
                        and !exists $seen{ $down - 1 }
                        and $down > 0 )
                {
                    $seen{ $down - 1 } = undef;
                    return $ranges->[ --$down ];
                }
                elsif ( !exists $seen{$try} ) {
                    $seen{$try} = undef;
                  return $ranges->[$try];
                }
                else {
                    return;
                }

            };
            push @iterators, $brs_iterator;
            next TARGET;
        }

    }

    # In scalar context return master iterator that iterates over the list of range iterators.
    # In list context returns a list of range iterators.
    return wantarray 
         ? @iterators 
         : sub { 
             while( @iterators ) {
                 if( my $range = $iterators[0]() ) {
                     return $range;
                 }
                 shift @iterators;
             }
             return;
        }; 
}
daotoad
That is almost perfect; it never occurred to me to just push iterators onto a stack. The thing, though, is the iterator returned by this function is then used as input to an accumulator function. I can, of course, change the accumulator to reuse multiple iterators. Thanks!.
Pedro Silva