tags:

views:

102

answers:

2

Is there any way to force Perl to call FETCHSIZE on a tied array before each call to FETCH? My tied array knows its maximum size, but could shrink from this size depending on the results of earlier FETCH calls. here is a contrived example that filters a list to only the even elements with lazy evaluation:

use warnings;
use strict;

package VarSize;

sub TIEARRAY { bless $_[1] => $_[0] }
sub FETCH {
    my ($self, $index) = @_;
    splice @$self, $index, 1 while $$self[$index] % 2;
    $$self[$index]
}
sub FETCHSIZE {scalar @{$_[0]}}

my @source = 1 .. 10;

tie my @output => 'VarSize', [@source];

print "@output\n";  # array changes size as it is read, perl only checks size
                    # at the start, so it runs off the end with warnings
print "@output\n";  # knows correct size from start, no warnings

for brevity I have omitted a bunch of error checking code (such as how to deal with accesses starting from an index other than 0)

EDIT: rather than the above two print statements, if ONE of the following two lines is used, the first will work fine, the second will throw warnings.

print "$_ " for @output;   # for loop "iterator context" is fine,
                           # checks FETCHSIZE before each FETCH, ends properly

print join " " => @output; # however a list context expansion 
                           # calls FETCHSIZE at the start, and runs off the end

Update:

The actual module that implements a variable sized tied array is called List::Gen which is up on CPAN. The function is filter which behaves like grep, but works with List::Gen's lazy generators. Does anyone have any ideas that could make the implementation of filter better?

(the test function is similar, but returns undef in failed slots, keeping the array size constant, but that of course has different usage semantics than grep)

+1  A: 
sub FETCH {
    my ($self, $index) = @_;
    my $size = $self->FETCHSIZE;
    ...
}

Ta da!

I suspect what you're missing is they're just methods. Methods called by tie magic, but still just methods you can call yourself.

Listing out the contents of a tied array basically boils down to this:

my @array;
my $tied_obj = tied @array;
for my $idx (0..$tied_obj->FETCHSIZE-1) {
    push @array, $tied_obj->FETCH($idx);
}

return @array;

So you don't get any opportunity to control the number of iterations. Nor can FETCH reliably tell if its being called from @array or $array[$idx] or @array[@idxs]. This sucks. Ties kinda suck, and they're really slow. About 3 times slower than a normal method call and 10 times than a regular array.

Your example already breaks expectations about arrays (10 elements go in, 5 elements come out). What happen when a user asks for $array[3]? Do they get undef? Alternatives include just using the object API, if your thing doesn't behave exactly like an array pretending it does will only add confusion. Or you can use an object with array deref overloaded.

So, what you're doing can be done, but its difficult to get it to work well. What are you really trying to accomplish?

Schwern
that will get the size for use inside of FETCH, but the first call to `print` in my example will still throw warnings with this change. i tried returning a tied scalar from FETCHSIZE, but Perl still only calls it once at the start of the join
Eric Strom
The idea of changing the contents of an array on read is generally a bad idea, but I can't get more specific without knowing what it is you're really trying to do. Its difficult for a tied array to know if an individual index has been accessed or the whole thing (it would be a lot more efficient if there was a single method for the whole thing) or just a slice. Honestly, I would change it on store.
Schwern
i agree that its a suboptimal solution, i am working on an lazy version of grep that returns a tied array that only performs the minimum number of element tests to accurately return its result. for example, you have a list of a thousand items, apply the lazy grep to it, and then take a slice of the result from 2 .. 5, the tied array will walk through its source until there are 6 matching elements and then return the last 4. any suggestions of a different way to approach the problem?
Eric Strom
@Eric Return an iterator object. It'll be a lot less painful and efficient than dealing with tied arrays. Instead of `for my $thing (@list)` it'll be `while( my $thing = $iter->next )`. If you still want to go for tied arrays look at Tie::File for inspiration, it solves a similar problem.
Schwern
my module already implements a `->next` style iterative interface, and for other functions the tied interface is working well, its just the grep implementation that isn't working as cleanly as i would like. I have an early version up on cpan: List::Gen
Eric Strom
So you've reimplemented `Data::Stream::Bulk`?
jrockway
@jrockway => the core ideas have similarities, but the implementation is very different, for example, `List::Gen` provides lazy evaluation with random access within its objects
Eric Strom
A: 

I think that order in which perl calls FETCH/FETCHSIZE methods can't be changed. It's perls internal part. Why not just explicitly remove warnings:

sub FETCH {
    my ($self, $index) = @_;
    splice @$self, $index, 1 while ($$self[$index] || 0) % 2;
    exists $$self[$index] ? $$self[$index] : '' ## replace '' with default value
}
Ivan Nevostruev
that does work for the case of only printing the values, but i am worried about that placeholder leading to action at a distance when a copied array silently ends up with empty strings at the end. of course the explicit solution is if you need the whole array, to just to ask the array for a value past the end, forcing it to calculate all the relevant values first, but i am hoping for an implicit solution
Eric Strom
If you were worried about action at a distance, reading an array would not cause its length to change.
jrockway
@jrockway => I agree, the behavior of this one function is an anomaly in my module, hence the question :)
Eric Strom