ansaurus

Question

How can I improve Perl's performance with massive data?

Answer 1

+3 A:

A very simple way to improve your performance there entails simply learning better coding practices.

One tenet you are ignoring is this: Do not repeat yourself.

You repeat certain bits of code in there a lot, forcing Perl to evaluate the same expression again and again. One example is this string:

"$tasks_name[$i]\_sort\_allvals"

It's used about 10 times in there. That means every single time you use it, Perl looks up the array element again (in case it changed) and builds that string from scratch. It may not seem like much, but eventually it adds up.

Another example is this:

$values{"$tasks_name[$i]\_sort\_allvals"}->[$k]

It's used 10 times as well, and while $k changes from one iteration to the next, the value of the whole expression stays the same within a single pass through the loop body. It would be faster to store it in a scalar at the beginning of the loop body and use that scalar for the rest of it, so that Perl does not have to resolve the reference 10 times per iteration.
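A minimal sketch of that hoisting, assuming %values maps each key string to an array reference as in the snippets above. The sample data and the summing loop body are made up for illustration; only the pattern matters.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the question's data structures.
my @tasks_name = ('TEST');
my %values     = ( 'TEST_sort_allvals' => [ 16, 16, 16, 100, 200 ] );

my $i = 0;

# Build the key once and resolve the hash lookup once, outside the loop.
my $key     = $tasks_name[$i] . '_sort_allvals';
my $allvals = $values{$key};

my $sum = 0;
for my $k ( 0 .. $#{$allvals} ) {
    # Fetch the element once per iteration and reuse the scalar below.
    my $val = $allvals->[$k];
    $sum += $val;
}

print "$sum\n";
```

The loop body now touches only $val, so the string interpolation and the hash lookup each happen once instead of once per use.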

Mithaldu 2010-01-07 12:07:14

An algorithm redesign sounds more important, but, yech, I'd do this just for readability!

fennec 2010-01-07 13:33:27

Agreed. But sometimes it just helps to give salient reasons beyond that. :)

Mithaldu 2010-01-07 16:38:26

Thanks for your comment. I'll try to change it and keep it for next time :)

YoDar 2010-01-08 09:02:44

Answer 2

+3 A:

I'm having a little trouble following it, but it sounds like you're searching a (conveniently presorted) array of datafile-X-points from the start, with `for my $var (0..whatever) { ... last if $done; }`, repeatedly for each axis-X-point.

Total Shlemiel the Painter algorithm. This is probably the source of the performance problem. You should try avoiding parts of the search you don't need to do again.
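One way to skip the repeated work, sketched under the assumption that both the data points and the axis points are sorted ascending. The question's exact data layout is not shown here, so @data, @axis and first_at_or_after_each are hypothetical names. The idea is to keep the scan position between axis points instead of restarting at index 0.

```perl
use strict;
use warnings;

# For each axis point, return the index of the first data point >= it.
# Both lists must be sorted ascending. The cursor only ever moves forward,
# so the total work is O(n + m) rather than O(n * m).
sub first_at_or_after_each {
    my ( $data, $axis ) = @_;
    my @result;
    my $cursor = 0;    # survives from one axis point to the next
    for my $x ( @{$axis} ) {
        $cursor++ while $cursor < @{$data} && $data->[$cursor] < $x;
        push @result, $cursor;    # equals scalar(@$data) if nothing qualifies
    }
    return \@result;
}

my @data = ( 16, 16, 16, 100, 200, 255, 255, 255, 355, 455 );
my @axis = ( 0, 100, 250, 500 );
my $idx  = first_at_or_after_each( \@data, \@axis );
print "@{$idx}\n";    # prints "0 3 5 10"
```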

fennec 2010-01-07 13:39:14

Nice algorithm! I'm not quite sure I've understood the connection to my algorithm. Thanks for your comment :)

YoDar 2010-01-08 09:04:29

Answer 3

A:

I tried to understand your program but my brain choked a bit. Can you provide us with an example input, some explanation of what you want to do, and a good example output? From your example I can't make heads or tails of it.

You say this is some valid input:

database: task: TEST

database: binary file size: 20

database: numbers of start/end values = 5

database: sorted I.P array (X axis):

16,16,16,100,200,255,255,255,355,455

database: counting active cores in sorted I.P array (size: 10)...

and this is valid output:

database: sorted cores array (Y axis):

0,3,4,5,5,2,1,0

but I don't understand it. Care to explain what you want to achieve?

Leonardo Herrera 2010-01-07 18:54:14

I've added explanations for the example...

YoDar 2010-01-08 09:01:40

Answer 4

A:

Based on what you say here:

In order to create the Y axis data points, I came up with an algorithm that runs over all the X_sort_array points and checks, for each point, whether it lies between each pair of start-end points individually. If it does, add one to Y_array[x_point], and so on...

and assuming this is your problem, then a modified binary search algorithm will work in O(log n), or approximately 20 steps for n = 1_000_000 (the base-2 logarithm of a million is just under 20):

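# Returns a range from @$ranges that overlaps @$range, or nothing.
# Binary search only works here if @$ranges is sorted and its ranges
# do not overlap one another.
sub binary_range_search {
    my ( $range, $ranges ) = @_;
    my ( $low, $high ) = ( 0, @{$ranges} - 1 );
    while ( $low <= $high ) {

        my $try = int( ( $low + $high ) / 2 );

        # Candidate ends before the target starts: search the upper half.
        $low  = $try + 1, next if $ranges->[$try][1] < $range->[0];
        # Candidate starts after the target ends: search the lower half.
        $high = $try - 1, next if $ranges->[$try][0] > $range->[1];

        # Neither test fired, so the two ranges overlap.
        return $ranges->[$try];
    }
    return;
}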

In this version, you're looking to find whether the range @$range overlaps any of the ranges in @$ranges. You can modify it to look for overlaps of a single point over all ranges. Is that what you're looking for?
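A rough sketch of what the single-point modification might look like. The name binary_point_search and the sample ranges are hypothetical, and it assumes @$ranges is sorted by start with no two ranges overlapping each other, which is what a binary search over ranges needs in order to be correct.

```perl
use strict;
use warnings;

# Hypothetical single-point variant: returns the range [start, end] that
# contains $point, or nothing. Assumes @$ranges is sorted by start and that
# the ranges do not overlap one another.
sub binary_point_search {
    my ( $point, $ranges ) = @_;
    my ( $low, $high ) = ( 0, $#{$ranges} );
    while ( $low <= $high ) {
        my $try = int( ( $low + $high ) / 2 );
        if    ( $ranges->[$try][1] < $point ) { $low  = $try + 1 }
        elsif ( $ranges->[$try][0] > $point ) { $high = $try - 1 }
        else                                  { return $ranges->[$try] }
    }
    return;
}

my @ranges = ( [ 1, 50 ], [ 60, 99 ], [ 100, 100 ], [ 150, 300 ] );
my $hit    = binary_point_search( 100, \@ranges );    # the range [100, 100]
my $miss   = binary_point_search( 55,  \@ranges );    # undef: 55 is in a gap
print $hit ? "@{$hit}\n" : "none\n";
```

If the stored ranges can overlap each other, this returns at most one of the matches, so it cannot be used as-is to count them.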

Pedro Silva 2010-01-08 09:10:51

I think this might be what I've been looking for. I still need to make some changes to use it properly. I know X_sort_array must be sorted in order to use this algorithm, but does @$range need to be sorted too?

YoDar 2010-01-11 07:27:59

Not at all. @$range should be something like (1, 50) or (100, 100); in the latter case it will find the range that contains 100. The version I posted is just something I already had for the purpose of finding overlapping ranges.

Pedro Silva 2010-01-11 09:40:41

Is there a way to count how many of the @$ranges overlap a single point?

YoDar 2010-01-17 14:57:42

Yeah, sure. Also using binary search, see my question: http://stackoverflow.com/questions/2046390/how-to-extend-a-binary-search-iterator-to-consume-multiple-targets and the accepted answer: http://stackoverflow.com/questions/2046390/how-to-extend-a-binary-search-iterator-to-consume-multiple-targets/2052468#2052468 If you implement the `binary_range_search` iterator version, computing the number of overlaps would be a matter of exhausting the iterator through `$count++ while $brs_iterator->()`.
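For comparison, here is a different way to get the same count. It is not the iterator from the linked answer but a common counting trick, and it also works when the ranges overlap each other: sort the starts and the ends separately, then binary-search each list. The names count_le and overlaps_at and the sample ranges are hypothetical, and the sketch assumes integer coordinates with inclusive end points.

```perl
use strict;
use warnings;

# Number of elements in the ascending list @$sorted that are <= $x
# (binary search for the upper bound).
sub count_le {
    my ( $sorted, $x ) = @_;
    my ( $low, $high ) = ( 0, scalar @{$sorted} );
    while ( $low < $high ) {
        my $mid = int( ( $low + $high ) / 2 );
        if ( $sorted->[$mid] <= $x ) { $low  = $mid + 1 }
        else                         { $high = $mid }
    }
    return $low;
}

# Ranges may overlap freely; each is [start, end], both ends inclusive.
my @ranges = ( [ 10, 100 ], [ 16, 255 ], [ 50, 200 ], [ 255, 400 ] );
my @starts = sort { $a <=> $b } map { $_->[0] } @ranges;
my @ends   = sort { $a <=> $b } map { $_->[1] } @ranges;

# Ranges covering $x = (ranges that started at or before $x)
#                    - (ranges that ended strictly before $x).
# The "$x - 1" step relies on the integer-coordinate assumption.
sub overlaps_at {
    my ($x) = @_;
    return count_le( \@starts, $x ) - count_le( \@ends, $x - 1 );
}

print join( ',', map { overlaps_at($_) } 5, 16, 100, 255, 401 ), "\n";
# prints "0,2,3,2,0"
```

After the one-time sorts, each query costs two binary searches, so counting the overlaps at every X point takes O(m log n) for m points and n ranges.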

Pedro Silva 2010-01-19 23:54:43
