tags:

views:

321

answers:

3

I built a Perl Inline::C module, but there is some oddity with the sorting. Does anyone know why it would sort like this? Why is the 4.0e-5 is not first?

my $ref = [ 5.0e-5,4.2e-5,4.3e-5,4.4e-5,4.4e-5,4.2e-5,4.2e-5,4.0e-5]; 

use Inline C => <<'END_OF_C_CODE';

void test(SV* sv, ...) {

  I32 i;
  I32 arrayLen;
  AV* data;
  float retval;
  SV** pvalue;

  Inline_Stack_Vars;
  data = SvUV(Inline_Stack_Item(0));

  /* Determine the length of the array */
  arrayLen = av_len(data);

  // sort 
  sortsv(AvARRAY(data),arrayLen+1,Perl_sv_cmp_locale);

  for (i = 0; i < arrayLen+1; i++) {

    pvalue = av_fetch(data,i,0);  /* fetch the scalar located at i .*/
    retval = SvNV(*pvalue);  /* dereference the scalar into a number. */

    printf("%f \n",newSVnv(retval));
  }
}

END_OF_C_CODE

test($ref);

0.000042
0.000042
0.000042
0.000043
0.000044
0.000044
0.000040
0.000050

+4  A: 

Because you are sorting lexically, Try this code:

#!/usr/bin/perl

use strict;
use warnings;

my $ref = [ 5.0e-5,4.2e-5,4.3e-5,4.4e-5,4.4e-5,4.2e-5,4.2e-5,4.0e-5]; 

print "Perl with cmp\n";
for my $val (sort @$ref) {
    printf "%f \n", $val;
}

print "Perl with <=>\n";
for my $val (sort { $a <=> $b } @$ref) {
    printf "%f \n", $val;
}

print "C\n";

test($ref);

use Inline C => <<'END_OF_C_CODE';

void test(SV* sv, ...) {

  I32 i;
  I32 arrayLen;
  AV* data;
  float retval;
  SV** pvalue;

  Inline_Stack_Vars;
  data = SvUV(Inline_Stack_Item(0));

  /* Determine the length of the array */
  arrayLen = av_len(data);

  // sort 
  sortsv(AvARRAY(data),av_len(data)+1,Perl_sv_cmp_locale);

  arrayLen = av_len(data);
  for (i = 0; i < arrayLen+1; i++) {

    pvalue = av_fetch(data,i,0);  /* fetch the scalar located at i .*/
    retval = SvNV(*pvalue);  /* dereference the scalar into a number. */

    printf("%f \n",newSVnv(retval));
  }
}

END_OF_C_CODE

Of course, lexically 0.00040 is smaller than 0.00042 as well, but you aren't comparing 0.00040 to 0.00042; you are comparing the number 0.00040 converted to a string with the number 0.00042 converted to a string. When a number gets too large or small, Perl's stringifying logic resorts to using scientific notation. So you are sorting the set of strings

"4.2e-05", "4.2e-05", "4.2e-05", "4.3e-05", "4.4e-05", "4.4e-05", "4e-05", "5e-05"

which are properly sorted. Perl happily turns those strings back into their numbers when you ask it to with the %f format in printf. You could stringify the numbers yourself, but since you have stated you want this to be faster, that would be a mistake. You should not to be trying to optimize the program before you know where it slow (premature optimization is the root of all evil*). Write your code then run Devel::NYTProf against it to find where it is slow. If necessary, rewrite those portions in XS or Inline::C (I prefer XS). You will find that you get more speed out of choosing the right data structure than micro-optimizations like this.

* Knuth, Donald. Structured Programming with go to Statements, ACM Journal Computing Surveys, Vol 6, No. 4, Dec. 1974. p.268.

Chas. Owens
I would like to have the sorting in the Inline::C section for speed. This may be the problem, but how to get sortsv() to sort numerically is not well documented.
Dan Littlejohn
You won't get any speed out of it. The sort function in Perl is written in C.
Chas. Owens
Now, if you mean the function you are _passing_ to the sort function, that may get a speed boost, but it would have to be pretty complicated.
Chas. Owens
+2  A: 

Perl_sv_cmp_locale is your sorting function which I suspect is lexical comparison. Look for numeric sorting one or write your own.

Hynek -Pichi- Vychodil
This is not well documented in the perl api. Even trying to try and write my own function no luck.
Dan Littlejohn
+1  A: 

Have an answer with help from the people over at http://www.perlmonks.org/?node_id=761015

I ran some profiling (DProf) and it's a 4x improvement in speed

Total Elapsed Time = 0.543205 Seconds
User+System Time = 0.585454 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
100. 0.590 0.490 100000 0.0000 0.0000 test_inline_c_pkg::percent2

Total Elapsed Time = 2.151647 Seconds
User+System Time = 1.991647 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
104. 2.080 1.930 100000 0.0000 0.0000 main::percent2

Here is the code

use Inline C => <<'END_OF_C_CODE';

#define SvSIOK(sv) ((SvFLAGS(sv) & (SVf_IOK|SVf_IVisUV)) == SVf_IOK)
#define SvNSIV(sv) (SvNOK(sv) ? SvNVX(sv) : (SvSIOK(sv) ? SvIVX(sv) : sv_2nv(sv)))

static I32 S_sv_ncmp(pTHX_ SV *a, SV *b) {

  const NV nv1 = SvNSIV(a);
  const NV nv2 = SvNSIV(b);
  return nv1 < nv2 ? -1 : nv1 > nv2 ? 1 : 0;
}

void test(SV* sv, ...) {

  I32 i;
  I32 arrayLen;
  AV* data;
  float retval;
  SV** pvalue;

  Inline_Stack_Vars;
  data = SvUV(Inline_Stack_Item(0));

  /* Determine the length of the array */
  arrayLen = av_len(data);

  /* sort descending (send numerical sort function S_sv_ncmp) */
  sortsv(AvARRAY(data),arrayLen+1, S_sv_ncmp);

  for (i = 0; i < arrayLen+1; i++) {

    pvalue = av_fetch(data,i,0);  /* fetch the scalar located at i .*/
    retval = SvNV(*pvalue);  /* dereference the scalar into a number. */

    printf("%f \n",newSVnv(retval));
  }
}

END_OF_C_CODE
Dan Littlejohn