views:

617

answers:

6

I've written this very badly optimized C code that does a simple math calculation:

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#define MIN(a, b) (((a) < (b)) ? (a) : (b))
#define MAX(a, b) (((a) > (b)) ? (a) : (b))


unsigned long long int p(int);
float fullCheck(int);

int main(int argc, char **argv){
  int i, g, maxNumber;
  unsigned long long int diff = 1000;

  if(argc < 2){
    fprintf(stderr, "Usage: %s maxNumber\n", argv[0]);
    return 0;
  }
  maxNumber = atoi(argv[1]);

  for(i = 1; i < maxNumber; i++){
    for(g = 1; g < maxNumber; g++){
      if(i == g)
        continue;
      if(p(MAX(i,g)) - p(MIN(i,g)) < diff &&  fullCheck(p(MAX(i,g)) - p(MIN(i,g))) && fullCheck(p(i) + p(g))){
          diff = p(MAX(i,g)) - p(MIN(i,g));
          printf("We have a couple %llu %llu with diff %llu\n", p(i), p(g), diff);
      }
    }
  }

  return 0;
}

float fullCheck(int number){
  float check = (-1 + sqrt(1 + 24 * number))/-6;
  float check2 = (-1 - sqrt(1 + 24 * number))/-6;
  if(check/1.00 == (int)check)
    return check;
  if(check2/1.00 == (int)check2)
    return check2;
  return 0;
}

unsigned long long int p(int n){
  return n * (3 * n - 1 ) / 2;
}

And then I've tried (just for fun) to port it under Python to see how it would react. My first version was almost a 1:1 conversion that run terribly slow (120+secs in Python vs <1sec in C). I've done a bit of optimization, and this is what I obtained:

#!/usr/bin/env/python
from cmath import sqrt
import cProfile
from pstats import Stats

def quickCheck(n):
        partial_c = (sqrt(1 + 24 * (n)))/-6 
        c = 1/6 + partial_c
        if int(c.real) == c.real:
                return True
        c = c - 2*partial_c
        if int(c.real) == c.real:
                return True
        return False

def main():        
        maxNumber = 5000
        diff = 1000
        for i in range(1, maxNumber):
                p_i = i * (3 * i - 1 ) / 2
                for g in range(i, maxNumber):
                        if i == g:
                                continue
                        p_g = g * (3 * g - 1 ) / 2
                        if p_i > p_g:
                                ma = p_i
                                mi = p_g
                        else:
                                ma = p_g
                                mi = p_i

                        if ma - mi < diff and quickCheck(ma - mi):
                                if quickCheck(ma + mi):
                                        print ('New couple ', ma, mi)
                                        diff = ma - mi


cProfile.run('main()','script_perf')
perf = Stats('script_perf').sort_stats('time', 'calls').print_stats(10)

This runs in about 16secs which is better but also almost 20 times slower than C. Now, I know C is better than Python for this kind of calculations, but what I would like to know is if there something that I've missed (Python-wise, like an horribly slow function or such) that could have made this function faster. Please note that I'm using Python 3.1.1, if this makes a difference

+12  A: 

Since quickCheck is being called close to 25,000,000 times, you might want to use memoization to cache the answers.

You can do memoization in C as well as Python. Things will be much faster in C, also.

You're computing 1/6 in each iteration of quickCheck. I'm not sure if this will be optimized out by Python, but if you can avoid recomputing constant values, you'll find things are faster. C compilers do this for you.

Doing things like if condition: return True; else: return False is silly -- and time consuming. Simply do return condition.

In Python 3.x, /2 must create floating-point values. You appear to need integers for this. You should be using //2 division. It will be closer to the C version in terms of what it does, but I don't think it's significantly faster.

Finally, Python is generally interpreted. The interpreter will always be significantly slower than C.

S.Lott
Seth
How's it go..? "You'll write faster code in C, but code faster in Python"
MattH
+2  A: 

The other respondents have already mentioned several optimizations that will help. However, ultimately, you're not going to be able to match the performance of C in Python. Python is a nice tool, but since it's interpreted, it isn't really suited for heavy number crunching or other apps where performance is key.

Also, even in your C version, your inner loop could use quite a bit of help. Updated version:

  for(i = 1; i < maxNumber; i++){
    for(g = 1; g < maxNumber; g++){
      if(i == g)
        continue;
      max=i;
      min=g;

      if (max<min) { 
          // xor swap - could use swap(p_max,p_min) instead. 
          max=max^min;
          min=max^min;
          max=max^min;
      }

      p_max=P(max);
      p_min=P(min);
      p_i=P(i);
      p_g=P(g);

      if(p_max - p_min < diff &&  fullCheck(p_max-p_min) && fullCheck(p_i + p_g)){
          diff = p_max - p_min;
          printf("We have a couple %llu %llu with diff %llu\n", p_i, p_g, diff);
      }
    }
  }


///////////////////////////
float fullCheck(int number){
  float den=sqrt(1+24*number)/6.0;
  float check = 1/6.0 - den;
  float check2 = 1/6.0 + den;


  if(check == (int)check)
    return check;
  if(check2 == (int)check2)
    return check2;

  return 0.0;
}

Division, function calls, etc are costly. Also, calculating them once and storing in vars such as I've done can make things a lot more readable.

You might consider declaring P() as inline or rewrite as a preprocessor macro. Depending on how good your optimizer is, you might want to perform some of the arithmetic yourself and simplify its implementation.

Your implementation of fullCheck() would return what appear to be invalid results, since 1/6==0, where 1/6.0 would return 0.166... as you would expect.

This is a very brief take on what you can do to your C code to improve performance. This will, no doubt, widen the gap between C and Python performance.

David Lively
Python isn't necessarily interpreted.
Chuck
@Chuck ack. good point.
David Lively
+1  A: 

20x difference between Python and C for a number crunching task seems quite good to me.

Check the usual performance differences for some CPU intensive tasks (keep in mind that the scale is logarithmic).

But look on the bright side, what's 1 minute of CPU time compared with the brain and typing time you saved writing Python instead of C? :-)

fortran
+4  A: 

There are some python compilers that might actually do a good bit for you. Have a look at Psyco.

Another way of dealing with math intensive programs is to rewrite the majority of the work into a math kernel, such as NumPy, so that heavily optimized code is doing the work, and your python code only guides the calculation. To get the most out of this strategy, avoid doing calculations in loops, and instead let the math kernel do all of that.

TokenMacGuy
+1: Don't transliterate C; rethink it into numpy.
S.Lott
+7  A: 

I made it go from ~7 seconds to ~3 seconds on my machine:

  • Precomputed i * (3 * i - 1 ) / 2 for each value, in yours it was computed twice quite a lot
  • Cached calls to quickCheck
  • Removed if i == g by adding +1 to the range
  • Removed if p_i > p_g since p_i is always smaller than p_g

Also put the quickCheck-function inside main, to make all variables local (which have faster lookup than global). I'm sure there are more micro-optimizations available.

def main():
        maxNumber = 5000
        diff = 1000

        p = {}
        quickCache = {}

        for i in range(maxNumber):
            p[i] = i * (3 * i - 1 ) / 2

        def quickCheck(n):
            if n in quickCache: return quickCache[n]
            partial_c = (sqrt(1 + 24 * (n)))/-6 
            c = 1/6 + partial_c
            if int(c.real) == c.real:
                    quickCache[n] = True
                    return True
            c = c - 2*partial_c
            if int(c.real) == c.real:
                    quickCache[n] = True
                    return True
            quickCache[n] = False
            return False

        for i in range(1, maxNumber):
                mi = p[i]
                for g in range(i+1, maxNumber):
                        ma = p[g]
                        if ma - mi < diff and quickCheck(ma - mi) and quickCheck(ma + mi):
                                print('New couple ', ma, mi)
                                diff = ma - mi
truppo
I was about to post almost that same list of optimizations, with about the same relative speedup. Caching the "i * (3 * i - 1 ) / 2" calculation, and removing the "if i == g by adding +1 to the range" were the two biggest improvements. The calls to quickCheck were not repeated as much as I thought, but it is still a good idea to cache the values.
semiuseless
Nice one, I've already tried to incorporate quickCheck into the main body, but the advantage was slow when compared to a more complicated code.
Nanomad
Gave it a whirl with PyPy trunk. About 3x faster than python 2.6 and about 3x slower than the C version compiled with GCC 4.4 -O3.
Ants Aasma
+2  A: 

Because the function p() monotonically increasing you can avoid comparing the values as g > i implies p(g) > p(i). Also, the inner loop can be broken early because p(g) - p(i) >= diff implies p(g+1) - p(i) >= diff.

Also for correctness, I changed the equality comparison in quickCheck to compare difference against an epsilon because exact comparison with floating point is pretty fragile.

On my machine this reduced the runtime to 7.8ms using Python 2.6. Using PyPy with JIT reduced this to 0.77ms.

This shows that before turning to micro-optimization it pays to look for algorithmic optimizations. Micro-optimizations make spotting algorithmic changes much harder for relatively tiny gains.

EPS = 0.00000001
def quickCheck(n):
    partial_c = sqrt(1 + 24*n) / -6
    c = 1/6 + partial_c
    if abs(int(c) - c) < EPS:
        return True
    c = 1/6 - partial_c
    if abs(int(c) - c) < EPS:
        return True
    return False

def p(i):
    return i * (3 * i - 1 ) / 2

def main(maxNumber):
    diff = 1000

    for i in range(1, maxNumber):
        for g in range(i+1, maxNumber):
            if p(g) - p(i) >= diff:
                break 
            if quickCheck(p(g) - p(i)) and quickCheck(p(g) + p(i)):
                print('New couple ', p(g), p(i), p(g) - p(i))
                diff = p(g) - p(i)
Ants Aasma
As with all things....first optimize the problem, then work on the code. The "if p(g) - p(i) >= diff: break" algorithm change makes a HUGE difference. This change took the my code (which has the same basic changes as @truppo) from ~2.3s to 13ms....or over two orders of magnitude reduction in runtime.
semiuseless