views:

470

answers:

11

I had an argument with a colleague about writing python efficiently. He claimed that though you are programming python you still have to optimise the little bits of your software as much as possible, as if you are writing an efficient algorithm in C++.

Things like:

  • In an if statement with an or always put the condition most likely to fail first, so the second will not be checked.
  • Use the most efficient functions for manipulating strings in common use. Not code that grinds strings, but simple things like doing joins and splits, and finding substrings.
  • Call as less functions as possible, even if it comes on the expense of readability, because of the overhead this creates.

I say, that in most cases it doesn't matter. I should also say that context of the code is not a super-efficient NOC or missile-guidance systems. We're mostly writing tests in python.

What's your view of the matter?

+14  A: 

My answer to that would be :

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

(Quoting Knuth, Donald. Structured Programming with go to Statements, ACM Journal Computing Surveys, Vol 6, No. 4, Dec. 1974. p.268)


If your application is doing anything like a query to the database, that one query will take more time than anything you can gain with those kind of small optimizations, anyway...

And if running after performances like that, why not code in assembly language, afterall ? Because Python is easier/faster to write and maintain ? Well, if so, you are right :-)

The most important thing is that your code is easy to maintain ; not a couple micro-seconds of CPU-time !
Well, maybe except if you have thousands of servers -- but is it your case ?

Pascal MARTIN
They do run on many servers. A dozen at most, and every test is self-sufficient. There are no shared resources except from the repository from which the test code is checked out.
Avihu Turzion
Importantly, this **applies to C++ as well.**
ilya n.
This applies to almost any programming language / application, actually (I don't code in C++ anymore, and have never programmed in Python, but all this is true for PHP, Javascript, SQL, or whatever I can thinkg about ^^ )
Pascal MARTIN
I think the only code that is expected of you not to be this way is the code that you hand in in your B.A.
Avihu Turzion
"My Answer"???You could have mentioned that this quote is from Donald Knuth..
0x89
@0x89 : true ; I copy-pasted the quoted, but forgot to do the same for the author ; sorry about that ; corrected :-)
Pascal MARTIN
+3  A: 

I should also say that context of the code is not a super-efficient NOC or missile-guidance systems. We're mostly writing tests in python.

Given this, I'd say that you should take your colleague's advice about writing efficient Python but ignore anything he says that goes against prioritizing readability and maintainability of the code, which will probably be more important than the speed at which it'll execute.

hasseg
+13  A: 

The answer is really simple :

  • Follow Python best practices, not C++ best practices.
  • Readability in Python is more important that speed.
  • If performance becomes an issue, measure, then start optimizing.
e-satis
Whether readability is more important then speed depends on your circumstances.
EFraim
Readability isn't always more important than speed, and the language you choose doesn't make that determination for you. You should follow the same rules in generally any language - write good, clean, readable code using the accepted patterns of that language, and when that's too slow, optimize it. Of course, not all premature optimization is bad, only speculative premature optimization - if you *know* you will be dealing with a dataset of a certain size or characteristic, then by all means choose an algorithm that handles that well.
Nick Bastin
I agree with both of you. EFraim, circumstances of course matters, but in this case, Python let you bind to any language that is faster that itself. So if this circumstances apply, it would be even better to choose C over Python for the part of your system with bottleneck. Python is not meant to be fast. At all. @Nick Bastin, indeed common sense goes in your direction. If you know the best way to do it and it not at a hight maintenance cost, why not doing it the faster way ? It's just that choosing Python is already choosing a coding philosophie that place human time above CPU time.
e-satis
+2  A: 

In an if statement with an or always put the condition most likely to fail first, so the second will not be checked.

This is generally a good advice, and also depends on the logic of your program. If it makes sense that the second statement is not evaluated if the first returns false, then do so. Doing the opposite could be a bug otherwise.

Use the most efficient functions for manipulating strings in common use. Not code that grinds strings, but simple things like doing joins and splits, and finding substrings.

I don't really get this point. Of course you should use the library provided functions, because they are probably implemented in C, and a pure python implementation is most likely to be slower. In any case, no need to reinvent the wheel.

Call as less functions as possible, even if it comes on the expense of readability, because of the overhead this creates.

$ cat withcall.py
def square(a):
        return a*a

for i in xrange(1,100000):
        i_square = square(i)

$ cat withoutcall.py
for i in xrange(1,100000):
        i_square = i*i

$ time python2.3 withcall.py
real    0m5.769s
user    0m4.304s
sys     0m0.215s
$ time python2.3 withcall.py
real    0m5.884s
user    0m4.315s
sys     0m0.206s

$ time python2.3 withoutcall.py
real    0m5.806s
user    0m4.172s
sys     0m0.209s
$ time python2.3 withoutcall.py
real    0m5.613s
user    0m4.171s
sys     0m0.216s

I mean... come on... please.

Stefano Borini
Calls for an ANOVA test, two samples are a bit too few, especially with the observed variance.
bjarkef
With enough number of samples, any difference will be significant ;) The question is if it really matters.
Stefano Borini
+10  A: 

This sort of premature micro-optimisation is usually a waste of time in my experience, even in C and C++. Write readable code first. If it's running too slowly, run it through a profiler, and if necessary, fix the hot-spots.

Fundamentally, you need to think about return on investment. Is it worth the extra effort in reading and maintaining "optimised" code for the couple of microseconds it saves you? In most cases it isn't.

(Also, compilers and runtimes are getting cleverer. Some micro-optimisations may become micro-pessimisations over time.)

Simon Nickerson
+1: Profile before fooling around with "optimization".
S.Lott
+1 for "micro-pessimisations"
glenn jackman
+4  A: 

I agree with others: readable code first ("Performance is not a problem until performance is a problem.").

I only want to add that when you absolutely need to write some unreadable and/or non-intuitive code, you can generally isolate it in few specific methods, for which you can write detailed comments, and keep the rest of your code highly readable. If you do so, you'll end up having easy to maintain code, and you'll only have to go through the unreadable parts when you really need to.

giorgian
+1  A: 

Sure follow Python best-practices (and in fact I agree with the first two recommendations), but maintainability and efficiency are not opposites, they are mostly togethers (if that's a word).

Statements like "always write your IF statements a certain way for performance" are a-priori, i.e. not based on knowledge of what your program spends time on, and are therefore guesses. The first (or second, or third, whatever) rule of performance tuning is don't guess.

If after you measure, profile, or in my case do this, you actually know that you can save much time by re-ordering tests, by all means, do. My money says that's at the 1% level or less.

Mike Dunlavey
+2  A: 

I think there are several related 'urban legends' here.

  • False Putting the more often-checked condition first in a conditional and similar optimizations save enough time for a typical program that it is worthy for a typical programmer.

  • True Some, but not many, people are using such styles in Python in the incorrect belief outlined above.

  • True Many people use such style in Python when they think that it improves readability of a Python program.

About readability: I think it's indeed useful when you give the most useful conditional first, since this is what people notice first anyway. You should also use ''.join() if you mean concatenation of strings since it's the most direct way to do it (the s += x operation could mean something different).

"Call as less functions as possible" decreases readability and goes against Pythonic principle of code reuse. And so it's not a style people use in Python.

ilya n.
+2  A: 

Before introducing performance optimizations at the expense of readability, look into modules like psyco that will do some JIT-ish compiling of distinct functions, often with striking results, with no impairment of readability.

Then if you really want to embark on the optimization path, you must first learn to measure and profile. Optimization MUST BE QUANTITATIVE - do not go with your gut. The hotspot profiler will show you the functions where your program is burning up the most time.

If optimization turns up a function like this is being frequently called:

def get_order_qty(ordernumber):
    # look up order in database and return quantity

If there is any repetition of ordernumbers, then memoization would be a good optimization technique to learn, and it is easily packaged in an @memoize decorator so that there is little impact to program readability. The effect of memoizing is that values returned for a given set of input arguments are cached, so that the expensive function can be called only once, with subseqent calls resolved against the cache.

Lastly, consider lifting invariants out of loops. For large multi-dimensional structures, this can save a lot of time - in fact in this case, I would argue that this optimization improves readability, as it often serves to make clear that some expression can be computed at a high-level dimension in the nested logic.

(BTW, is this really what you meant? •In an if statement with an or always put the condition most likely to fail first, so the second will not be checked.

I should think this might be the case for "and", but an "or" will short-circuit if the first value is True, saving the evaluation of the second term of the conditional. So I would change this optimization "rule" to:

  • If testing "A and B", put A first if it is more likely to evaluate to
    False.
  • If testing "A or B", put A first if it is more likely to evaluate to True.

But often, the sequence of conditions is driven by the tests themselves:

if obj is not None and hasattr(obj,"name") and obj.name.startswith("X"):

You can't reorder these for optimization - they have to be in this order (or just let the exceptions fly and catch them later:

if obj.name.startswith("X"):
Paul McGuire
+1  A: 

My visceral reaction is this:

I've worked with guys like your colleague and in general I wouldn't take advice from them.

Ask him if he's ever even used a profiler.

+1  A: 

Is this question even serious? It reads like a troll attempt. The propositions you described aren't best practices in c++, never-mind python.

Alan
I have to say here that I am not of the opinion as my colleagues that these are best practices in the world most of us live in. These are best practices that he carries with him from the time his done his B.A. several years ago.
Avihu Turzion
Who gets to say what "best practices" are anyway? I don't mind if programmers share their wisdom. I do mind if a professor pulls his or her opinion out of the air, tells students it's a "best practice", and they go out preaching it to the world.
Mike Dunlavey