I've been a Perl guy for over 10 years but a friend convinced me to try Python and told me how much faster it is than Perl. So just for kicks I ported an app I wrote in Perl to Python and found that it runs about 3x slower. Initially my friend told me that I must have done it wrong, so I rewrote and refactored until I could rewrite and refactor no more and ... it's still a lot slower. So I did a simple test:

i = 0
j = 0

while (i < 100000000):
    i = i + 1
    j = j + 1

print j

$ time python python.py
100000000

real 0m48.100s
user 0m45.633s
sys 0m0.043s

my $i = 0;
my $j = 0;

while ($i < 100000000) {
    ++$i; # also tested $i = $i + 1 to be fair, same result
    ++$j;
}

print $j;

$ time perl perl.pl
100000000

real 0m24.757s
user 0m22.341s
sys 0m0.029s

Just under twice as slow, which doesn't seem to reflect any of the benchmarks I've seen ... is this a problem with my installation or is Python really that much slower than Perl?

+4  A: 

Python is not particularly fast at numeric computations and I'm sure it's slower than perl when it comes to text processing.

Since you're an experienced Perl hand, I don't know if this applies to you but Python programs in the long run tend to be more maintainable and are quicker to develop. The speed is 'enough' for most situations and you have the flexibility to drop down into C when you really need a performance boost.

Update

Okay. I just created a large file (1GB) with random data in it (mostly ASCII) and broke it into lines of equal length. This was supposed to simulate a log file.

I then ran simple perl and python programs that search the file line by line for an existing pattern.

With Python 2.6.2, the results were

real    0m18.364s
user    0m9.209s
sys 0m0.956s

and with Perl 5.10.0

real    0m17.639s
user    0m5.692s
sys 0m0.844s

The programs are as follows (please let me know if I'm doing something stupid)

import re
regexp = re.compile("p06c")

def search():
    with open("/home/arif/f") as f:
        for i in f:
            if regexp.search(i):
                print "Found : %s"%i

search()

and

sub search() {
  open FOO,"/home/arif/f" or die $!;
  while (<FOO>) {
    print "Found : $_\n" if /p06c/o;
  }
}

search();

The results are pretty close, and tweaking it one way or another doesn't seem to alter them much. I don't know if this is a true benchmark, but I think it's the way I'd search log files in the two languages, so I stand corrected about the relative performance.

Thanks Chris.

Noufal Ibrahim
I disagree with the "it's slower than perl when it comes to text processing" statement. Before you say that, you should provide your own benchmark as justification.
ghostdog74
"Python programs in the long run tend to be more maintainable and are quicker to develop." [citation needed] (Python doesn't even allow you to predeclare variables. One typo during an assignment, and you have a very hard-to-find bug.)
jrockway
I don't have benchmarks. Perl was originally written for text processing and a lot of the design decisions and optimisations were made specifically for that. I expect it to be faster and I think that's the case. As for maintainability, the "one way and preferably only one way" methodology tends to have a uniforming effect on code which helps readability much more than TMTOWTDI.
Noufal Ibrahim
"I expect it to be faster and I think that's the case." Well, never mind providing evidence, then! Expectations and hunches are almost never proven wrong.
Chris B.
You're right. I'll do some quick benchmarks with arbitrary text and post the results. Let's see if my hunch is valid. My comment on the readability still stands though.
Noufal Ibrahim
If it helps any - the language shootout usually shows python taking twice the average time of perl on regex tests. http://shootout.alioth.debian.org/u32q/python.php
mozillalives
regex is not needed with your Python example. Use the "in" operator.
ghostdog74
Python is only more readable if you're Perl-illiterate. ;)
fennec
@ghostdog74. I wanted to use regexps in both places. Assume a pattern rather than a fixed string.
Noufal Ibrahim
@mozillalives Did you notice that the Perl program used quad-core and the Python program didn't? Look at regex here - http://shootout.alioth.debian.org/u32/python.php#faster-programs-measurements
igouy
Good grief, what a clamor in the comments. You Perl fans, okay, you love Perl, we get it. I love Python, and I do not like Perl, but I am not going to snipe at favorable comments on Perl. If Perl is what you like, use that, and especially use Perl for things that it is really good at. IMHO, writing large complex systems that need to be maintained is *not* one of the things Perl is good at, but grepping a large file *is*.
steveha
On many platforms, older perls would have done better. Starting in 5.10, usefaststdio defaults to off and the slower perlio abstraction layer replaces the use of stdio. (usefaststdio was formerly the default on many platforms; it makes perl interact in encapsulation-breaking ways with libc's FILE structs in the interest of blinding speed.)
ysth
@steveha: I'm not seeing any comments that I'd label as sniping, except at oversimplified "benchmarks", regardless of language.
ysth
+6  A: 

Python runs very fast if you use the idiomatic syntax of the Python language, which is roughly described as "pythonic".

If you restructure your code like this, it will run at least twice as fast (well, it does on my machine):

j = 0
for i in range(10000000):
    j = j + 1
print j

Whenever you use a while loop in Python, you should check whether you could use a "for X in range()" instead.
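
If you want to check the difference on your own machine, a rough sketch like this should do. It assumes Python 3 (where range() is lazy, like xrange in Python 2); the function names and the iteration count N are made up for illustration, and both loops sit inside functions, so the absolute timings are not comparable with the module-level numbers in the question.

```python
import timeit

N = 1000000  # hypothetical iteration count, far smaller than the question's 100000000


def count_with_while(n):
    """Count to n with an explicit while loop, like the original test."""
    i = 0
    j = 0
    while i < n:
        i = i + 1
        j = j + 1
    return j


def count_with_for(n):
    """Count to n by letting range() drive the loop."""
    j = 0
    for _ in range(n):
        j = j + 1
    return j


if __name__ == "__main__":
    # Both loops must give the same answer before timing means anything.
    assert count_with_while(N) == count_with_for(N) == N
    for func in (count_with_while, count_with_for):
        seconds = timeit.timeit(lambda: func(N), number=5)
        print("%s: %.3f s for 5 runs" % (func.__name__, seconds))
```

The printed times will depend on your interpreter and hardware, so treat them as a relative comparison only.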

FlorianH
The speed of that will heavily depend on whether you are on Python 2 or Python 3.
Mark Byers
Wow, that runs over 10 times faster than the OP's code on my machine!
quamrana
You are right: If you are on Python 2, you should use xrange, I guess. A quick run on my laptop took 27.7 seconds with "range" and 21.4 seconds with "xrange". That is no proper benchmark though.
FlorianH
Replacing `range` with `xrange` should help a little more and I think `j+=1` will boost it a little more.
Noufal Ibrahim
And do you consider this "using the 'right' syntax" normal behavior for a 'programming' language?
Andrei Ciobanu
@Nomemory: Are you saying that in programming languages you should be able to use the wrong syntax and still get valid or optimal results? Sounds like magic. And in any language (that I know), using a while loop in a situation where a for loop is called for is considered wrong. Or at least awkward.
Tofystedeth
Typically, using a screwdriver for screws and a hammer for nails is a normal behavior for a tool user.
ΤΖΩΤΖΙΟΥ
+2  A: 

Python is slower than Perl. It may be faster to develop in, but it doesn't execute faster. Here is one benchmark: http://xodian.net/serendipity/index.php?/archives/27-Benchmark-PHP-vs.-Python-vs.-Perl-vs.-Ruby.html -edit- A terrible benchmark, but it is at least a real benchmark with numbers and not some guess. Too bad there's no source or test other than a loop.

acidzombie24
don't believe such benchmarks.
ghostdog74
You can't very well demand benchmarks in a different comment, then dismiss them out of hand in this one. Perl is faster than Python for many tasks. Python's faster than Perl in many tasks as well. What's important is that they're both generally within the ballpark of one another, and close enough that larger decisions about algorithms and architecture will determine which program runs faster.
Chris B.
The time taken to print "hello world" really isn't interesting!
igouy
I know it isn't interesting, but at least the loop was real and an example. I just said it was one benchmark, not a good benchmark to look at. It shows at least one time and memory comparison. I never expected anyone to think Python is faster than Perl.
acidzombie24
"at least the loop was real" - it isn't anymore "real" than the code snippet Foobarbaz posted in the original question.
igouy
+6  A: 

To OP, in Python this piece of code:

j = 0
for i in range(10000000):
    j = j + 1
print j

is the same as

print range(10000001)[-1]

which, on my machine,

$ time python test.py
10000000

real    0m1.138s
user    0m0.761s
sys     0m0.357s

runs for approximately 1s. range() (or xrange) is built into Python and can already generate a sequence of numbers for you internally. Therefore, you don't have to write your own loop to iterate. Now go and find a Perl equivalent that can produce the same result in 1s.
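
For what it's worth, here is a small sketch of the same idea assuming Python 3 (the snippet above is Python 2). There, range() is a lazy sequence, so taking its last element neither builds a list nor runs a loop. The variable names are just for illustration.

```python
n = 10000000

# range() in Python 3 is a lazy sequence: no list of n + 1 items is built.
numbers = range(n + 1)

# Indexing computes the element arithmetically instead of iterating.
last = numbers[-1]
print(last)

# len() is likewise computed from start, stop and step.
print(len(numbers))
```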

ghostdog74
Mine was 100000000, with 8 zeroes, not 7.
Foobarbaz
You are reducing the already toy example ad absurdum. Down this road, the perl equivalent would be "print 100000000;".
Beni Cherniavsky-Paskin
@Foobarbaz, what I am illustrating to you is not how many zeroes make it slow or fast, but how to think about solving the problem to get the same result.
ghostdog74
@Beni, I am sure you know that using print 100000000 would be meaningless for this toy example. My example shows how the "looping" can be done in Python for the same result "without using a loop", at least not explicitly.
ghostdog74
@Beni Cherniavsky-Paskin: Since `print 10000000` is the net effect of the perl example, that's a pretty solid indictment of this benchmark. Perhaps something from Project Euler would have been a better choice.
S.Lott
Find a perl equivalent? Something like `foreach $i (1..10000000) { ... }`? Or `print [1..10000000]->[-1]`? Yeah, Perl's got number ranges too.
fennec
+41  A: 

The nit-picking answer is that you should compare it to idiomatic Python:

  • The original code takes 34 seconds on my machine.
  • A for loop (FlorianH's answer) with += and xrange() takes 21.
  • Putting the whole thing in a function reduces it to 9 seconds!
    That's much faster than Perl (15 seconds on my machine)!
    Explanation: Python local vars are much faster than globals.
    (For fairness, I also tried a function in Perl - no change)
  • Getting rid of the j variable reduced it to 8 seconds:

    print sum(1 for i in xrange(100000000))

Python has the strange property that higher-level shorter code tends to be fastest :-)
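
If you want to reproduce the module-level versus function gap yourself, something like this sketch should work. It assumes Python 3 (range instead of xrange); N, the two code strings and run() are hypothetical names picked for illustration, and exec() with a single dictionary is only a stand-in for running the code as a real script. The timings quoted above came from my machine, so expect different numbers.

```python
import time

N = 1000000  # hypothetical size, much smaller than the question's loop

# The loop written at module level: j and i live in the module dictionary.
GLOBAL_LOOP = """
j = 0
for i in range(N):
    j += 1
"""

# The same loop inside a function: j and i are function locals.
FUNCTION_LOOP = """
def count(n):
    j = 0
    for i in range(n):
        j += 1
    return j

j = count(N)
"""


def run(source):
    """Execute source as if it were a module; return (j, elapsed seconds)."""
    namespace = {"N": N}
    start = time.perf_counter()
    exec(compile(source, "<sketch>", "exec"), namespace)
    return namespace["j"], time.perf_counter() - start


if __name__ == "__main__":
    for label, source in (("module level", GLOBAL_LOOP),
                          ("in a function", FUNCTION_LOOP)):
        j, seconds = run(source)
        print("%-14s j=%d  %.3f s" % (label, j, seconds))
    # The generator version from the last bullet, without the j variable.
    print(sum(1 for i in range(N)))
```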

But the real answer is that your "micro-benchmark" is meaningless. The real question of language speed is: what's the performance of an average real application? To know that, you should take into account:

  • Typical mix of operations in complex code. Your code doesn't contain any data structures, function calls, or OOP operations.

  • Optimization opportunities: after you write your code, IF it's not fast enough, how much faster can you easily make it?

    E.g. how hard is it to offload the heavy lifting to efficient C libraries?

If you want to talk number crunching, Python IS surprisingly popular with scientists. They love it for the simple pseudo-math syntax and short learning curve, but also for the excellent numpy library for array crunching and the ease of wrapping other existing C code.

And then there is the Psyco JIT which would probably run your toy example well under 1 second, but I can't check it now because it only works on 32-bit x86.

Beni Cherniavsky-Paskin
+1: that your "micro-benchmark" is meaningless -- and reveals more about the author of the benchmark than about the languages being benchmarked.
S.Lott
+1: Absolutely agree on the micro-benchmark. Comparing execution times must be done on equivalent programs, which is much more difficult than it sounds.
Khelben
+1 for mentioning Psyco - this is something the OP really should try for those benchmarks, here is the missing link: http://psyco.sourceforge.net/
Doc Brown
"Python local vars are much faster than globals." Interesting. Do you have a good reference to explain this? How about class instance variables—are they faster than globals too?
Craig McQueen
@Craig: I've put some more reference about that in my answer.
Roberto Liffredo
I count it as a shortcoming in python that its bizarre way of bringing variables into existence doesn't allow you a "local variable" without a function :)
hobbs
@hobbs: Aside from performance, why do you want local vars at top level? Namespace => del statement; Object lifetime (RAII) => not coupled to scope in Python, to assure timely release use explicit .close() etc.
Beni Cherniavsky-Paskin
+6  A: 

All this micro benchmarking can get a bit silly!

E.g. just switching to for in both Python & Perl provides a hefty speed bump. The original Perl example would be twice as quick if for was used:

my $j = 0;

for my $i (1..100000000) {
    ++$j;
}

print $j;


And I can shave off a bit more with this:

++$j for 1..100000000;
print $j;


And getting even sillier we can get it down to 1 second here ;-)

print {STDOUT} (1..10000000)[-1];

/I3az/

ref: Perl 5.10.1 used.

draegtun
I am interested in the last Perl version. What I got was around 5 or 6 secs.
ghostdog74
@ghostdog74: My first two examples take 5-6 secs here so if you're seeing that with the last example then Perl and/or your machine is a lot slower than mine!
draegtun
@ghostdog74: BTW I tried your two examples here. The last one is the quickest of all benchmarks (if xrange is used). However, the first example filled up memory and came in a whopping 16 times slower than the Perl equivalent! Using xrange was much better: it didn't fill up memory and came in just under 3 times slower than the Perl version. Thing to learn... never trust pre-installed languages! (The Python that comes with Mac OSX Tiger has always blown chunks!!)
draegtun
The goal isn't to make the fastest program. It's to compare similar programs doing similar things. Of course the programs are faster if you do less work. That doesn't help you compare the parts you cut out though. :)
brian d foy
Maybe, but it's kind of meaningless to speak of "similar programs doing similar things" on a microbenchmark. (Hey look my abacus is 1000 times faster than Perl at adding one to a number: I just flick a bead with my finger, and you have to type "++$j;", save it to a file, run the file, etc.)
Ken
A: 

is Python really that much slower than Perl?

Look at the Computer Language Benchmarks Game - "Compare the performance of ≈30 programming languages using ≈12 flawed benchmarks and ≈1100 programs".

They are only tiny benchmark programs but they still do a lot more than the code snippet you have timed -

http://shootout.alioth.debian.org/u32/python.php

igouy
+1  A: 

Python keeps global variables in a dictionary. Therefore, every time a global is read or assigned, the interpreter performs a lookup in the module dictionary, which is somewhat expensive; this is why your example turned out so much slower.

To improve performance, keep the variables local, for example by putting the loop in a function. The Python interpreter stores local variables in an array, which is much faster to access.
However, note that this is an implementation detail of CPython; I suspect IronPython, for instance, would give a completely different result.
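
One way to see the difference is to look at the bytecode with the standard dis module. This is only a sketch: it assumes CPython on Python 3, the helper names (bump_global, bump_local, opnames) are invented for the example, and the exact opcode names vary between CPython versions.

```python
import dis

j = 0


def bump_global():
    """Increment the module-level j (stored in the module's dictionary)."""
    global j
    j = j + 1


def bump_local():
    """Increment a function-local j (stored in the frame's local slots)."""
    j = 0
    j = j + 1
    return j


def opnames(func):
    """Return the names of the bytecode instructions compiled for func."""
    return [instruction.opname for instruction in dis.get_instructions(func)]


if __name__ == "__main__":
    print(opnames(bump_global))  # expect *_GLOBAL opcodes on CPython
    print(opnames(bump_local))   # expect *_FAST opcodes on CPython
```

The global version goes through dictionary-based opcodes, while the local version uses the indexed "fast" ones.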

Finally, for more information on this topic, I suggest an interesting essay by GvR about optimization in Python: Python Patterns - An Optimization Anecdote.

Roberto Liffredo
I'd be curious to see the same benchmark run with Perl using package variables instead.
brian d foy
+1  A: 

I'm not up-to-date on everything with Python, but my first idea about this benchmark was the difference between Perl and Python numbers. In Perl, we have numbers. They aren't objects, and their precision is limited to the sizes imposed by the architecture. In Python, we have objects with arbitrary precision. For small numbers (those that fit in 32-bit), I'd expect Perl to be faster. If we go over the integer size of the architecture, the Perl script won't even work without some modification.
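
To illustrate the Python side of that, here is a minimal sketch, assuming Python 3 (where int is the arbitrary-precision type; Python 2 would switch to long). The variable names are just for the example.

```python
# Largest value of a signed 64-bit integer, for comparison.
max_int64 = 2 ** 63 - 1

# Python moves past the machine word size on its own: no overflow, no float.
bigger = max_int64 + 1
print(bigger)
print(type(bigger).__name__)

# Arithmetic stays exact even for very large values.
huge = 2 ** 200
print(huge + 1 - huge)
```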

I see similar results for the original benchmark on my MacBook Air (32-bit) using a Perl 5.10.1 that I compiled myself and the Python 2.5.1 that came with Leopard:

However, I added arbitrary precision to the Perl program with the bignum pragma:

 use bignum;

Now I wonder if the Perl version is ever going to finish. :) I'll post some results when it finishes, but it looks like it's going to be an order of magnitude difference.

Some of you may have seen my question about What are five things you hate about your favorite language?. Perl's default numbers are one of the things that I hate. I should never have to think about it and it shouldn't be slow. In Perl, I lose on both. Note, however, that if I needed numeric processing in Perl, I could use PDL.

brian d foy