I just tried to run BeautifulSoup (3.1.0.1) with Jython (2.5.1) and was amazed to see how much slower it was than with CPython. Parsing a page (http://www.fixprotocol.org/specifications/fields/5000-5999) with CPython took just under a second (0.844 seconds, to be exact). With Jython it took 564 seconds, almost 700 times as long.

Can anyone confirm this result? It doesn't seem reasonable for Jython to run 700 times slower than CPython. Perhaps something is wrong with my setup.

[Edit] Here's the code I used to test this (I downloaded the above-mentioned HTML file first):

import time
from BeautifulSoup import BeautifulSoup

# Read the whole page up front so that only parsing is timed.
data = open("fix-5000-5999.html").read()

start = time.time()
soup = BeautifulSoup(data)
print time.time() - start  # elapsed parse time in seconds
+5  A: 

I can confirm similar findings.

Intel Mac, OS X 10.6.1, Java 1.6.0_15 64-bit, Jython 2.5.1.

Running your code with CPython 2.6.1 takes 0.1–0.2 seconds, but running it with Jython takes at least tens of seconds; I didn't wait more than 30. It also uses a lot of CPU.

I tried Beautiful Soup 3.0.7a, because it uses a different parser, but had the same results.

Interestingly, I tried running your code on a different HTML file and it worked fine. But it still seemed much slower than CPython: Jython took 1.02–1.3 seconds; CPython took 0.019–0.020.

I don't have any suggestions at this point except that you should consider asking this question on the jython-users list; I've found the community there, which includes the lead developer, to be responsive and helpful.

Good luck!

Avi Flax
I would assume the difference is down to size: the HTML I used is about 300K, and the second one you used is just 7K. Thanks for verifying.
gooli
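
One way to check the size theory would be to time the parse on growing prefixes of the same file and see whether the time rises in proportion to the input or much faster. Below is a minimal sketch of that idea, not something anyone in this thread actually ran. The helper names `time_parse` and `scaling_table` are made up for illustration, and the `__main__` part uses a trivial stand-in parser and synthetic input so it runs anywhere. To test the real case, pass `BeautifulSoup` as `parse` and the contents of the downloaded HTML file as `data`. Cutting the HTML at an arbitrary point produces malformed markup, which BeautifulSoup is designed to tolerate, but it may affect the timings somewhat. Taking the best of a few repeats is meant to reduce the effect of JVM warm-up on the Jython numbers.

```python
import time


def time_parse(parse, data, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls to parse(data)."""
    best = None
    for _ in range(repeats):
        start = time.time()
        parse(data)
        elapsed = time.time() - start
        if best is None or elapsed < best:
            best = elapsed
    return best


def scaling_table(parse, data, fractions=(0.25, 0.5, 1.0)):
    """Time parse() on growing prefixes of data.

    Returns a list of (size_in_chars, seconds) tuples, one per fraction.
    """
    rows = []
    for fraction in fractions:
        chunk = data[:int(len(data) * fraction)]
        rows.append((len(chunk), time_parse(parse, chunk)))
    return rows


if __name__ == "__main__":
    # Stand-in input and parser (assumptions, for illustration only).
    # For the real test: parse = BeautifulSoup, data = contents of the HTML file.
    data = "<tr><td>5000</td><td>field</td></tr>\n" * 2000
    parse = lambda text: text.count("<td>")
    for size, seconds in scaling_table(parse, data):
        print("%8d chars  %.4f s" % (size, seconds))
```

If the time roughly doubles when the input doubles, size alone explains the gap between the two files; if it grows much faster than that, something in the parser is scaling badly under Jython.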