Hey Guys,

I want to migrate my system from ActivePython 2.4 to Python 2.6.5. However, I've run into a problem parsing XML files: the I/O is very slow.

My sample XML file:

<config><dicts><dictName>EnvDict</dictName><dictElems><key>AppServerIP</key>  <value>localhost</value><key>DBServerIP</key>   <value>localhost</value><key>DBServerName</key> <value>DB1</value></dictElems></dicts></config>

My log shows that parsing this XML took 25 s.

My system is structured as follows:

Publisher-Subr (PubSubr) redirects requests to the different modules.

ClntMgrFact is attached to PubSubr and listens on predefined ports. It spawns a new process for each client login.

ClntMgr (a process) is spawned by ClntMgrFact and is also attached to PubSubr. ClntMgr creates a ClntWorker (a thread) to process the workflow.

ClntWorker needs to read some static XML files from local disk, but the parsing is extremely slow. My XML files are around 500 to 700 KB.

Can anyone help with this without changing the system structure? Thanks in advance.

A: 

I'm very perplexed...:

$ py26 -mtimeit -s'import rex' 'rex.t()'
10000 loops, best of 3: 103 usec per loop

100 microseconds seems more reasonable than 25 seconds to read in and parse such a small XML file as the one you're giving (even on the old laptop I'm using for the timing!) -- but how to explain the fact that my parsing is 250,000 times faster than yours?!

This is rex.py, btw...:

import xml.etree.cElementTree as et

def t(fn='static.xml'):
    # Parse the file and return the resulting ElementTree.
    return et.parse(fn)

and static.xml is the file where I wrote your XML example (223 characters).
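
For anyone who wants to reproduce the measurement without creating files by hand, here is a minimal self-contained sketch. It assumes a current Python 3, where `xml.etree.cElementTree` no longer exists and `xml.etree.ElementTree` picks up the C accelerator on its own. The helper names `write_sample` and `time_parse` are mine, not part of rex.py, and the larger file is just the sample's `<dicts>` block repeated, which may not resemble the real 500 to 700 KB files at all.

```python
import os
import tempfile
import timeit
import xml.etree.ElementTree as et  # Python 3 stand-in for cElementTree

# The <dicts> block from the question's sample, without the <config> wrapper.
DICTS = (
    "<dicts><dictName>EnvDict</dictName><dictElems>"
    "<key>AppServerIP</key>  <value>localhost</value>"
    "<key>DBServerIP</key>   <value>localhost</value>"
    "<key>DBServerName</key> <value>DB1</value>"
    "</dictElems></dicts>"
)


def write_sample(repeat=1):
    """Write <config> holding `repeat` copies of DICTS to a temp file; return its path."""
    fd, path = tempfile.mkstemp(suffix=".xml")
    with os.fdopen(fd, "w") as f:
        f.write("<config>" + DICTS * repeat + "</config>")
    return path


def time_parse(path, number=100):
    """Return the best per-call parse time in seconds over three runs."""
    timer = timeit.Timer(lambda: et.parse(path))
    return min(timer.repeat(repeat=3, number=number)) / number


if __name__ == "__main__":
    # repeat=1 is the sample as posted; repeat=3000 gives roughly 600 KB.
    for repeat in (1, 3000):
        path = write_sample(repeat)
        try:
            size = os.path.getsize(path)
            print("%d bytes: %.6f s per parse" % (size, time_parse(path, number=10)))
        finally:
            os.remove(path)
```

If the larger file also parses in a small fraction of a second when run standalone like this, the parser itself is unlikely to be the bottleneck.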

So what's your platform, OS, Python version, chosen XML parser, etc...? I'm on a MacBook Pro laptop, OS X 10.5.8, 2.4 GHz Intel Core Duo, 667 MHz DDR2 RAM -- as I said, a pretty old machine indeed! -- with Python 2.6.4 straight from python.org.

Alex Martelli
Hi Alex, I don't think it is the hardware. In fact, I am able to parse XML in my ClntMgr, the new process spawned by ClntMgrFact, but not in the worker. The reason I don't want to change my server structure is that it works fine on ActivePython 2.4. My platform: Windows XP, Py2.6.5, SAX parser, CPU 2.2G, 2.19G, RAM 2G
Winston999
Using SAX gives me 250 microseconds -- a stunning 2.5 times slowdown, but still 5 orders of magnitude faster than you're seeing. You must be doing something **exceedingly** slow in the handler or thereabouts. Without changing your server _structure_ at all, why not simply switch to `cElementTree` for the specific task of XML parsing, and enjoy its simplicity and speed (or at least see what happens then? Maybe that worker thread is getting starved by some sort of bad interplay of locks, waits and delays).
Alex Martelli
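
To make the SAX versus ElementTree comparison concrete, here is a sketch that pulls the key/value pairs out of the sample both ways. Winston's actual handler was never posted, so `KeyValueHandler` is a hypothetical stand-in, and `xml.etree.ElementTree` is used in place of `cElementTree` on the assumption of Python 3.

```python
import xml.sax
import xml.etree.ElementTree as et  # Python 3 stand-in for cElementTree

SAMPLE = (
    b"<config><dicts><dictName>EnvDict</dictName><dictElems>"
    b"<key>AppServerIP</key>  <value>localhost</value>"
    b"<key>DBServerIP</key>   <value>localhost</value>"
    b"<key>DBServerName</key> <value>DB1</value>"
    b"</dictElems></dicts></config>"
)


class KeyValueHandler(xml.sax.ContentHandler):
    """Hypothetical SAX handler: pairs each <key> with the <value> that follows it."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.pairs = {}
        self._text = []
        self._key = None

    def startElement(self, name, attrs):
        self._text = []  # discard whitespace seen between elements

    def characters(self, content):
        self._text.append(content)  # text may arrive in several chunks

    def endElement(self, name):
        text = "".join(self._text).strip()
        if name == "key":
            self._key = text
        elif name == "value" and self._key is not None:
            self.pairs[self._key] = text
            self._key = None
        self._text = []


def parse_with_sax(data):
    handler = KeyValueHandler()
    xml.sax.parseString(data, handler)
    return handler.pairs


def parse_with_etree(data):
    elems = et.fromstring(data).find("dicts/dictElems")
    keys = [k.text for k in elems.findall("key")]
    values = [v.text for v in elems.findall("value")]
    return dict(zip(keys, values))


if __name__ == "__main__":
    print(parse_with_sax(SAMPLE))
    print(parse_with_etree(SAMPLE))
```

The ElementTree version needs no handler state at all, which is the simplicity argument; any per-event work a real handler does would show up directly in the SAX timing.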
cElementTree helps; it is 5 times faster when I read the sample XML. But my system needs to read files of 700k. I think it is more likely to be a process problem. I tested my server with Py2.4.4, and it works fine.
Winston999
@Winston, as I said I just can't reproduce your problem with 2.6.4 -- I have no hypothesis to explain why your code (that I have never seen) slows down so much on your test files (that I have never seen either), and, being completely unable to reproduce the slowdown you observe, I therefore have no help to offer.
Alex Martelli
Thanks Alex, my problem is solved. It turns out it was not the XML that caused the slowness; it was the TCP/IP connection between my pubsubr and the newly spawned process. I just added a "time.sleep(0.01)" after each poll. Thanks for pointing me in the right direction.
Winston999
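
The fix described in the last comment can be sketched roughly as follows. The original code was never posted, so `poll_messages` and the use of a non-blocking socket are assumptions made for illustration. The idea is that a loop which polls a connection without ever pausing spins at full speed and, under CPython's global interpreter lock, can leave other threads (such as a worker parsing XML) with very little time to run. Note that this variant sleeps only when a poll comes back empty, whereas the comment describes sleeping after every poll.

```python
import socket
import time


def poll_messages(sock, max_polls, pause=0.01):
    """Poll a non-blocking socket up to max_polls times and return the chunks read.

    After each empty poll, sleep for `pause` seconds so the loop does not
    busy-wait and other threads get a chance to run.
    """
    sock.setblocking(False)
    chunks = []
    for _ in range(max_polls):
        try:
            data = sock.recv(4096)
        except BlockingIOError:
            # Nothing to read yet: yield instead of spinning.
            time.sleep(pause)
            continue
        if not data:
            # An empty read means the peer closed the connection.
            break
        chunks.append(data)
    return chunks


if __name__ == "__main__":
    # A local socket pair stands in for the PubSubr-to-process connection.
    sender, receiver = socket.socketpair()
    sender.sendall(b"hello")
    sender.close()
    print(poll_messages(receiver, max_polls=5))
    receiver.close()
```

Where the design allows it, a blocking read or `select` with a timeout avoids polling altogether; the short sleep is simply the smallest change that stops the spin.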