views:

57

answers:

1

As the title says, I'm using the BeautifulSoup module in Python to parse XML pages that I fetch from the Amazon API (I create the signed URL, load it with urllib2, and then parse it with BeautifulSoup).

It takes about 4 seconds to process two pages, but there has to be a faster way.

Would PHP be faster? What's making it slow, the BeautifulSoup parsing or the urllib2 loading?

+2  A: 

If you want to find out what's making it slow, use one of the profilers. I suspect it's the network access (and the underlying database retrieval on Amazon's side) that's slower than the rest.

Jason R. Coombs
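A minimal sketch of what that profiling could look like with the standard-library cProfile module; fetch_and_parse here is a hypothetical stand-in for the real fetch-and-parse code, not anything from the thread:

```python
import cProfile
import io
import pstats

def fetch_and_parse():
    """Stand-in for the real fetch-and-parse code (hypothetical)."""
    return sum(i * i for i in range(1000))

# Profile a single run of the function under test.
profiler = cProfile.Profile()
profiler.enable()
fetch_and_parse()
profiler.disable()

# Sort by cumulative time so slow calls (like socket.recv) float to the top.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

Calls that dominate the "cumtime" column are where the program actually spends its wall-clock time.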
Thanks, Jason, that's pretty handy! I'm guessing "Total Time" is what I'm looking for, and I noticed this: 1488 13.265 0.009 13.265 0.009 {method 'recv' of '_socket.socket' objects}. At 13.265 seconds, I'd bet this is the culprit. May I ask what it means, though?
Mike J
@Mike, recv = receive. Usually a blocking call to the socket. Meaning that it's simply waiting for either amazon to respond or for all the data to arrive (depends on how much data and how much bandwidth is available).
wds
@wds thanks! So I guess I don't really have a choice about the speed?
Mike J
@Mike: not really; cache aggressively
wds
@Mike You could retrieve multiple pages in parallel.
Fabian
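A minimal sketch of parallel fetching with a thread pool (threads suit I/O-bound work like downloads); the fetch function and URLs are hypothetical placeholders for the real urllib2 calls so the example runs without network access:

```python
from multiprocessing.dummy import Pool  # thread pool; fine for I/O-bound work

def fetch(url):
    """Stand-in for a real urllib2 / urllib.request download (hypothetical)."""
    return "<xml>%s</xml>" % url

urls = [
    "http://example.com/page1",
    "http://example.com/page2",
]

# Fetch all pages concurrently instead of one after another; while one
# request is waiting on the network, the others can make progress.
pool = Pool(4)
pages = pool.map(fetch, urls)  # results come back in the same order as urls
pool.close()
pool.join()
```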
@wds and @Fabian, thanks! Would you know of a good place to help me get started learning about caching and parallel fetching?
Mike J
@Mike caching -> depends on how many files you have and how many times you need them. If it's not too many, consider just holding them in a dictionary. For a more robust solution you might consider memcached. As for parallel fetching, I'd probably go with multiprocessing to handle the different fetchers and their I/O: http://docs.python.org/library/multiprocessing.html
wds
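The dictionary cache suggested above can be sketched in a few lines; fetch is again a hypothetical stand-in for the real download call:

```python
# Minimal in-memory cache: keep fetched pages in a dict keyed by URL, so
# repeated requests for the same page skip the network entirely.
cache = {}

def fetch(url):
    """Stand-in for the real download call (hypothetical)."""
    return "<xml>%s</xml>" % url

def fetch_cached(url):
    if url not in cache:
        cache[url] = fetch(url)  # slow path: hit the network once
    return cache[url]            # fast path: served from memory

first = fetch_cached("http://example.com/page1")   # goes to the network
second = fetch_cached("http://example.com/page1")  # cache hit
```

This only helps within one process run; for a cache shared across processes or machines, something like memcached is the usual next step.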