Hi. I'm fetching lots of user data via last.fm's API for my mashup. I do this every week as I have to collect listening data.

I fetch the data as XML through their REST API, specifically with simplexml_load_file().

The script is taking ridiculously long. For about 2,300 users, it takes 30 minutes to fetch only the names of artists. I have to fix it now, otherwise my hosting company will shut me down. I've ruled out all other options; it is the XML step that is slowing the script.

I now have to figure out whether last.fm has a slow API (or is limiting calls without them telling us), or whether PHP's simplexml is actually rather slow.

One thing I realised is that the XML request fetches a lot more than I need, but I can't limit it through the API (i.e. give me info on only 3 bands, not 70). Even so, the "big" XML files only get to about 20 kB. Could that be what is slowing down the script, having to load 20 kB into an object for each of the 2,300 users?

It doesn't make sense that it could be that... I just need confirmation that it is probably last.fm's slow API. Or is it?

Any other help you can provide?

A: 

I don't think SimpleXML is that slow. Parsing does cost something, but I think the 2,300 curl/file_get_contents requests are taking far more time. Also, why not fetch the data and just use simplexml_load_string? Do you really need to put those files on the server's disk?

Loading from memory should at least speed things up a bit. Also, what kind of processing are you doing on the loaded XML? Are you sure your processing is as efficient as it could be?
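
To see whether the time goes into the download or into SimpleXML, one option is to time the two steps separately. This is only a rough sketch: `timeFetchAndParse` is a made-up helper name, the `<artist><name>` layout is an assumption about the response, and the URL in the usage comment is a placeholder.

```php
<?php
// Hypothetical helper: times the download and the parse separately.
// $fetch is any callable that takes a URL and returns the raw XML body.
function timeFetchAndParse(string $url, callable $fetch): array
{
    $t0 = microtime(true);
    $body = $fetch($url);                 // network cost only
    $t1 = microtime(true);
    // Parse from memory; skip parsing if the fetch failed.
    $xml = is_string($body) ? simplexml_load_string($body) : false;
    $t2 = microtime(true);

    $names = [];
    if ($xml !== false) {
        // Assumed layout: <artist><name>...</name></artist> somewhere in the tree.
        foreach ($xml->xpath('//artist/name') ?: [] as $name) {
            $names[] = (string) $name;
        }
    }

    return [
        'download_s' => $t1 - $t0,
        'parse_s'    => $t2 - $t1,
        'artists'    => $names,
    ];
}

// Usage sketch (placeholder URL, not a real endpoint):
// $r = timeFetchAndParse('https://example.invalid/user.xml', 'file_get_contents');
// printf("download %.3fs, parse %.3fs\n", $r['download_s'], $r['parse_s']);
```

If `download_s` dwarfs `parse_s` over a handful of users, the parser is probably not the problem.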

RageZ
It isn't the processing, I've tested that. It is the act of using "simplexml_load_file"... So you are saying I should get the xml from last.fm and then load it locally?
Shotbeak
Loading from memory would at least spare a few I/O operations.
RageZ
A: 

20kb * 2300 users is ~45MB. If you're downloading at ~25kB/sec, it will take 30 minutes just to download the data, let alone parse it.
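
The arithmetic, spelled out with the figures from this thread (the 25 kB/s rate is only the hypothetical used above, not a measurement):

```php
<?php
// Back-of-envelope transfer time; inputs are the numbers quoted in the thread.
function transferMinutes(float $kbPerFile, int $files, float $kbPerSecond): float
{
    $totalKb = $kbPerFile * $files;          // 20 * 2300 = 46000 kB, roughly 45 MB
    return $totalKb / $kbPerSecond / 60.0;   // seconds -> minutes
}

printf("%.1f min\n", transferMinutes(20, 2300, 25));  // prints "30.7 min"
```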

Lamah
Whoa, it could be this... but the data is downloaded from server to server. It can't be THAT slow.
Shotbeak
A: 

Make sure the XML that you download from last.fm is gzipped. You'd probably have to include the correct HTTP header to tell the server you support gzip. It would speed up the download but eat more server resources with the ungzipping part.
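
A minimal sketch of what that could look like with file_get_contents, assuming the zlib extension is loaded. Whether last.fm actually answers with gzip is not something I've confirmed, so the helper falls back to the raw body. `fetchCompressed` and `decodeIfGzipped` are made-up names.

```php
<?php
// Returns the decompressed body if it looks like gzip, else the body unchanged.
function decodeIfGzipped(string $body): string
{
    // gzip streams start with the magic bytes 0x1f 0x8b.
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body);
        if ($decoded !== false) {
            return $decoded;
        }
    }
    return $body;
}

// Asks the server for gzip and decodes the answer; returns false on failure.
function fetchCompressed(string $url)
{
    $context = stream_context_create([
        'http' => [
            'header'  => "Accept-Encoding: gzip\r\n",
            'timeout' => 30,
        ],
    ]);
    $body = file_get_contents($url, false, $context);
    return $body === false ? false : decodeIfGzipped($body);
}
```

With the curl extension, setting `CURLOPT_ENCODING` to an empty string should do the same thing: curl then advertises every encoding it supports and decodes the response itself.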

Also consider using asynchronous downloads to free server resources. It won't necessarily speed the process up, but it should make the server administrators happy.
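
One way to overlap downloads in PHP is curl_multi. This is a rough sketch that assumes the curl extension is installed; `fetchAll` is a made-up name. Keep the batches small, since many parallel requests could run into whatever rate limit the API applies.

```php
<?php
// Downloads several URLs concurrently; returns bodies keyed like the input array.
function fetchAll(array $urls): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body, don't print it
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running && curl_multi_select($mh, 1.0) === -1) {
            usleep(1000); // select can fail on some systems; avoid a busy loop
        }
    } while ($running && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $key => $ch) {
        $bodies[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    return $bodies;
}

// Usage sketch (placeholder URLs):
// $bodies = fetchAll(['alice' => 'https://example.invalid/a.xml',
//                     'bob'   => 'https://example.invalid/b.xml']);
```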

If the XML itself is big, use a SAX parser, instead of a DOM parser.
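
For illustration, here is a streaming version using PHP's XMLReader. Strictly speaking XMLReader is a pull parser rather than SAX, but it has the same property of not building the whole tree, and it can stop as soon as it has enough names. The `<artist><name>` layout is an assumption about the response, and `firstArtistNames` is a made-up name.

```php
<?php
// Streams through the XML and stops after $limit artist names.
function firstArtistNames(string $xml, int $limit): array
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $names = [];
    $inArtist = false;
    while (count($names) < $limit && $reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue; // only opening tags matter here
        }
        if ($reader->name === 'artist') {
            $inArtist = true;
        } elseif ($inArtist && $reader->name === 'name') {
            $names[] = $reader->readString(); // text content of <name>
            $inArtist = false;                // take one name per artist
        }
    }
    $reader->close();
    return $names;
}

// Usage sketch: only the first 3 of (say) 70 artists are ever read.
// $top3 = firstArtistNames($body, 3);
```

For 20 kB documents the gain over SimpleXML is likely small; this matters more when the files get large.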

Francois Botha
A: 

I think there's a limit of 1 API call per second. I'm not sure this policy is being enforced through code, but it might have something to do with it. You can ask the Last.fm staff on IRC at irc.last.fm #audioscrobbler if you believe this to be the case.
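
If a limit of one call per second really is enforced (again, I'm not sure it is), it puts a floor under the run time no matter how fast the parser is. A small sketch with made-up helper names:

```php
<?php
// How long to sleep before the next call so calls are at least $minInterval apart.
// The 1.0 s default is the limit recalled above, not a confirmed figure.
function secondsToWait(float $lastCallAt, float $now, float $minInterval = 1.0): float
{
    return max(0.0, $minInterval - ($now - $lastCallAt));
}

// Lower bound on a full run when every user needs one throttled call.
function minimumRunMinutes(int $calls, float $minInterval = 1.0): float
{
    return $calls * $minInterval / 60.0;
}

printf("%.1f min\n", minimumRunMinutes(2300));  // prints "38.3 min"

// Usage sketch inside the fetch loop:
// usleep((int) (secondsToWait($last, microtime(true)) * 1000000));
// $last = microtime(true);
```

The 30 minutes reported in the question works out to about 0.78 s per user, which is in the same range.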

Aistina
A: 

What you really ought to do is to profile your app. Profiling will tell you which part of your code is taking the most time to execute, soaking up all the memory and so on.

Xdebug, a PHP extension, has a profiler which you could use for this.

e4c5
A: 

I'm having a similar problem. I'm trying to fetch song names and playcounts for any user's top 50 artists, and after a while I found that it's very slow. For example, if you want only the song names and playcounts for an artist with 50 songs, the API sends all the other metadata as well, not just the names and playcounts. One XML response with metadata for 50 songs is about 40 KiB, so yeah, it's slow =(

pootzko