I'm using the PHPCrawl class to spider websites and build a list of links. It all works well, if slowly, and I then use the links to perform other tasks.

I'm running into an intermittent problem: sometimes the script completes with no results, and then works as expected when I run it again. It fails about 30% of the time.

I thought at first that this was a network or workstation issue, but the same problem occurs on a different machine in a different location using a different ISP.

Has anybody else used this class and encountered the same problem?

A: 

After extensive testing, I've found that the failures appear to be related to the streamTimeout setting.

The catch is that setting it too high makes the crawl very slow. Tinkering with connectionTimeout seems to mitigate this a little.
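
For reference, here is a rough sketch of where those two settings would go. It assumes the PHPCrawl 0.8x layout and the method names as I remember them from its documentation (setStreamTimeout(), setConnectionTimeout(), handleDocumentInfo()), so check them against the version you have installed. The include path, the LinkCrawler class name, the URL and the timeout values are all just placeholders for illustration, not recommended settings.

```php
<?php
// Sketch only: the include path and method names assume PHPCrawl 0.8x.
require_once 'libs/PHPCrawler.class.php';

// Hypothetical subclass that just collects the URL of every document received.
class LinkCrawler extends PHPCrawler
{
    public $links = array();

    public function handleDocumentInfo($DocInfo)
    {
        $this->links[] = $DocInfo->url;
    }
}

$crawler = new LinkCrawler();
$crawler->setURL('http://www.example.com/');

// Illustrative values in seconds. A higher stream timeout reduces empty
// runs but slows the crawl, so adjust the two together.
$crawler->setConnectionTimeout(10);
$crawler->setStreamTimeout(5);

$crawler->go();

// An empty result is the symptom described in the question.
if (count($crawler->links) === 0) {
    echo "No links collected; the timeouts may still be too low.\n";
} else {
    echo count($crawler->links) . " links collected.\n";
}
```

Logging the empty runs like this makes it easier to see whether a given pair of timeout values actually lowers the failure rate.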

Leo