Is there a pre-written PHP spider/crawler that can be used to feed documents to the Zend_Search_Lucene indexer? I've found Sphider, but it is very tightly coupled to MySQL and, as far as I can tell, cannot easily be integrated with Zend_Search_Lucene.
I originally wrote the search index to update on CMS/WordPress page save, so no spidering was needed, but now we need to integrate an external site too.