So I have this script that is deployed on a host and run via cron; it indexes all the records in a database table. The index is later used both for the front end of the site and for back-end operations.

After the operation, the index is about 3-4 MB.

The problem is that it takes a lot of resources (a CPU load of 30+ and a good chunk of memory) and slows the machine down. My question is how to optimize the operation described below:

First, a select query is built using the Zend Framework API. That query is passed to a Paginator factory, which returns a paginator I use to cap the number of items being indexed at once so the script doesn't iterate over too many items at a time. The script iterates over the items in the current page of the paginator with a foreach loop until it reaches the end, then fetches the next page and starts over.
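Roughly, the indexing loop looks like the sketch below (simplified; the table name, field names, index path, and connection settings are placeholders rather than the real ones):

    <?php
    // Simplified sketch of the cron indexing loop (ZF1 API).
    // Table name, field names, index path, and DB credentials are placeholders.
    $db = Zend_Db::factory('Pdo_Mysql', array(
        'host'     => 'localhost',
        'username' => 'user',
        'password' => 'secret',
        'dbname'   => 'app',
    ));

    $index = Zend_Search_Lucene::create('/path/to/index');

    $select    = $db->select()->from('records', array('id', 'title', 'body'));
    $paginator = Zend_Paginator::factory($select);
    $paginator->setItemCountPerPage(500);   // cap how many rows are loaded per page

    for ($page = 1; $page <= $paginator->count(); $page++) {
        $paginator->setCurrentPageNumber($page);

        foreach ($paginator as $row) {
            $doc = new Zend_Search_Lucene_Document();
            $doc->addField(Zend_Search_Lucene_Field::unIndexed('id', $row['id']));
            $doc->addField(Zend_Search_Lucene_Field::text('title', $row['title']));
            $doc->addField(Zend_Search_Lucene_Field::unStored('body', $row['body']));
            $index->addDocument($doc);
        }
    }

    $index->commit();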

I suspect this overhead is caused by Zend_Search_Lucene, but I have no idea how it could be improved.

+1  A: 

See my answer to "Can I predict how large my Zend Framework index will be?"

I tested Zend_Search_Lucene versus Apache Lucene (the Java version). In my test, the Java product indexed 1.5 million documents about 300x faster than the PHP product.

You'll be much happier using Apache Solr (a search server built on Apache Lucene, typically run in a servlet container such as Tomcat). Solr includes a tool called DataImportHandler that pulls data directly from a JDBC data source.

Use the PECL Solr extension to communicate with Solr from PHP. If you can't install that PHP extension, use cURL, which should be available in default installations of PHP.
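For example, indexing a document and running a query through the extension looks roughly like this (the hostname, port, core path, and field names are placeholders, and your Solr schema has to define matching fields):

    <?php
    // Rough sketch using the PECL Solr extension; connection details and
    // field names are placeholders, not a working configuration.
    $client = new SolrClient(array(
        'hostname' => 'localhost',
        'port'     => 8983,
        'path'     => '/solr',
    ));

    // Index one document
    $doc = new SolrInputDocument();
    $doc->addField('id', 42);
    $doc->addField('title', 'Example record');
    $client->addDocument($doc);
    $client->commit();

    // Query it back
    $query = new SolrQuery('title:example');
    $query->setStart(0)->setRows(10);
    $response = $client->query($query);
    print_r($response->getResponse());

Solr's update and select handlers are plain HTTP endpoints, so the same requests can be made with cURL if the extension isn't an option.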

Bill Karwin
Thanks for the help. It looks like I am going to use Zend's Lucene only for the searching part and do the indexing in Java.
smok
You should be aware that Zend_Search_Lucene can't read Apache Lucene index formats newer than version 2.3.
Bill Karwin
Thanks, I also found the presentation "Practical full-text search in MySQL" particularly useful.
smok