If I were to implement Solr, what is the standard way of doing it?
The standard way to use Solr is to run it as a separate web application in an instance of your favorite web application container (Jetty is recommended; there may be incompatibility issues with Resin). You communicate with it over its HTTP interface, either programmatically using a client library like SolrJ or the provided JAR files, or directly using curl or a web browser.
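To make the HTTP interface concrete, here is a minimal sketch in Python. It only builds the request URL for a search; the host, port, and core path assume the default Jetty example setup, and the field name in the query is made up.

```python
# Minimal sketch of talking to Solr over HTTP (assumes the default
# Jetty example instance at http://localhost:8983/solr).
from urllib.parse import urlencode

def build_select_url(base_url, query, rows=10):
    """Build the URL for a Solr select (search) request, asking for JSON back."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return "%s/select?%s" % (base_url.rstrip("/"), params)

url = build_select_url("http://localhost:8983/solr", "title:bicycle")
# With a running Solr, fetching results is one more line:
# import urllib.request
# response = urllib.request.urlopen(url).read()
```

The same URL works unchanged in a browser or with curl, which is what makes the HTTP interface convenient for debugging.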
To get started, I recommend reading the tutorial, which is for Solr 1.3 but is for the most part still relevant.
And, I have not understood: should I update the Solr index every time a new classified is posted or updated/changed, or index them all at once every 12 hours or so?
You can add documents as soon as they are posted. Solr will first write them into memory and then, depending on your configuration settings, "commit" them to the on-disk index after a certain amount of time has passed or after a certain number of documents are pending. You can also configure how often the index is optimized, which is an expensive operation that compacts the on-disk index.
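The commit behavior described above is controlled in solrconfig.xml; a sketch of the relevant fragment, with placeholder values you would tune for your load:

```xml
<!-- In solrconfig.xml: flush pending documents to disk automatically -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>  <!-- commit after this many pending documents... -->
    <maxTime>60000</maxTime>  <!-- ...or after this many milliseconds, whichever first -->
  </autoCommit>
</updateHandler>
```

If you omit the autoCommit block, nothing reaches the on-disk index until a client issues an explicit commit.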
Should I use Solr to find the IDs of the classifieds and then fetch the records from MySQL, or should I use ONLY Solr?
If your data is small (by total size, not individually) you can put it all into Solr, but as it grows larger you may want to use a hybrid solution where Solr just holds the indexed values and MySQL is used for the stored data.
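A sketch of what the hybrid approach looks like in code: Solr returns only the matching IDs, and MySQL supplies the full records. The table and field names here are hypothetical, and the MySQL call itself is left out.

```python
# Hybrid lookup: IDs from Solr, records from MySQL.
import json

def ids_from_solr_response(raw_json):
    """Pull document IDs out of a Solr JSON search response (wt=json)."""
    docs = json.loads(raw_json)["response"]["docs"]
    return [doc["id"] for doc in docs]

def fetch_sql(ids):
    """Build the MySQL lookup for the matched IDs (use with parameter binding)."""
    placeholders = ", ".join(["%s"] * len(ids))
    return "SELECT * FROM classifieds WHERE id IN (%s)" % placeholders

sample = '{"response": {"docs": [{"id": "42"}, {"id": "97"}]}}'
ids = ids_from_solr_response(sample)  # ["42", "97"]
sql = fetch_sql(ids)
```

In this setup the Solr fields for everything except the ID can be indexed but not stored, which keeps the index small.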
But I haven't found any good articles about DataImportHandler, if that's what I need...
If you want to export your MySQL data into Solr, use the CSVRequestHandler (note that "CSV" here can really be any flat file format, like the one MySQL would produce).
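For example, a small Python sketch that serializes database rows into the flat file the handler expects; the column names are placeholders, and the first row must name the Solr fields.

```python
# Prepare a CSV payload for Solr's CSVRequestHandler.
import csv
import io

def rows_to_csv(fieldnames, rows):
    """Serialize rows into CSV text, with a header row naming the Solr fields."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(fieldnames)
    writer.writerows(rows)
    return buf.getvalue()

payload = rows_to_csv(["id", "title", "price"],
                      [("1", "Red bicycle", "120"),
                       ("2", "Oak table", "80")])
# POST the payload to the handler, e.g. with curl:
#   curl 'http://localhost:8983/solr/update/csv?commit=true' \
#        --data-binary @classifieds.csv -H 'Content-type: text/plain; charset=utf-8'
```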
Edit: in response to your comment, I hadn't looked into DataImportHandlers before, but reading the page in the Solr manual, it seems that this is useful for importing data directly from your database (either all at once or as periodic deltas) when you have a whole schema that you need to preserve. So I would say: if the data you need to index is a few fields in a single table, use the CSVRequestHandler, because it is very easy and does not need to be configured; but if you have a bunch of tables with relationships between them and all of that data needs to be put into the index, then you should look into DataImportHandler.
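For the multi-table case, DataImportHandler is driven by a data-config.xml file; a sketch, where the JDBC URL, credentials, and table/column names are all hypothetical:

```xml
<!-- data-config.xml: map database rows to Solr fields -->
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/classifieds"
              user="solr" password="secret"/>
  <document>
    <entity name="ad" query="SELECT id, title, body FROM ads">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="body" name="body"/>
    </entity>
  </document>
</dataConfig>
```

Nested entity elements with their own queries are how related tables get pulled into the same document.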
Edit 2: anything that can make an HTTP request can update Solr, but here is a link to a project on Google Code that provides a PHP implementation of a Solr client. I have not used it myself.
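To illustrate "anything that can make an HTTP request": the body you POST to /update in the Solr 1.x era is a small XML document. A sketch in Python (the field names are made up; any language with an HTTP client can do the same):

```python
# Build the <add><doc>...</doc></add> body for a Solr update request.
import xml.etree.ElementTree as ET

def add_doc_xml(fields):
    """Serialize a dict of field name -> value into a Solr add-document body."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")

body = add_doc_xml({"id": "42", "title": "Red bicycle"})
# POST body to http://localhost:8983/solr/update, then POST <commit/>
# to the same URL to make the document searchable.
```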