I just set up django-sphinx, and it is working beautifully. I can now search my model and get great results. The one problem is that I have to build the index by hand using the indexer command. That means every time I add new content, I have to go to the command line and rebuild the search index manually. That is just not acceptable.

I could make a cron job that runs the indexer command every so often, but that's far from optimal. New data won't be indexed until the cron job runs again. Also, the indexer would run unnecessarily most of the time, since new data isn't added to my site very often.

How do I set it up so that the Sphinx index automatically rebuilds itself whenever data is added to or modified in a searchable Django model?

A:

There are basically two primary strategies for building search indexes:

  1. An indexer internal to the database server, which indexes on the fly as records are inserted or deleted.
  2. An indexer external to the database (which may or may not be an RDBMS, which is why I leave off the word "server"), which indexes periodically.

The first strategy has the obvious advantage of being closer to real time, but it can carry a huge performance cost. Most database servers with internal indexers have performance problems (or else are missing features); see, for example, Jeff Atwood's blog post about adding a second server for Stack Overflow, where he discusses performance problems in SQL Server 2008.

The second strategy isn't as close to real time, but it generally has the best performance. Unfortunately, because it isn't built in, it also has to be invoked externally somehow.

Obviously you have no choice with Sphinx, since it is an external indexer: you must invoke the Sphinx indexer from cron or some other scheduling mechanism.

To get new data into the index sooner, just run the indexer more often from cron. If that causes performance problems, then you need to implement a live-update strategy: index new records very frequently into a small delta index, and periodically merge the delta index into the primary index. This is all done outside Django, so it doesn't affect anything in django-sphinx.
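The main+delta scheme could be driven by a small Python script that cron calls at two different intervals. This is a minimal sketch, assuming sphinx.conf already defines a main index and a delta index (the names "main" and "delta" are hypothetical) and that the indexer binary is on the PATH. The SQL that restricts the delta source to new rows, and resets it after a merge, lives in sphinx.conf and is not shown.

```python
import subprocess

MAIN_INDEX = "main"    # hypothetical index name defined in sphinx.conf
DELTA_INDEX = "delta"  # hypothetical index name defined in sphinx.conf


def delta_command(delta=DELTA_INDEX):
    # Rebuild only the small delta index; --rotate tells a running searchd
    # to swap in the new index files without a restart.
    return ["indexer", "--rotate", delta]


def merge_command(main=MAIN_INDEX, delta=DELTA_INDEX):
    # Merge the delta index into the main index: indexer --merge DST SRC.
    return ["indexer", "--merge", main, delta, "--rotate"]


def rebuild(mode="delta", runner=subprocess.run):
    """Run one rebuild step and return the command that was executed.

    mode is "delta" (run often) or "merge" (run rarely). runner defaults to
    subprocess.run; check=True makes it raise if indexer exits non-zero.
    """
    if mode == "delta":
        argv = delta_command()
    elif mode == "merge":
        argv = merge_command()
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    runner(argv, check=True)
    return argv
```

For example, cron could call rebuild("delta") every few minutes and rebuild("merge") once a night. The runner parameter exists only so the commands can be checked without Sphinx installed.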

Van Gale
A: 

The above sounds right to me, though I'll mention that you could use a view to call the indexer, if you so desired.

It'd probably get called a LOT, but it could work. Just call it as you would any external command.
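Calling the indexer from Django could look roughly like the helper below. It is a minimal sketch, assuming the indexer binary is on the PATH and that rebuilding every index with "--all --rotate" is acceptable; the name trigger_reindex is hypothetical. Because it may get called a lot, it starts indexer in the background and skips the call when the previous run hasn't finished yet.

```python
import subprocess

# Hypothetical command: --all rebuilds every index defined in sphinx.conf,
# --rotate tells a running searchd to swap in the new files.
INDEXER_ARGV = ["indexer", "--all", "--rotate"]

_running = None  # handle to the indexer process this process last started


def trigger_reindex(argv=INDEXER_ARGV, popen=subprocess.Popen):
    """Start indexer in the background unless a previous run is still going.

    Returns True if a new indexer process was started, False if skipped.
    popen is injectable so the logic can be tested without Sphinx.
    """
    global _running
    if _running is not None and _running.poll() is None:
        # poll() returns None while the process is still running;
        # skip rather than stacking up overlapping indexer runs.
        return False
    # Popen returns immediately, so the caller isn't blocked by indexing.
    _running = popen(argv)
    return True
```

A view would call trigger_reindex() before returning its response. Hooking the same helper to Django's post_save signal (a swapped-in alternative to the view, not what this answer describes) would reindex on every save; connect a module-level receiver function rather than a lambda, because Django holds signal receivers by weak reference by default. The skip check is per process, so several worker processes could still start overlapping runs.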

mlissner