I've been reading up on the Sphinx search engine and the Thinking Sphinx gem. In the TS docs it says...

Sphinx has one major limitation when compared to a lot of other search services: you cannot update the fields [of] a single document in an index, but have to re-process all the data for that index.

If I understand correctly, that means when a user adds or edits something, the change is not reflected in the index. So if they add a record it won't come up in searches until the entire index is rebuilt. Or if they delete a record, it will come up in searches, and then cause some kind of error or frustrating behavior.

Moreover, while rebuilding the index Sphinx is shut down. So your app's search functionality goes offline regularly (once an hour, or once every few hours), and anyone who tries to search during that time will get an error or a "try later" message.

OK, clearly none of that is acceptable in a real-world app. So you pretty much have to use delta indexing.

But apparently you still need to regularly shut down your search engine and do a full indexing...

Turning on delta indexing does not remove the need for regularly running a full re-index, as otherwise the delta index itself will grow to become just as large as the core indexes, and this removes the advantage of keeping it separate. It also slows down your requests to your server that make changes to the model records.

I don't really understand what the docs are saying here. Maybe someone can help me out. I thought the whole point of delta indexing was that you don't need to regularly rebuild the index. It's updated instantly whenever the data changes.

Because rebuilding the index every hour or every anything would be totally messed up, right?

+6  A: 

If I understand correctly, that means when a user adds or edits something, the change is not reflected in the index. So if they add a record it won't come up in searches until the entire index is rebuilt. Or if they delete a record, it will come up in searches, and then cause some kind of error or frustrating behavior. Moreover, while rebuilding the index Sphinx is shut down. ...

You don't need to rebuild your indexes, just reindex them, which means there's no need to stop the daemon. Rebuilding is only needed after changing the structure of the index, and that is not the case here.

And for the second part: again, you don't rebuild the index, so stopping the daemon isn't necessary. When using delta indexing there are actually two indexes that are used for searching: the main index (which should be reindexed once in a while) and the delta index (which is refreshed after each relevant operation on the record). If I understand it correctly, when reindexing the main index (e.g. via a cron task), the delta index is simply merged into the main index, so it won't take up that much space and will stay fast.
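To make the two-index idea concrete, here is a toy model in plain Ruby. It is only a sketch: `ToySearch` and its methods are hypothetical names, not part of the Thinking Sphinx or Sphinx API, and the "database" is just a hash. The full reindex is modelled as reprocessing every row, after which the delta starts out empty again.

```ruby
# Toy model of a main index plus a delta index.
# Hypothetical class for illustration, not the Thinking Sphinx API.
class ToySearch
  def initialize(database)
    @database = database # id => text, stands in for the real table
    @main = {}           # only refreshed by full_reindex
    @delta = {}          # refreshed after every change
    @changed = []        # ids changed since the last full reindex
    full_reindex
  end

  # Called after each create/update: only the changed rows are reprocessed.
  def save(id, text)
    @database[id] = text
    @changed << id unless @changed.include?(id)
    @delta = @changed.map { |i| [i, @database[i]] }.to_h
  end

  # Periodic job (e.g. from cron): reprocess every row, then empty the delta.
  def full_reindex
    @main = @database.dup
    @changed = []
    @delta = {}
  end

  # A search consults both indexes; the delta copy wins for changed rows.
  def search(word)
    @main.merge(@delta).select { |_id, text| text.include?(word) }.keys.sort
  end

  def delta_size
    @delta.size
  end
end

search = ToySearch.new({ 1 => "red apple" })
search.save(2, "green apple")
search.search("apple") # => [1, 2], the new record is found right away
search.delta_size      # => 1
search.full_reindex
search.delta_size      # => 0, the delta is small again
```

The point of the periodic full reindex in this model is only to keep the delta small. Searches keep working the whole time because both indexes are always consulted.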

Milan Novota
When re-indexing the main index, a full index is performed (i.e. the delta isn't merged in any way). Besides that, your comment is spot on.
James Healy
Yes, it's not physically merged; that was bad wording. Thanks for pointing that out.
Milan Novota
Also, it's worth noting that deletions are tracked (as much as possible) in Thinking Sphinx without needing delta indexes.
pat