Manage Zend_Search_Lucene Index on a load balanced environment


Each server in the cluster has a search index that is synced from one of the servers every 15 minutes. This was done because appending to an index can't happen on NFS because of flock (see the documentation); otherwise the index would live on a shared folder that all servers access.

The issue I'm running into is that when an action requires modifying the index, the modification happens on the local copy of the index. I need a way to sync those changes back to the parent in the least intrusive way possible, so that they propagate to all servers in the cluster by the next sync.

I tried referencing the parent server's index via HTTP, but this won't work because mkdir can't be done over HTTP. Is there a way to reference the index of a remote server? If there is an entirely different approach available, that will be considered as well.

As I understand the situation, if one server's index is modified, you want the main index (the source of the rsync) to receive that update before the next rsync takes place, so that the sync pushes it to all servers.

Instead of always rsync-ing from the main server's index, why not have rsync use whichever index was modified most recently as its source? So if the index on server D was updated more recently than the index on main server A, simply sync all the servers from D.
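A rough sketch of that selection step, in Python purely for illustration. It assumes each server's index directory is reachable as a local path (for example through a mount or a pulled copy); the function names and the server-to-path mapping are hypothetical, not part of any existing tool.

```python
import os

def index_mtime(index_dir):
    """Return the newest modification time of any file under index_dir (0.0 if empty)."""
    newest = 0.0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            newest = max(newest, os.path.getmtime(os.path.join(root, name)))
    return newest

def pick_sync_source(index_dirs):
    """Given {server_name: path_to_index}, return the server whose index changed last."""
    return max(index_dirs, key=lambda server: index_mtime(index_dirs[server]))
```

The result would then be used as the rsync source for the other servers. Note that this is a last-writer-wins choice: only one server's changes survive each sync.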

Did I understand your situation correctly?

Edit

In that case, edit the code that builds the indexes and add a check for whether the index differs from the previous build. If it does, launch an exec call to a shell script (or build the command manually) to update the central server. This way the central server receives updates on the fly, and by the time the big sync runs your problem is solved.
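Something along these lines, sketched in Python rather than in the application's own code. The fingerprint scheme (file names, sizes and mtimes), the function names and the remote target string are all assumptions for illustration; the only external command used is rsync with its standard `-a` and `--delete` options.

```python
import hashlib
import os
import subprocess

def index_fingerprint(index_dir):
    """Hash relative paths, sizes and mtimes so that a changed index yields a new value."""
    digest = hashlib.sha256()
    for root, dirs, files in os.walk(index_dir):
        dirs.sort()  # walk subdirectories in a stable order
        for name in sorted(files):
            path = os.path.join(root, name)
            stat = os.stat(path)
            rel = os.path.relpath(path, index_dir)
            digest.update(f"{rel}:{stat.st_size}:{stat.st_mtime_ns}\n".encode())
    return digest.hexdigest()

def push_if_changed(index_dir, previous_fingerprint, remote, run=subprocess.run):
    """Push index_dir to remote (hypothetical, e.g. 'central:/path/to/index/') if it changed.

    Returns the current fingerprint so the caller can store it for the next build.
    """
    current = index_fingerprint(index_dir)
    if current != previous_fingerprint:
        # -a preserves times and permissions; --delete removes files dropped from the index.
        run(["rsync", "-a", "--delete", index_dir.rstrip("/") + "/", remote], check=True)
    return current
```

The caller would keep the returned fingerprint somewhere (a small file next to the index, say) and pass it back in after the next build.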

Matt1776 2009-07-07 06:23:23

The issue with this solution is that servers B and C may also have had updates since then, and syncing one will overwrite the others.

Akeem 2009-07-07 12:43:30

Look at my solution and jason's solution below. They are very similar in architecture and seem like your best bet.

Matt1776 2009-07-08 19:08:22

The best solution I can think of is to follow a more traditional Master / Slave replication pattern. Take some inspiration from RDBMS replication: All writes should go to the master.

Of course, you can't do this directly. As you've mentioned, you can't write directly to the remote index.

So, this leaves you with one option: Expose an API / Service on your master server that the slaves can use to indirectly update the index. Then, all changes will be synced back at your next scheduled push. I do realize this may be a significant change to your design, but in a replicated or distributed environment, this is often necessary.
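To make the shape of that service concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the class and function names, the JSON message format, and the in-memory dict standing in for the real index. The transport (for example an HTTP endpoint on the master) is left out; on the master, `apply()` is where the actual index calls would go.

```python
import json
import threading

def build_update(action, doc_id, fields=None):
    """Slave side: serialize one index change to send to the master."""
    return json.dumps({"action": action, "id": doc_id, "fields": fields or {}})

class MasterIndexService:
    """Master side: every write from any node funnels through apply()."""

    def __init__(self):
        self._lock = threading.Lock()  # only one writer touches the index at a time
        self._docs = {}                # stand-in for the real search index

    def apply(self, payload):
        """Apply one serialized change and return the resulting document count."""
        op = json.loads(payload)
        with self._lock:
            if op["action"] == "add":
                self._docs[op["id"]] = op["fields"]
            elif op["action"] == "delete":
                self._docs.pop(op["id"], None)
            else:
                raise ValueError("unknown action: " + op["action"])
            return len(self._docs)
```

Because the master is the only node that writes, the scheduled push to the slaves never has to merge competing copies.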

jason 2009-07-08 18:11:41
