Hi,
we are designing the search architecture for a corporate web application. We'll be using Lucene.net for this. The indexes will not be big (about 100,000 documents), but the search service must be always up and always be up to date. There will be new documents added to the index all the time and concurrent searches. Since we must have high availability for the search system, we have 2 application servers which expose a WCF service to perform searches and indexing (a copy of the service is running in each server). The server then uses lucene.net API to access the indexes.
The problem is, what would be the best solution to keep the indexes synced all the time? We have considered several options:
Using one server for indexing and having the 2nd server access the indexes via SMB: no can do because we have a single point of failure situation;
Indexing to both servers, essentially writing every index twice: probably lousy performance, and possibility of desync if eg. server 1 indexes OK and server 2 runs out of disk space or whatever;
Using SOLR or KATTA to wrap access to the indexes: nope, we cannot have tomcat or similar running on the servers, we only have IIS.
Storing the index in database: I found this can be done with the java version of Lucene (JdbcDirectory module), but I couldn't find anything similar for Lucene.net. Even if it meant a small performance hit, we'd go for this option because it'd cleanly solve the concurrency and syncing problem with mininum development.
Using Lucene.net DistributedSearch contrib module: I couldn't file a single link with documentation about this. I don't even know by looking at the code what this code does, but it seems to me that it actually splits the index across multiple machines, which is not what we want.
rsync and friends, copying the indexes back and forth between the 2 servers: this feels hackish and error-prone to us, and, if the indexes grow big, might take a while, and during this period we would be returning either corrupt or inconsistent data to clients, so we'd have to develop some ad hoc locking policy, which we don't want to.
I understand this is a complex problem, but I'm sure lots of people have faced it before. Any help is welcome!