Hi all, I'm using Lucene.NET.

I've got 2 threads, each one doing indexing of some different content (using a different algorithm, although they might try to index the same document). They are both writing to the same index (using a single IndexWriter instance).

I've also got a web application that needs to write to the index occasionally. (It obviously cannot use that same IndexWriter instance.)

My problem is that the web application cannot write to the index while the 2 threads are running their indexing operation, and they always are!

How do I manage this more efficiently?

Thanks

+2  A: 

I'm not very familiar with how Lucene.NET supports threading, but based on your description, you may want to create a "work queue" that other threads post work to, and use a single thread to pick up the work from the queue and add it to the index with an IndexWriter. This way no thread is ever starved of the opportunity to get its changes added to the index.
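To make the work queue idea concrete, here is a minimal sketch, written in Java (the language of the original Lucene) using only the standard library. It assumes nothing about the real Lucene.NET API: `DocumentSink` is a hypothetical stand-in for the IndexWriter, and `IndexingQueue`, `submit`, and `close` are illustrative names. In .NET, `BlockingCollection<T>` would play roughly the role that `BlockingQueue` plays here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the real IndexWriter; only the shape matters here.
interface DocumentSink {
    void addDocument(String doc);
}

// Any thread may call submit(); a single consumer thread performs every write.
final class IndexingQueue {
    // Unique sentinel object, compared by identity, that tells the consumer to stop.
    private static final String STOP = new String("STOP");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Thread consumer;

    IndexingQueue(DocumentSink sink) {
        consumer = new Thread(() -> {
            try {
                while (true) {
                    String doc = queue.take();   // blocks until work arrives
                    if (doc == STOP) {
                        return;
                    }
                    sink.addDocument(doc);       // the only place the sink is touched
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "index-writer");
        consumer.start();
    }

    // Called by the background indexers and by the web application alike.
    void submit(String doc) {
        queue.add(doc);   // the queue is unbounded, so this never blocks
    }

    // Lets queued work drain, then waits for the consumer thread to exit.
    void close() {
        queue.add(STOP);
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

final class IndexingQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        // Only the consumer thread touches this list until close() returns.
        List<String> indexed = new ArrayList<>();
        IndexingQueue queue = new IndexingQueue(indexed::add);

        Thread indexerA = new Thread(() -> queue.submit("doc from indexer A"));
        Thread indexerB = new Thread(() -> queue.submit("doc from indexer B"));
        indexerA.start();
        indexerB.start();
        queue.submit("doc from web application");

        indexerA.join();
        indexerB.join();
        queue.close();
        System.out.println(indexed.size() + " documents written");
    }
}
```

Because the queue is first-in, first-out, a request from the web application waits only behind the items already queued, rather than behind indexers that never finish.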

I suspect that Lucene has to use internal locks on its full text indexes anyways, so having more than one thread writing to the index is probably not an effective way to scale your code.

Finally, having multiple threads writing to a single mutable object is often a way to introduce subtle and difficult-to-fix concurrency problems into a codebase. I generally try to avoid having multiple writers; multiple readers, on the other hand, can be quite useful.

LBushkin
Multiple threads can write to the same index using the same IndexWriter instance; this is even documented in the code itself. My question was how to cope with having separate IndexWriter instances write to the same directory.
Roey
Could you provide more detail in your question about how you are using IndexWriter? It is unclear in your question exactly where the threading contention is.
LBushkin
There is no threading contention. I am using a single IndexWriter for the first 2 threads, and my web application needs to write to the same index, which is impossible since the 2 threads are always indexing.
Roey
@Roey, if that is the case then perhaps LBushkin is giving you the correct advice. If a single thread handles all requests for writing to the index, any number of concurrent threads can submit writes to it.
sixlettervariables
+1  A: 

If you don't want to use LBushkin's idea of a work queue, the other approach is to use the same IndexWriter instance in the web application as the background threads are using. You haven't explained where the 2 indexing threads are. If they are in the same process/appdomain as the web application, it should be feasible to use the same instance.

If not, then you have to use the equivalent of the work queue as mentioned by LBushkin, or an adapted version of it as follows: add a third thread to the indexing process whose job is to listen to indexing requests from the web application. You can use e.g. Named Pipes for this (especially easy if you're using .NET 3.5). The web application sends indexing requests to the third thread, which uses the same IndexWriter as the other existing threads to update the index.

This is essentially the same idea as LBushkin's (the third thread is a work queue consumer), but it may involve less additional coding.
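As a rough illustration of the listener thread, here is a sketch in Java using only the standard library. Several things in it are assumptions rather than anything from Lucene.NET: `SharedWriter` is a hypothetical stand-in for the one IndexWriter shared by all threads, `RequestListener` is an illustrative name, and the request format (one document per line) is invented. Java's in-process `PipedInputStream`/`PipedOutputStream` stand in for the named pipe; in .NET 3.5 and later the listener would read from a `NamedPipeServerStream` instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the single IndexWriter shared by every thread
// in the indexing process; synchronized so concurrent callers are safe.
final class SharedWriter {
    private final List<String> docs = new ArrayList<>();

    synchronized void addDocument(String doc) {
        docs.add(doc);
    }

    synchronized List<String> snapshot() {
        return new ArrayList<>(docs);
    }
}

// The "third thread": reads one request per line and writes it via the shared writer.
final class RequestListener implements Runnable {
    private final InputStream requests;
    private final SharedWriter writer;

    RequestListener(InputStream requests, SharedWriter writer) {
        this.requests = requests;
        this.writer = writer;
    }

    @Override
    public void run() {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(requests, StandardCharsets.UTF_8))) {
            String line;
            // readLine() returns null once the sending side closes the stream.
            while ((line = in.readLine()) != null) {
                writer.addDocument(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

final class RequestListenerDemo {
    public static void main(String[] args) throws Exception {
        SharedWriter writer = new SharedWriter();

        // In-process pipe standing in for the named pipe between the two processes.
        PipedOutputStream webSide = new PipedOutputStream();
        PipedInputStream indexerSide = new PipedInputStream(webSide);

        Thread listener = new Thread(new RequestListener(indexerSide, writer), "web-requests");
        listener.start();

        // An existing background thread keeps using the same writer directly.
        writer.addDocument("doc from background thread");

        // The "web application" sends a request and then closes its end of the pipe.
        webSide.write("doc from web application\n".getBytes(StandardCharsets.UTF_8));
        webSide.close();

        listener.join();
        System.out.println(writer.snapshot().size() + " documents written");
    }
}
```

The background threads are untouched in this arrangement; the only new piece is the listener, which is just one more caller of the shared writer.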

Update: Named Pipes can be used between processes on different machines. You just need to be aware of firewall issues which may arise in certain network topologies.

Vinay Sajip
Can your named pipes idea be used for distributed indexing in Lucene? Or are they unable to communicate between different servers?
Roey
I've updated my answer to include what you asked in your comment.
Vinay Sajip