When using Lucene.Net with ASP.NET, I can imagine that one web request could trigger an update to the index while another web request is performing a search. Does Lucene.Net have a built-in ability to manage concurrent access, or do I have to manage it myself to avoid "being used by another process" errors?

EDIT: After reading the docs and experimenting, this is what I think I've learned. There are two issues: thread safety and concurrency. Multithreading is "safe" in that you can't do anything bad to the index. But that safety comes at the cost of only one object holding a lock on the index at a time; a second object that comes along will throw an exception. So you can't leave a searcher open and expect a writer in another thread to be able to update the index, and if a thread is busy updating the index, trying to create a searcher will fail.

Also, searchers see the index as it was at the time they opened it, so if you keep them around and update the index, they won't see the updates.

I wanted my searchers to see the latest updates.

My design, which seems to be working so far, is that my writers and searchers share a lock, so they don't fail; they just wait until the current write or search is done.
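
For illustration, the shared-lock design described above might look roughly like this. It is a minimal sketch in Python rather than C#, and `SharedLockIndex`, `add_document`, and `search` are hypothetical stand-ins for an index, not Lucene.Net calls. In .NET the same shape would typically be built with a `lock` statement or a `ReaderWriterLockSlim`.

```python
import threading


class SharedLockIndex:
    """Toy in-memory stand-in for an index; NOT the Lucene.Net API."""

    def __init__(self):
        # One lock shared by writers and searchers (the design in the question).
        self._lock = threading.Lock()
        self._docs = []

    def add_document(self, text):
        # A writer waits here while a search or another write is in progress.
        with self._lock:
            self._docs.append(text)

    def search(self, term):
        # A searcher waits here while a write is in progress, then reads the
        # index fresh, so it always sees the latest documents.
        with self._lock:
            return [d for d in self._docs if term in d]


index = SharedLockIndex()
writers = [
    threading.Thread(target=index.add_document, args=(f"doc {i} about lucene",))
    for i in range(20)
]
for t in writers:
    t.start()
for t in writers:
    t.join()

print(len(index.search("lucene")))
```

The trade-off is that searches are serialized too. A reader/writer lock would let several searches run at once while still making them wait for a write.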

+2  A: 

Searching while updating isn't the problem so much as managing concurrent writes to the index. I've had an easier time with SOLR, which abstracts most of those differences away for me since it runs as a server.

MattMcKnight
+10  A: 

According to this page,

Indexing and searching are not only thread safe, but process safe. What this means is that:

  • Multiple index searchers can read the lucene index files at the same time.
  • An index writer or reader can edit the lucene index files while searches are ongoing.
  • Multiple index writers or readers can try to edit the lucene index files at the same time (it's important for the index writer/reader to be closed so it will release the file lock).

However, the query parser is not thread safe, so each thread using the index should have its own query parser.

The index writer, however, is thread safe, so you can update the index while people are searching it. You then have to make sure that threads holding open index searchers close them and open new ones in order to see the newly updated data.
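
One way to picture the "close and reopen" step is a small holder that hands out a cached searcher and replaces it once the index has changed. This is only a sketch under stated assumptions: `ToyIndex`, `ToySearcher`, `SearcherHolder`, and the `version` counter are hypothetical names invented for the example, written in Python, and are not part of Lucene.Net.

```python
import threading


class ToySearcher:
    """Point-in-time view of the index; it never sees later updates."""

    def __init__(self, docs, version):
        self._docs = docs
        self.version = version

    def search(self, term):
        return [d for d in self._docs if term in d]


class ToyIndex:
    """Hypothetical in-memory index with a version counter (not Lucene.Net)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._docs = []
        self.version = 0

    def add_document(self, text):
        with self._lock:
            self._docs.append(text)
            self.version += 1  # every update bumps the version

    def open_searcher(self):
        # Copy the documents so the searcher is a snapshot, as described above.
        with self._lock:
            return ToySearcher(tuple(self._docs), self.version)


class SearcherHolder:
    """Hands out a cached searcher and reopens it once the index has changed."""

    def __init__(self, index):
        self._index = index
        self._lock = threading.Lock()
        self._searcher = index.open_searcher()

    def get(self):
        with self._lock:
            if self._searcher.version != self._index.version:
                self._searcher = self._index.open_searcher()
            return self._searcher


index = ToyIndex()
index.add_document("first document")
stale = index.open_searcher()      # kept around, never reopened
holder = SearcherHolder(index)
index.add_document("second document")
```

Here `stale.search("document")` still returns only the first document, while `holder.get().search("document")` returns both, because the holder notices the version change and reopens.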

Judah Himango
What is the overhead of opening the index for each query?
Jenea
We have a fairly large (multi-gigabyte) search index, and the cost of opening an index at each query has been negligible.
Judah Himango
It should be noted that this article is about the original Lucene for Java. There is no mention of the .NET implementation, nor whether the behavior described is a feature of the Lucene "standard" (and thus would be reimplemented in Lucene.Net) or if it's implementation-specific behavior.
gWiz
Of course. However, the Lucene.NET implementation almost perfectly matches the Java Lucene implementation. I suspect they're actually running a Java-to-C# converter on it, patching it up, and releasing it as Lucene.NET.
Judah Himango
+3  A: 

You may have issues. If your indexing thread creates a new document and that triggers a merge of some index segments, the merged segments will be deleted and a new segment will be created. The problem is that your index searcher loaded all the segments when it was opened, so it has "pointers" to the segments that existed at that time. If the index writer now does a segment merge and deletes a segment, your index searcher will still think that segment file exists and will fail with a "file not found" error. What you really need to do is separate your writable index from your searchable index, either by using SOLR or by doing your own index snapshot replication similar to what SOLR does. I have built a very similar system to SOLR using .NET and Lucene.NET on Windows, using NTFS hard links to make snapshot replication efficient. I can give you more info if you are interested.
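
To make the hard-link snapshot idea concrete, here is a rough sketch in Python. It is not the implementation described in this answer: `snapshot_index`, the directory names, and the file name `segment_1.dat` are all hypothetical. It also assumes the index directory is flat and that index files are never modified in place once written. The point it shows is that a hard-linked snapshot keeps a file's contents readable even after the writable index deletes its own copy.

```python
import os
import tempfile


def snapshot_index(index_dir, snapshot_dir):
    """Hard-link every file in index_dir into a newly created snapshot_dir."""
    os.makedirs(snapshot_dir)
    for name in os.listdir(index_dir):
        src = os.path.join(index_dir, name)
        if os.path.isfile(src):
            # A hard link shares the file's data; no bytes are copied.
            os.link(src, os.path.join(snapshot_dir, name))
    return snapshot_dir


# Build a throwaway "writable index" containing one fake segment file.
root = tempfile.mkdtemp()
live = os.path.join(root, "live")
os.makedirs(live)
with open(os.path.join(live, "segment_1.dat"), "w") as f:
    f.write("segment data")

snap = snapshot_index(live, os.path.join(root, "snapshot-0001"))

# Simulate a merge deleting the old segment from the writable index.
os.remove(os.path.join(live, "segment_1.dat"))

# The snapshot still has the segment, so a searcher opened on it keeps working.
with open(os.path.join(snap, "segment_1.dat")) as f:
    surviving = f.read()
```

Searchers would then be opened against the newest snapshot directory rather than the writable index, and old snapshots deleted once no searcher uses them.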