I have built an ASP.NET website that runs on Windows Server 2008 (remotely hosted). The application queries 11 very large Lucene indexes (each ~100 GB). I open the IndexSearchers in Page_load() and keep them open for the duration of the user session.

My questions:

  1. The queries take ~5 seconds to complete. That is understandable, since these are very large indexes, but users want faster responses, and I would like to squeeze out better performance. (I did look over the Apache Lucene website and tried some of the ideas there.) I am interested in whether and how you have tuned it further, especially from an ASP.NET perspective.

  2. One idea was to use Solr instead of querying Lucene directly. But that seems counter-intuitive: introducing another abstraction in between might add to the latency. Is porting to Solr worth the headache? Can anyone share metrics on the improvement they saw after switching to Solr, if it was worth it?

  3. Are there some key things that could be done in Solr that could be replicated to speed up response times?

+1  A: 

Some questions / ideas:

  • Are you hitting all 11 indexes for a single request?
  • Can you reorganize the indexes so that you hit only 1 index (i.e. sharding) ?
  • Have you run a profile of the application (using dotTrace or similar tool)? Where is the time spent? Lucene.Net?
  • If most of the time is spent in Lucene.Net, then the latency added by migrating to Solr should be negligible compared to the time spent searching. Plus, Solr can easily be distributed to increase performance.
  • I'm not all too familiar with Lucene (I use Solr) but if you're searching 11 indexes per request, can you run those searches in parallel (e.g. with TPL) ?
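
To make the last point concrete, here is a minimal sketch of the fan-out idea. It is written in Python only to keep it short and language-neutral; in .NET the same shape would use TPL tasks. `search_all` and the per-index search callables are hypothetical stand-ins, not Lucene.Net APIs: each callable is assumed to wrap one index's query.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical stand-in for one index's search call; in the real application
# this would wrap a query against that index's searcher.
SearchFn = Callable[[str], List[str]]


def search_all(searchers: Dict[str, SearchFn], query: str) -> Dict[str, List[str]]:
    """Run the same query against every index concurrently; return hits keyed by index name."""
    if not searchers:
        return {}
    with ThreadPoolExecutor(max_workers=len(searchers)) as pool:
        # Submit every search first so they all run at the same time.
        futures = {name: pool.submit(fn, query) for name, fn in searchers.items()}
        # result() blocks until that search finishes and re-raises any exception it threw.
        return {name: fut.result() for name, fut in futures.items()}
```

If the searches really can overlap, the wall-clock time is roughly that of the slowest index rather than the sum of all of them. This only helps when a single request needs several indexes, which is why the first question above matters.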
Mauricio Scheffer
The queries to the indexes are context-dependent. I hold the 11 searchers open for the duration of the user session. I am guessing this is inefficient, but this was a rush job. I want to re-engineer it. If I understand correctly, you would recommend migrating to Solr (presumably multicore). Thanks for responding!
Mikos
Can you somehow reduce the number of indexes? Do all of the searches have to be performed sequentially? With TPL you can create future tasks with continuations and parallelize that. Or you could try pooling your IndexSearchers to take advantage of the warm up.
Mauricio Scheffer
Cannot reduce/collapse them; each index serves a different purpose. For example: one index is a list of publications, one is a set of experts, one is a set of affiliations/institutes, and so on. Each needs to be queryable separately, especially with respect to context.
Mikos
If you migrate to Solr, you would represent each of your 11 indexes as a core. In addition to that, you can *shard* out (distribute) some or all of your indexes to other boxes. I think you will get better performance and also benefit from the additional flexibility. However, I'd recommend running some tests to see if it's worth the migration for *your* particular task, i.e. if you get 4s instead of 5s you might decide it's not worth the migration.
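
As a rough illustration of "one core per index": with a multicore Solr, each core is queried over HTTP through its own `select` handler. The sketch below only builds such a URL. The host name, the core name, and the helper `solr_select_url` are assumptions made for the example, and Python is used just to keep it short.

```python
from urllib.parse import urlencode


def solr_select_url(base_url: str, core: str, query: str, rows: int = 10) -> str:
    """Build the select URL for one core, e.g. <base>/<core>/select?q=..."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


# Hypothetical host and core name: one core per former Lucene index.
url = solr_select_url("http://search-host:8983/solr", "publications", "title:lucene")
```

The web application would then issue a plain HTTP GET against the core that matches the current context, instead of holding an index open itself.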
Mauricio Scheffer
A: 

The biggest thing is removing the search from the web tier and isolating it in its own tier (a search tier). That way, you have a dedicated box with dedicated resources that has the indexes loaded and "warmed up" in cache, instead of having each user hold their own copy of the index reader.
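
Even before moving to a separate box, the "one warmed copy instead of one per user" part can be sketched like this. It is a language-neutral sketch in Python; `SearcherPool` and the `open_searcher` callable are hypothetical names, and it assumes the searcher object is safe to share between threads (Lucene documents `IndexSearcher` as thread-safe, but verify that for your Lucene.Net version).

```python
import threading
from typing import Callable, Dict


class SearcherPool:
    """Process-wide registry that opens each index's searcher once and shares it."""

    def __init__(self, open_searcher: Callable[[str], object]) -> None:
        self._open = open_searcher          # hypothetical factory: index name -> searcher
        self._lock = threading.Lock()
        self._searchers: Dict[str, object] = {}

    def get(self, index_name: str) -> object:
        # The lock ensures two concurrent requests never open the same index twice.
        with self._lock:
            if index_name not in self._searchers:
                self._searchers[index_name] = self._open(index_name)
            return self._searchers[index_name]
```

Every request asks the pool for the searcher it needs, so the cost of opening and warming a ~100GB index is paid once per process rather than once per user session.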

GalacticJello
Could you clarify what "isolating to a search tier" means? Should I use a separate physical machine? Or something else?
Mikos
@GalacticJello's suggestion is exactly what Solr is :)
Mauricio Scheffer
"Are there some key things that could be done in Solr that could be replicated to speed up response times?"... and those key things are listed above, if you would like to replicate them.
GalacticJello