views:

86

answers:

1

Hi All

I have Lucene files indexed according to pageIds (UniqueKey). and one document can have multiple pages. Now once user perform some search it gives us pages that matches search criteria.

I am using Lucene.Net 2.9.2

We have 2 problems...

1- The file size is around 800GB and it has 130 million rows (pages) so the search time was really slow (all queries taking more than a min (we only have to return limited rows at a time)

To overcome the performance issue I shifted to SOLR which resolved the performance issue (which is quite strange as I am not using any extra functionality provided by SOLR like sharding etc - so could it be that Lucene.NET 2.9.2 is not really equivalent to performance compared to same version of JAVA??) but now I am having another issue...

2- The individual 'lucene document' is one page but i want to show results 'grouped by' 'real documents'. How many results I should be returned should be configurable based on 'real documents' not 'pages' (coz thats how I want to show to the user).

So lets say I want 20 'real documents' and ALL pages in them that matches the search criteria (doesnt matter if one document has 100 pages and another just 1).

From what I could get from SOLR forums was that it can be achieved by SOLR-236 patch (field collapsing) but I have not been able to apply the patch correctly with trunk (gives lots of errors).

This is really imp for me and I dont have much time, so can someone please either send me the SOLR 1.4.1 binary with this patch applied or guide me if there is any other way.

I would really appreciate it. Thanks!!

A: 

If you have issues with the collapse patch, then the Solr issue tracker is the channel to report them. I can see that other people are currently having some issues with it, so I suggest getting involved in its development.

That said: I recommend that if your application needs to search for 'real documents', then build your index around these 'real documents', not their individual pages.

Mauricio Scheffer
@ Mauricio Scheffer: actually it is our requirement that we need to show page numbers where that query matches.
Ahsan Iqbal