ansaurus

Question

Efficiently sorting and paging with Solr when index is changing

Answer 1

This is not really a Solr-specific problem, but a general problem with paginating any external data source, because the data source's state is independent of the (web) application. It also happens with relational databases, for example. Here's a good coverage of pagination in relational databases, along with the possible solutions. Most web applications / websites take the first solution, "Repeat the query for each new request", since the other solutions are much more complex and don't scale well, but it suffers from the problem you describe: if the result set changes between requests, items shift between pages. Browse the questions on stackoverflow.com for a while and you'll notice it, since questions are being created constantly.
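To make the drift concrete, here is a minimal sketch of the "repeat the query for each new request" approach, using a plain Python list in place of a real index. The document names and the `fetch_page` helper are made up for illustration; the slicing just mimics offset-based paging.

```python
def fetch_page(results, start, rows):
    """Re-run the 'query' and slice it, the way offset-based paging works."""
    return results[start:start + rows]


# Hypothetical result set, sorted newest first.
index = ["doc5", "doc4", "doc3", "doc2", "doc1"]

# The user loads page 1.
page1 = fetch_page(index, start=0, rows=2)

# A new document is committed before the user asks for page 2,
# which pushes every existing result down by one position.
index.insert(0, "doc6")

# The user loads page 2 with the same sort order.
page2 = fetch_page(index, start=2, rows=2)

# Anything on both pages is something the user sees twice.
repeated = [doc for doc in page2 if doc in page1]
print(page1, page2, repeated)
```

A deletion between requests has the opposite effect: results shift up and one item is skipped instead of repeated.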

In your case I'd consider modeling the Solr documents as your whole legal documents instead of their individual sections. You'll get far fewer documents (and therefore a slower rate of inserts/deletes), and you can use the highlighting parameters to get snippets of the sections that matched the user query.
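As a rough sketch of what such a request could look like, the snippet below only builds the query string; it does not contact a server. The base URL, the field name `body`, and the snippet count and fragment size are assumptions for illustration, while `hl`, `hl.fl`, `hl.snippets`, `hl.fragsize`, `start` and `rows` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_select_url(base_url, user_query, page, rows=10):
    """Build a Solr select URL that pages over whole documents
    and asks for highlighted snippets of the matching text."""
    params = {
        "q": user_query,
        "start": page * rows,   # offset of the first result on this page
        "rows": rows,           # page size
        "hl": "true",           # turn highlighting on
        "hl.fl": "body",        # hypothetical field holding the document text
        "hl.snippets": 3,       # assumed: up to 3 snippets per document
        "hl.fragsize": 200,     # assumed: roughly 200 characters per snippet
    }
    return base_url + "/select?" + urlencode(params)


# Hypothetical local Solr instance; page numbering starts at 0 here.
url = build_select_url("http://localhost:8983/solr", "force majeure", page=1)
print(url)
```

The highlighted snippets come back in a separate section of the response, keyed by document id, so the UI can show matching passages without each section being its own Solr document.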

Another option would be decreasing your commit rate, so the index changes less often between page requests, but this could end up in less-than-ideal document freshness.

Mauricio Scheffer 2010-07-06 17:13:54

+1 indexing a whole document is probably the easiest way to go if it is feasible in your case

Pascal Dimassimo 2010-07-06 17:19:09

Thanks Mauricio, some good thoughts. The problem with indexing the entire document is that I want to be able to present smaller subsets of the documents to the user in the UI, because some of these documents are thousands of pages long. I wanted to store divs per paragraph and be able to present them to the user piecemeal, but like you say, it's a general problem when paginating.

Dan Fitch 2010-07-06 17:48:07

@Dan Fitch: what about highlighting?

Mauricio Scheffer 2010-07-06 19:30:20

When I say piecemeal, I mean that I want to use Solr to store the actual markup for the sections, not just the plaintext content. There is also more to this than searching the corpus: there needs to be a way to browse the entire "document" by assembling a "subset" of the sections into a viewable chunk. Sorry this isn't particularly clear, it's not super clear in my head yet either. :)

Dan Fitch 2010-07-06 19:56:57
