Hi All,

Some articles are written in several parts. For example, I got these articles from IBM developerWorks:

Distributed data processing with Hadoop, Part 1: Getting started

Distributed data processing with Hadoop, Part 2: Going further

Distributed data processing with Hadoop, Part 3: Application development

I will index these three articles separately. When someone searches for certain keywords, Part 3 may be the top hit while Part 1 is ranked 32nd. Therefore, if I list the results page by page, Part 1 and Part 3 will be displayed on different pages.

How can I make sure that matching documents in the same series are displayed together?

I guess in SQL, we can use "group by".

+2  A: 

I believe what you are asking for is Field Collapsing, which is currently a trunk feature in Solr, and will be incorporated into the next Solr version.
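In Solr releases that include this feature (documented there as result grouping), it is switched on through request parameters. Here is a rough sketch of such a request; the host, the core name `articles` and the field name `series_id` are assumptions for illustration only, not something from your setup:

```python
from urllib.parse import urlencode

# Hypothetical core name and field name; adjust to your own schema.
base_url = "http://localhost:8983/solr/articles/select"

params = {
    "q": "hadoop",
    "group": "true",             # turn result grouping on
    "group.field": "series_id",  # collapse hits that share this field value
    "group.limit": 10,           # max documents returned per group
}

# Build the request URL; send it with any HTTP client.
url = base_url + "?" + urlencode(params)
print(url)
```

Each group in the response then holds the matching parts of one series, so they land on the same results page.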

If you want to roll your own, one possible way to do this is:

  1. Add a "series id" field to each document that is a member of a series. You will have to ensure that this gets incremented for every new series.
  2. Make an initial query to Lucene, and get a hit list.
  3. For each hit, check whether it has a series id; if it does, make another query by that series id to retrieve all the members of the series.
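The steps above could be sketched roughly as follows. This is not Lucene code: the `INDEX` dict, the `part` field and the `members_of` helper are hypothetical stand-ins for your index and for the second query in step 3.

```python
# Hypothetical in-memory stand-in for the index: doc id -> stored fields.
INDEX = {
    "hadoop-1": {"series_id": 7, "part": 1},
    "hadoop-2": {"series_id": 7, "part": 2},
    "hadoop-3": {"series_id": 7, "part": 3},
    "other": {"series_id": None, "part": None},
}


def members_of(series_id):
    """Stand-in for the follow-up query: all docs in a series, in part order."""
    docs = [d for d, f in INDEX.items() if f["series_id"] == series_id]
    return sorted(docs, key=lambda d: INDEX[d]["part"])


def group_hits(ranked_hits):
    """Expand each hit into its whole series, at the rank of its best member."""
    seen = set()
    groups = []
    for doc_id in ranked_hits:
        if doc_id in seen:
            continue  # already shown as part of an earlier group
        sid = INDEX[doc_id]["series_id"]
        group = members_of(sid) if sid is not None else [doc_id]
        seen.update(group)
        groups.append(group)
    return groups


# Step 2: pretend the initial query returned hits in this rank order.
hits = ["hadoop-3", "other", "hadoop-1"]
grouped = group_hits(hits)
print(grouped)
```

Note that this pulls in every member of the series, including parts that did not match the original query, and that paging has to be done over groups rather than over individual hits.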

An alternative is to store the ids of all the series members in a field inside each member's document.
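A rough sketch of that alternative, again with plain dicts standing in for stored documents (the field name `series_members` is made up for the example):

```python
# Hypothetical stored documents; each series member lists all of its siblings.
SERIES = ["hadoop-1", "hadoop-2", "hadoop-3"]
DOCS = {
    "hadoop-1": {"series_members": SERIES},
    "hadoop-2": {"series_members": SERIES},
    "hadoop-3": {"series_members": SERIES},
    "other": {},  # not part of any series
}


def expand(hit_id):
    """Return the ids to display for one hit, read straight from the hit itself."""
    return DOCS[hit_id].get("series_members", [hit_id])


print(expand("hadoop-3"))
print(expand("other"))
```

This saves the second query per hit, but every existing member has to be re-indexed whenever a new part is added to the series.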

Yuval F
Thank you, that's very helpful.
Ke