ansaurus

Question

How can I order the list in LuceneSearch according to the number of hits?

Answer 1

+1 A:

Lucene should do this automatically, but it depends partly on how you formulate your query. By default, if you run a query with more than one word, the words are ORed together. For example, say your query was something like this (searching the contents field):

contents:apples oranges

This would return any page containing the term apples OR oranges. A page that contains the word "apples" 50 times but never mentions "oranges" could still rank higher than a page that contains "apples" once and "oranges" once.

What you probably want to do is AND your query like this:

contents:apples AND oranges

Note: uppercase AND

This will only return pages that contain both "apples" AND "oranges", which is probably nearer to what you want.
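
To make the OR/AND difference concrete, here is a minimal Python sketch that mimics only the matching step. It is a toy model, not Lucene's query parser or scoring; the `docs` data and the helpers `tokens`, `match_or` and `match_and` are hypothetical names made up for illustration.

```python
# Toy model of OR vs AND matching; this is not Lucene code.
docs = {
    "page1": "apples apples apples pie",
    "page2": "apples and oranges",
    "page3": "oranges only",
}

def tokens(text):
    """Lowercase whitespace tokenizer (a stand-in for a real analyzer)."""
    return set(text.lower().split())

def match_or(terms, docs):
    """Ids of docs containing at least one of the terms."""
    return sorted(d for d, text in docs.items() if tokens(text) & set(terms))

def match_and(terms, docs):
    """Ids of docs containing every one of the terms."""
    return sorted(d for d, text in docs.items() if set(terms) <= tokens(text))

print(match_or(["apples", "oranges"], docs))   # ['page1', 'page2', 'page3']
print(match_and(["apples", "oranges"], docs))  # ['page2']
```

With OR, page1 matches even though it never mentions oranges, so it then competes on score with the others; with AND it is excluded before scoring happens.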

Have a read of Lucene - Query Parser Syntax for more info on how to formulate queries.

Dan Diplo 2009-08-04 08:38:52

Answer 2

A:

I agree with Dan that this should be Lucene's default behavior. If your implementation does not behave this way, please add details so we can help you diagnose why. Lucene's Similarity class documentation explains the details of Lucene scoring, which is responsible for the order of the hits.

Yuval F 2009-08-04 09:40:18

I am using AND in the query. Please see above; I have included the code that I am using.

Pranali Desai 2009-08-04 11:35:24

Answer 3

A:

At first sight, your code looks like it should work as expected.
Could you show us an example of a finalText, a type, and the results?
When I get unexpected results, I usually check which query was actually used (in debug mode, check the value of q) and run that query in Luke to see what results it gives.

In my code, I usually use hits.Max instead of hits.Length. I don't know what the difference is, but it's something I noticed.

Also, as a side note: unless the rest of your program dictates otherwise, you might want to check out the HashTable instead of an ArrayList for your IdList; it's usually faster.

borisCallens 2009-08-04 11:53:40

Answer 4

A:

I have googled around and found that Lucene lists the search results in order of the hits' score, which is not simply the number of occurrences of the phrase but is calculated from various factors. I therefore think it will not be possible to get this from Lucene directly, but if you find some way, please let me know.

Pranali Desai 2009-08-05 08:20:40

Answer 5

+2 A:

Lucene ranks documents by score. There are several components to the score for a document for a given query. One of them is the frequency of the term in the field queried. However, for a search on a single term, the calculation is pretty simple. It's proportional to the square root of the number of occurrences of the term in the field normalized by field length. This could be where you are running into trouble.

If you search for the word "stack" and doc A has 1 occurrence and doc B has 2 occurrences, doc A could still rank higher in the results if doc B's field is significantly longer than doc A's.
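
A quick way to see this is to plug numbers into a simplified version of the single-term score described above: the square root of the term frequency multiplied by one over the square root of the field length. The sketch below is only that simplified model, assuming idf, boosts and the lossy encoding of stored norms can be ignored; `simple_score` and the document sizes are hypothetical.

```python
import math

def simple_score(term_freq, field_length):
    """Simplified single-term score: sqrt(tf) * 1/sqrt(field length).

    Assumption: idf, boosts and the lossy norm encoding are ignored.
    """
    tf = math.sqrt(term_freq)
    length_norm = 1.0 / math.sqrt(field_length)
    return tf * length_norm

# Hypothetical documents: occurrences of "stack", number of terms in the field
doc_a = simple_score(term_freq=1, field_length=10)    # about 0.316
doc_b = simple_score(term_freq=2, field_length=100)   # about 0.141

print(doc_a > doc_b)  # True: doc A wins despite having fewer occurrences
```

In this model doc B would need a field no more than twice as long as doc A's to keep its lead from the extra occurrence.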

The good news is you can disable field normalization. The bad news is that you need to do it before you index, unless you override the Similarity class to always factor it out, but I wouldn't recommend doing it that way. To disable norms at index time, call Field.setOmitNorms(true) in your indexing code on the Field object you add to the IndexWriter. In your case this would be for the "text" field.
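
As a rough illustration of what omitting norms changes, the sketch below drops the length factor from a simplified single-term score and compares the two orderings. It is a toy model rather than Lucene's API, assuming idf and boosts can be ignored; `score`, `ranking`, `use_norms` and the sample documents are hypothetical.

```python
import math

def score(term_freq, field_length, use_norms=True):
    """Simplified single-term score; idf and boosts are left out (assumption)."""
    s = math.sqrt(term_freq)
    if use_norms:
        s /= math.sqrt(field_length)  # length normalization
    return s

# Hypothetical docs: name -> (occurrences of the term, terms in the field)
docs = {"A": (1, 10), "B": (2, 100), "C": (5, 1000)}

def ranking(use_norms):
    """Doc names sorted by descending score."""
    return sorted(
        docs,
        key=lambda d: score(*docs[d], use_norms=use_norms),
        reverse=True,
    )

print(ranking(use_norms=True))   # ['A', 'B', 'C']
print(ranking(use_norms=False))  # ['C', 'B', 'A']
```

Without the length factor, the ordering in this model follows the occurrence count alone, which is the behavior the question asks for.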

KenE 2009-08-06 14:19:20

Hi KenE, this sounds great, but where do I implement Field.setOmitNorms(true)?

Pranali Desai 2009-08-07 05:59:09

You would call it in your indexing code.

KenE 2009-08-07 13:13:05
