ansaurus

Question

Synonyms using Lucene

Answer 1

A:

I prefer to run a search with the whole phrase first and weight its results more heavily than those of the searches that follow. I then iterate over each word in the phrase and search for that word on its own, giving those results a lower score. Finally, I aggregate the scores of every item returned more than once and sort the results accordingly. This may not be the best possible way of doing it, but it has worked well for me in the past.
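To make the idea concrete, here is a minimal sketch of that scoring scheme in plain Java. It is only an illustration: the `search` method is a hypothetical stand-in (simple substring matching over an in-memory list) for whatever real search call you use, and the two weights are arbitrary values I picked for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TieredSearch {
    // Assumed weights: a whole-phrase hit counts more than a single-word hit.
    static final double PHRASE_WEIGHT = 2.0;
    static final double WORD_WEIGHT = 1.0;

    /** Hypothetical stand-in for a real search: ids of documents containing the text. */
    static List<Integer> search(List<String> docs, String text) {
        List<Integer> hits = new ArrayList<>();
        String needle = text.toLowerCase();
        for (int i = 0; i < docs.size(); i++) {
            if (docs.get(i).toLowerCase().contains(needle)) {
                hits.add(i);
            }
        }
        return hits;
    }

    /** Returns document ids ordered by aggregated score, highest first. */
    static List<Integer> rank(List<String> docs, String phrase) {
        Map<Integer, Double> scores = new HashMap<>();
        String trimmed = phrase.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        // Pass 1: the whole phrase, weighted most heavily.
        for (int id : search(docs, trimmed)) {
            scores.merge(id, PHRASE_WEIGHT, Double::sum);
        }
        // Pass 2: each word on its own, with a lower weight.
        for (String word : trimmed.split("\\s+")) {
            for (int id : search(docs, word)) {
                scores.merge(id, WORD_WEIGHT, Double::sum);
            }
        }
        // Sort by total score, breaking ties by document id for a stable order.
        List<Integer> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : Integer.compare(a, b);
        });
        return ranked;
    }
}
```

A document that matches both the phrase and the individual words accumulates all of those scores, so it floats to the top, while documents matching only one word end up near the bottom.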

Andrew Siemer 2009-08-08 05:32:41

Answer 2

+3 A:

There is a contribution to the Lucene project called "wordnet". According to its documentation:

This package uses synonyms defined by WordNet to build a Lucene index storing them, which in turn can be used for query expansion. You normally run Syns2Index once to build the query index/"database", and then call SynExpand.expand(...) to expand a query.

It includes a sample of what it does:

If you pass in the query "big dog" then it prints out:

Query: big adult^0.9 bad^0.9 bighearted^0.9 boastful^0.9 boastfully^0.9 bounteous^0.9 bountiful^0.9 braggy^0.9 crowing^0.9 freehanded^0.9 giving^0.9 grown^0.9 grownup^0.9 handsome^0.9 large^0.9 liberal^0.9 magnanimous^0.9 momentous^0.9 openhanded^0.9 prominent^0.9 swelled^0.9 vainglorious^0.9 vauntingly^0.9 dog andiron^0.9 blackguard^0.9 bounder^0.9 cad^0.9 chase^0.9 click^0.9 detent^0.9 dogtooth^0.9 firedog^0.9 frank^0.9 frankfurter^0.9 frump^0.9 heel^0.9 hotdog^0.9 hound^0.9 pawl^0.9 tag^0.9 tail^0.9 track^0.9 trail^0.9 weenie^0.9 wiener^0.9 wienerwurst^0.9

You see that the original words ("big" and "dog") have no weighting attached to them. The synonyms, however, have a weighting (0.9) that you can configure yourself.

It comes bundled with the standard distribution of Lucene, in the "contrib" directory.
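To illustrate the shape of the expanded query shown above, here is a small self-contained sketch. It is not the contrib package's implementation: the class name `SynonymExpander` is made up for this example, and a plain in-memory `Map` stands in for the synonym index that Syns2Index would build from WordNet.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

class SynonymExpander {
    /**
     * Expands each query word with its synonyms. Original words are left
     * unweighted; every synonym gets the configurable boost appended as ^boost.
     */
    static String expand(String query, Map<String, List<String>> synonyms, float boost) {
        StringBuilder out = new StringBuilder();
        for (String word : query.trim().toLowerCase().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(word); // the original term, no weighting
            for (String syn : synonyms.getOrDefault(word, Collections.emptyList())) {
                out.append(' ').append(syn).append('^').append(boost);
            }
        }
        return out.toString();
    }
}
```

With a map containing big -> [large] and dog -> [hound, tail] and a boost of 0.9f, expanding "big dog" gives "big large^0.9 dog hound^0.9 tail^0.9", which has the same form as the sample output quoted above.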

Adam Paynter 2009-08-08 17:55:33

Thanks for your input, Adam. Could you please look at my question again? I've now edited it.

Ed 2009-08-09 15:59:22

The WordNet module builds a Lucene index, just as you are doing. That index is then used to expand queries. If you simply try building this index from WordNet's dictionary, I am sure you can easily tell which field names it uses and then add your own custom entries.

Adam Paynter 2009-08-09 17:57:07

Answer 3

+1 A:

You can get the Query object after parsing the input query string with QueryParser.parse().

In most cases, the top-level query is a boolean query with sub-queries as its children. You can traverse the query object recursively. When you reach a TermQuery or PhraseQuery object, take that (sub)query and replace it with a boolean query consisting of the original query and its synonyms, if any.

Essentially, you are transforming your original query

a OR b AND c

to

(a OR synA) OR (b OR synB1 OR synB2) AND c

Operating on the query object ensures that you simply replace the leaf nodes of the query with new queries and don't have to fiddle with an arbitrarily complex query hierarchy.
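The recursive leaf replacement can be sketched roughly as follows. To keep the example self-contained, it uses a tiny hypothetical query tree (`Term` and `Bool` are made-up stand-ins, not Lucene's TermQuery and BooleanQuery), and the synonym lookup is a plain `Map` that I am assuming for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class QueryRewriter {
    /** Hypothetical stand-in for a parsed query node; not a Lucene class. */
    interface Node {}

    /** Leaf node holding a single term. */
    static final class Term implements Node {
        final String text;
        Term(String text) { this.text = text; }
        @Override public String toString() { return text; }
    }

    /** Inner node combining its children with "AND" or "OR". */
    static final class Bool implements Node {
        final String op;
        final List<Node> children;
        Bool(String op, List<Node> children) {
            this.op = op;
            this.children = children;
        }
        @Override public String toString() {
            List<String> parts = new ArrayList<>();
            for (Node child : children) {
                parts.add(child.toString());
            }
            return "(" + String.join(" " + op + " ", parts) + ")";
        }
    }

    /** Returns a copy of the tree in which each leaf with synonyms becomes an OR group. */
    static Node expand(Node node, Map<String, List<String>> synonyms) {
        if (node instanceof Term) {
            String text = ((Term) node).text;
            List<String> syns = synonyms.getOrDefault(text, Collections.emptyList());
            if (syns.isEmpty()) {
                return node; // nothing to expand, keep the leaf as-is
            }
            List<Node> alternatives = new ArrayList<>();
            alternatives.add(node); // keep the original term
            for (String syn : syns) {
                alternatives.add(new Term(syn));
            }
            return new Bool("OR", alternatives);
        }
        // Inner node: keep the operator, rewrite each child recursively.
        Bool bool = (Bool) node;
        List<Node> rewritten = new ArrayList<>();
        for (Node child : bool.children) {
            rewritten.add(expand(child, synonyms));
        }
        return new Bool(bool.op, rewritten);
    }
}
```

Only the leaves change; the AND/OR structure above them is copied untouched, which is the point of working on the parsed query rather than on the query string.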

Shashikant Kore 2009-08-11 15:21:45
