ansaurus

Question

access list of keywords from lucene index

Answer 1

A:

If you are trying to do tag completion, you don't need all the unique tags; you need the tags that match what the user has already entered. This can be done with a wildcard, fuzzy, span, or prefix query, depending on the need.

Gandalf 2009-06-17 16:22:49

Answer 2

+5 A:

Use IndexReader.terms to get all the term values (and doc counts) for your tag field.

Coady 2009-06-17 18:15:29

Answer 3

+1 A:

Tag completion needs to come from either (a) a prefix query on your list of tags (like pytho*), or (b) a query on an ngram-tokenized field (for example, Lucene will index python as p, py, pyt, pyth, pytho, python in a separate field). Both of these solutions allow you to do tag-completion queries on the fly.
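The n-gram expansion in option (b) can be sketched in plain Java. This is only an illustration of which strings end up in the extra field, not Lucene's own tokenizer; the class name EdgeNGrams and the minLength parameter are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: lists the leading substrings
// ("edge n-grams") that an ngram-tokenized field would hold for one tag.
class EdgeNGrams {
    // Returns every prefix of tag whose length is at least minLength.
    static List<String> of(String tag, int minLength) {
        List<String> grams = new ArrayList<>();
        for (int end = Math.max(minLength, 1); end <= tag.length(); end++) {
            grams.add(tag.substring(0, end));
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints [p, py, pyt, pyth, pytho, python]
        System.out.println(of("python", 1));
    }
}
```

With these strings indexed, whatever the user has typed so far can be looked up as an exact term, at the cost of a larger index.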

What you're suggesting (and what Coady's response will get you) is a more offline approach, something that you don't really want to run at query time. This is also fine, since tag dictionaries are not expected to be real-time, but be aware that iterating through IndexReader's terms is not meant to be a "query-time" operation.

bwhitman 2009-06-17 18:20:21

I will look into IndexReader.terms. However, I don't think your assumptions are correct. If Lucene can expand terms at query time, then at least internally it is fast enough to yield a list of terms for a given partial term. That is the functionality I'm interested in, to avoid maintaining a second index of unique tags.
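To illustrate why a prefix lookup over a sorted term list can be cheap, here is a rough sketch in plain Java. It uses a TreeMap as a stand-in for a sorted term dictionary and does not call any Lucene API; the class name TermPrefixScan, the method complete, and the sample data are assumptions for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a sorted term dictionary (term -> doc count).
class TermPrefixScan {
    // Returns up to limit terms that start with prefix, in sorted order.
    static Map<String, Integer> complete(TreeMap<String, Integer> terms, String prefix, int limit) {
        Map<String, Integer> matches = new LinkedHashMap<>();
        // tailMap jumps to the first term >= prefix, so earlier terms are never visited.
        for (Map.Entry<String, Integer> e : terms.tailMap(prefix, true).entrySet()) {
            // Terms are sorted, so the first non-matching term ends the scan.
            if (!e.getKey().startsWith(prefix) || matches.size() >= limit) {
                break;
            }
            matches.put(e.getKey(), e.getValue());
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> terms = new TreeMap<>();
        terms.put("java", 3);
        terms.put("pylons", 2);
        terms.put("python", 5);
        terms.put("ruby", 1);
        // prints {pylons=2, python=5}
        System.out.println(complete(terms, "py", 10));
    }
}
```

The scan touches only the matching terms plus one more, which is the property the comment above relies on.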

deets 2009-06-18 06:57:37

Answer 4

+1 A:

Be careful about using terms from the index directly. If stemming is enabled at indexing time, all sorts of odd strings will appear in the term list: "Beauty" gets stemmed to "beauti", "create" is transformed to "creat", and so on.

Shashikant Kore 2009-06-18 05:56:28

Answer 5

+1 A:

You need to do two things:

1) When you create the document to index, make sure the field uses "ANALYZED":

doc.add(new Field("tags", tags, Field.Store.NO, Field.Index.ANALYZED));

2) Use a boolean query and OR all the terms:

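// Requires org.apache.lucene.index.Term and the org.apache.lucene.search classes.
BooleanQuery query = new BooleanQuery();

for (String tag : tags) {
    // TermQuery takes a Term (field name plus term text), not two strings.
    query.add(new TermQuery(new Term("tags", tag)), BooleanClause.Occur.SHOULD);
}
// A null filter means no filtering; searchLimit caps the number of hits returned.
TopDocs docs = searcher.search(query, null, searchLimit);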

Cambium 2009-06-27 00:49:50
