ansaurus

Question

Tags and attributes in Lucene shared across documents

1) If I understand Lucene's indexing scheme correctly, when the same long string is indexed as a field in many documents, it doesn't really bulk up the index compared to indexing it just once. Correct?

2) If I create a single Field object, make it stored, and then add it to many documents, does the full string data get duplicated for each document in the index? If so, am I better off keeping the actual storage of the tags/attributes in SQL?

3) As far as I can tell, the only information that comes back in query results is the matching documents, ordered by score. To determine which fields satisfied the query for a matched document, must I run separate queries on the fields of each document, or is there a better way?

Answer 1 (+2)

1) Correct. Lucene's index keeps a dictionary that maps each unique term string to a numerical identifier, so a repeated string costs one dictionary entry plus one stored identifier per document that contains it.
2) I think you are safe storing the tags and attributes in Lucene.
3) You do not need separate queries: once you hold a Document object, you can use, e.g., getField() to get the relevant field information. Since you are concerned about Lucene performance, I suggest you read "Scaling Lucene and Solr", which covers many performance tips.

Yuval F 2009-04-13 07:04:14
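To make point 1) concrete, here is a toy model of a term dictionary with postings lists. It is a simplified sketch assumed for illustration, far simpler than Lucene's real on-disk formats, and the class and method names are hypothetical. Each unique term string is kept once and mapped to an integer ID; every document containing the term only appends its doc ID to that term's postings list. It models indexed terms only, not how stored field values are kept.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy inverted index; NOT Lucene's actual data structures.
// A real index keys terms by (field, text); this sketch assumes a single field.
class TermDictionaryDemo {
    // Each unique term string is stored exactly once, mapped to an integer ID.
    private final Map<String, Integer> termIds = new HashMap<>();
    private final List<String> terms = new ArrayList<>();
    // postings.get(termId) holds the IDs of the documents containing that term.
    private final List<List<Integer>> postings = new ArrayList<>();

    /** Indexes one term for the given document. */
    void addTerm(int docId, String term) {
        Integer id = termIds.get(term);
        if (id == null) {
            // First occurrence: add the string to the dictionary once.
            id = terms.size();
            termIds.put(term, id);
            terms.add(term);
            postings.add(new ArrayList<>());
        }
        // Later occurrences only cost one int per document, not another copy of the string.
        postings.get(id).add(docId);
    }

    int uniqueTermCount() {
        return terms.size();
    }

    List<Integer> docsFor(String term) {
        Integer id = termIds.get(term);
        return id == null ? List.of() : postings.get(id);
    }

    public static void main(String[] args) {
        TermDictionaryDemo index = new TermDictionaryDemo();
        String longTag = "a-very-long-tag-value-shared-by-many-documents";
        for (int doc = 0; doc < 1000; doc++) {
            index.addTerm(doc, longTag);
        }
        System.out.println(index.uniqueTermCount());       // 1
        System.out.println(index.docsFor(longTag).size()); // 1000
    }
}
```

Indexing the same long tag for 1000 documents leaves a single dictionary entry and 1000 postings, which is the effect the answer describes.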

Thanks, Yuval. Re 3): I understand I can inspect the Document's fields, but I want the set of fields in the Document that matched the query. My impression is that you're supposed to express this through filters and scoring in the query rather than sorting it out after the fact.

Jegschemesch 2009-04-13 07:32:08

This is a subtle issue: Lucene does not provide this information by default. I suggest you read http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Debugging-Relevance-Issues-Search and then explore Lucene's explanations and highlighting. HTH

Yuval F 2009-04-13 08:14:27
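As a rough illustration of the "after the fact" approach discussed in this thread, the sketch below takes one matched document's field values and reports which fields contain any query term. It is a hypothetical helper (`MatchedFields.matchedFields` is not a Lucene API), and it assumes the field text is available (e.g. stored) and that a naive lowercase-plus-whitespace tokenizer matches the index's analysis, which a real analyzer may not. Within Lucene itself, the routes the answer points to are explanations (e.g. `IndexSearcher.explain(Query, int)`) and highlighting.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical post-filter, NOT a Lucene API: given one matched document's
// field values, report which fields contain at least one query term.
// Assumes a naive analyzer (lowercase + whitespace split) and lowercase query terms.
class MatchedFields {
    static Set<String> matchedFields(Map<String, String> doc, Set<String> queryTerms) {
        Set<String> matched = new TreeSet<>(); // sorted for stable output
        for (Map.Entry<String, String> field : doc.entrySet()) {
            String[] tokens = field.getValue().toLowerCase(Locale.ROOT).split("\\s+");
            for (String token : tokens) {
                if (queryTerms.contains(token)) {
                    matched.add(field.getKey());
                    break; // one hit is enough to mark this field as matched
                }
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Sharing tags in Lucene");
        doc.put("tags", "lucene search java");
        doc.put("body", "How are repeated field values stored");
        System.out.println(matchedFields(doc, Set.of("java", "stored"))); // [body, tags]
    }
}
```

This re-tokenizes the field text for every hit, so it is only reasonable for a page of results, and it can disagree with the index whenever the real analyzer differs from the naive one assumed here. Explanations and highlighting work from the actual query instead.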
