I know there are several topics on the web, as well as on SO, regarding indexing and query performance within Lucene, but I have yet to find one that discusses whether creating payloads affects query performance and, if so, by how much.

Here's the scenario:

Let's say I want to index a collection of documents (anywhere from 100K to 10M), and each document has a subsection that I want to be able to search separately (or perhaps rank higher when a match is found within that section).

I'm considering adding a payload (during indexing) to any term that appears within that subsection, so I can efficiently make that determination at query-time.

Does anyone know of any performance issues related to using payloads, or even better, could you point me to any online documentation about this topic?

Thanks!

EDIT: I appreciate the alternative solutions to my scenario, but in case I do need to use payloads in the future, does anyone have any comments regarding the original question about query performance?

A: 

The textbook solution to what you want to do is to index each original document as two fields: one for the full document, and the other for the subsection. You can then boost the subsection field separately, either during indexing or during retrieval. Having said that, you can read about Lucene payloads here: Getting Started with Payloads.
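To make the two-field idea concrete, here is a minimal toy sketch in plain Java. It deliberately does not use Lucene's API or its scoring formula; the class `TwoFieldIndexSketch`, its methods, and the "1 point plus boost" scoring are all hypothetical and only illustrate how a separate subsection field allows both subsection-only search and query-time boosting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: illustrates the two-field idea, not Lucene's API or scoring.
final class TwoFieldIndexSketch {
    // One term set per field, keyed by document id (hypothetical layout).
    private final Map<String, Set<String>> body = new HashMap<>();
    private final Map<String, Set<String>> subsection = new HashMap<>();

    // Lowercase and split on whitespace; a real analyzer does much more.
    private static Set<String> tokenize(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().split("\\s+")));
    }

    // Index the full text and the subsection text as two separate fields.
    void add(String docId, String fullText, String subsectionText) {
        body.put(docId, tokenize(fullText));
        subsection.put(docId, tokenize(subsectionText));
    }

    // 1 point for a hit in the full document, plus `boost` for a hit in the subsection.
    double score(String docId, String term, double boost) {
        String t = term.toLowerCase();
        double s = 0.0;
        if (body.getOrDefault(docId, Collections.emptySet()).contains(t)) {
            s += 1.0;
        }
        if (subsection.getOrDefault(docId, Collections.emptySet()).contains(t)) {
            s += boost;
        }
        return s;
    }

    // Search the subsection field only; returns matching ids in sorted order.
    List<String> searchSubsection(String term) {
        String t = term.toLowerCase();
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : subsection.entrySet()) {
            if (e.getValue().contains(t)) {
                hits.add(e.getKey());
            }
        }
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) {
        TwoFieldIndexSketch idx = new TwoFieldIndexSketch();
        idx.add("d1", "lucene payloads affect query speed", "query speed");
        idx.add("d2", "payloads in lucene", "lucene");
        System.out.println(idx.score("d1", "query", 2.0));    // body hit + boosted subsection hit
        System.out.println(idx.score("d1", "payloads", 2.0)); // body hit only
        System.out.println(idx.searchSubsection("lucene"));   // subsection-only search
    }
}
```

In Lucene itself, the equivalent would presumably be two fields on each document and a boost applied to the subsection-field clause of the query; the exact classes depend on the Lucene version you are using.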

Yuval F
Thanks for the tip. That's what I'm currently doing; I just thought there might be a better way. Do you know of any references you could point me to that would support your claim?
ph0enix
You can try: http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Optimizing-Findability-Lucene-and-Solr and http://www.manning.com/hatcher3/
Yuval F