ansaurus

Question

How to get the next term out of a Lucene index?

Answer 1

+1 A:

Since indexing the n-grams isn't an option in your situation, some brute force will be required. You could enumerate the IndexReader's terms and termPositions, but that would likely be excruciatingly slow.
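To make the brute-force idea concrete, here is a minimal sketch. It assumes a toy in-memory positional index (term, then document id, then positions) that stands in for what `IndexReader`'s terms and term positions expose. `NextTermScan`, `addDocument` and `nextTerms` are hypothetical names for illustration, not Lucene API. For every term, it checks whether that term ever sits exactly one position after the prefix term.

```java
import java.util.*;

/** Hypothetical in-memory stand-in for a positional index: term -> docId -> positions. */
class NextTermScan {
    // Each term maps to its postings: document id -> positions of the term in that document.
    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    /** Adds a tokenized document; token i gets position i, as a simple analyzer would assign. */
    void addDocument(int docId, String... tokens) {
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    /** Brute force: for every term, check whether it ever occurs one position after `prefix`. */
    SortedSet<String> nextTerms(String prefix) {
        SortedSet<String> result = new TreeSet<>();
        Map<Integer, List<Integer>> prefixPostings = postings.getOrDefault(prefix, Map.of());
        for (Map.Entry<String, Map<Integer, List<Integer>>> term : postings.entrySet()) {
            search:
            for (Map.Entry<Integer, List<Integer>> doc : prefixPostings.entrySet()) {
                List<Integer> candidate = term.getValue().get(doc.getKey());
                if (candidate == null) continue; // term never appears in this document
                for (int pos : doc.getValue()) {
                    if (candidate.contains(pos + 1)) {
                        result.add(term.getKey());
                        break search; // one hit is enough for this term
                    }
                }
            }
        }
        return result;
    }
}
```

The cost is roughly (number of terms) times (occurrences of the prefix), which is why this approach is expected to be slow on a real index.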

A faster approach would be to implement a divide-and-conquer search: enumerate the terms and use a MultiPhraseQuery to check a whole group of candidate terms at once. Split all the potential terms into reasonably sized groups (say, 1000 terms each), and run a MultiPhraseQuery search with your prefix word followed by each group. If a group has any hits, recursively search its sub-groups until you reach a single term.
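The recursion described above can be sketched as follows. This is a minimal sketch under one loud assumption: the `groupHasHit` predicate is a hypothetical stand-in for "run a MultiPhraseQuery of the prefix followed by any term in this group and check for hits". It is not Lucene code, and the class and method names are illustrative only.

```java
import java.util.*;
import java.util.function.BiPredicate;

class DivideAndConquerNextTerms {
    /**
     * Returns every candidate term that follows `prefix`.
     * `groupHasHit` is a hypothetical stand-in for a MultiPhraseQuery hit check.
     */
    static List<String> findNextTerms(String prefix, List<String> allTerms, int groupSize,
                                      BiPredicate<String, List<String>> groupHasHit) {
        List<String> found = new ArrayList<>();
        // Split the candidate terms into fixed-size groups and test each group once.
        for (int start = 0; start < allTerms.size(); start += groupSize) {
            List<String> group = allTerms.subList(start, Math.min(start + groupSize, allTerms.size()));
            search(prefix, group, groupHasHit, found);
        }
        return found;
    }

    private static void search(String prefix, List<String> group,
                               BiPredicate<String, List<String>> groupHasHit, List<String> found) {
        // Prune: a group with no hits cannot contain any following term.
        if (group.isEmpty() || !groupHasHit.test(prefix, group)) return;
        if (group.size() == 1) { // narrowed down to a single matching term
            found.add(group.get(0));
            return;
        }
        int mid = group.size() / 2; // otherwise halve the group and recurse
        search(prefix, group.subList(0, mid), groupHasHit, found);
        search(prefix, group.subList(mid, group.size()), groupHasHit, found);
    }
}
```

Halving means a group containing k matching terms needs on the order of k times log2(groupSize) checks instead of one query per term, so the savings are largest when only a few terms actually follow the prefix.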

Coady 2009-08-04 02:22:37

Thanks for the ideas! This is for generating a report, so performance isn't really an issue. I ended up doing a brute-force search, creating PhraseQuery objects composed of the term of interest and every other term in the index. The queries that had hits indicated the terms that followed the term of interest.
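The one-query-per-term approach in this comment might look roughly like the sketch below. It assumes documents are plain token arrays, and `phraseHasHit` is a hypothetical stand-in for executing an exact (slop 0) two-word PhraseQuery and checking for hits; the names are illustrative, not Lucene API.

```java
import java.util.*;

class BruteForcePhraseScan {
    /** Hypothetical stand-in for a two-term PhraseQuery with slop 0: does "first second" occur anywhere? */
    static boolean phraseHasHit(List<String[]> docs, String first, String second) {
        for (String[] tokens : docs) {
            for (int i = 0; i + 1 < tokens.length; i++) {
                if (tokens[i].equals(first) && tokens[i + 1].equals(second)) return true;
            }
        }
        return false;
    }

    /** Issues one phrase check per vocabulary term; terms with hits follow `termOfInterest`. */
    static SortedSet<String> followers(List<String[]> docs, String termOfInterest) {
        SortedSet<String> vocabulary = new TreeSet<>(); // stand-in for enumerating the index's terms
        for (String[] tokens : docs) vocabulary.addAll(Arrays.asList(tokens));
        SortedSet<String> result = new TreeSet<>();
        for (String candidate : vocabulary) {
            if (phraseHasHit(docs, termOfInterest, candidate)) result.add(candidate);
        }
        return result;
    }
}
```

This sketch also checks the term of interest against itself, so repeated words such as "very very" are reported as well.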

Matthew Simoneau 2009-08-04 19:34:41

Answer 2

+1 A:

Here's Grant Ingersoll's paper: Accessing words around a positional match in Lucene.

Yuval F 2009-08-06 10:41:09
