I've been looking like mad for an answer to this, but I'm still in the dark.

I am using

int[] getTermPositions(int index)

of a TermPositionVector I have for a field (which has been set to store both offsets and positions) to get the positions of the terms I'm interested in highlighting as keywords in context.

The question: What do these positions correspond to? Obviously not the

String[] getTerms()

that is returned by the TermFreqVector interface, as that contains just the distinct terms of my field, not their order in the text.

What I'm looking for is a way to get the "tokenized" array of my field, so I can then pull out the terms surrounding the positions returned by getTermPositions(int index).

Help? Thanks a bunch.

A: 
int[] getTermPositions(int index)

returns an array of the positions of the term at the given index. You can get that index using the

int indexOf(String term)

method of TermFreqVector. The term positions are the positions (with term as the unit) at which the given term occurs. For example,

// source text:
// term position 0   1     2     3   4     5    6   7    8
//               the quick brown fox jumps over the lazy dog

// terms:
// term index 0     1   2   3    4    5    6     7
//            brown dog fox jump lazy over quick the

// Suppose we want to find the positions where "the" occurs

int index = termPositionVector.indexOf("the"); // 7
int[] positions = termPositionVector.getTermPositions(index); // {0, 6}
Kai Chan
I got that far, but now what if I want to get the words at positions 5 and 7 in the source, so I can output "over the lazy" showing 'the' in context?
ebabchick
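One way to get the surrounding words is to invert the term vector: walk every term, and write it into an array slot for each position it occurs at. The sketch below is plain Java, not Lucene code. The terms and positions arrays are hard-coded stand-ins for what getTerms() and getTermPositions(i) would return for the example above, and KwicSketch, rebuildTokens and context are hypothetical names. Note that this yields the indexed terms (e.g. "jump"), not the original text; positions the analyzer skipped would stay null.

```java
public class KwicSketch {

    // Rebuilds the position-ordered token array from per-term position lists.
    // terms[i] / positions[i] are assumed to mirror getTerms()[i] and
    // getTermPositions(i) of a term vector that stores positions.
    public static String[] rebuildTokens(String[] terms, int[][] positions) {
        int maxPos = -1;
        for (int[] termPositions : positions) {
            for (int p : termPositions) {
                maxPos = Math.max(maxPos, p);
            }
        }
        String[] tokens = new String[maxPos + 1];
        for (int i = 0; i < terms.length; i++) {
            for (int p : positions[i]) {
                tokens[p] = terms[i];
            }
        }
        return tokens;
    }

    // Joins the tokens within `window` positions on either side of `position`.
    public static String context(String[] tokens, int position, int window) {
        int from = Math.max(0, position - window);
        int to = Math.min(tokens.length - 1, position + window);
        StringBuilder sb = new StringBuilder();
        for (int p = from; p <= to; p++) {
            if (tokens[p] == null) {
                continue; // no term recorded at this position
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(tokens[p]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in data matching the "quick brown fox" example above.
        String[] terms = {"brown", "dog", "fox", "jump", "lazy", "over", "quick", "the"};
        int[][] positions = {{2}, {8}, {3}, {4}, {7}, {5}, {1}, {0, 6}};

        String[] tokens = rebuildTokens(terms, positions);
        System.out.println(context(tokens, 6, 1)); // over the lazy
    }
}
```

To show the original text rather than the indexed terms, the stored offsets would be the thing to use instead, which is essentially what the contrib highlighter does for you.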
A: 

Well, this will accomplish what I wanted:

http://lucene.apache.org/java/3_0_2/lucene-contrib/index.html#highlighter

ebabchick