ansaurus

Question

Lucene TermPositionVector and retrieving terms at index locations

Answer 1

A:

int[] getTermPositions(int index)

returns an array of the positions at which the term at the given index occurs. You can get that index using the

int indexOf(String term)

method of TermFreqVector (which TermPositionVector extends). Term positions count terms (tokens), not characters: they say which token slots in the field the given term occupies. For example,

// source text:
// term position 0   1     2     3   4     5    6   7    8
//               the quick brown fox jumps over the lazy dog

// terms, sorted, as indexed by a stemming analyzer ("jumps" -> "jump"):
// term index 0     1   2   3    4    5    6     7
//            brown dog fox jump lazy over quick the

// Suppose we want to find the positions where "the" occurs

int index = termPositionVector.indexOf("the"); // 7
int[] positions = termPositionVector.getTermPositions(index); // {0, 6}

Kai Chan 2010-08-20 07:50:25
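To make the lookup concrete, here is a minimal plain-Java model of the data in the example. SimpleTermPositionVector is a hypothetical class, not Lucene's; it only mirrors the idea of indexOf and getTermPositions, and it assumes the terms are kept in sorted order (as in the example) so that indexOf can binary-search them.

```java
import java.util.Arrays;

// Hypothetical, simplified model of a term position vector; NOT Lucene's class.
class SimpleTermPositionVector {
    private final String[] terms;     // unique terms, assumed sorted
    private final int[][] positions;  // positions[i] = token positions of terms[i]

    SimpleTermPositionVector(String[] terms, int[][] positions) {
        this.terms = terms;
        this.positions = positions;
    }

    // Index of the term in the sorted array, or -1 if it is not present.
    int indexOf(String term) {
        int i = Arrays.binarySearch(terms, term);
        return i >= 0 ? i : -1;
    }

    // Token positions at which the term with this index occurs.
    int[] getTermPositions(int index) {
        return positions[index];
    }

    public static void main(String[] args) {
        // Data taken from the "quick brown fox" example.
        SimpleTermPositionVector tpv = new SimpleTermPositionVector(
            new String[] {"brown", "dog", "fox", "jump", "lazy", "over", "quick", "the"},
            new int[][] {{2}, {8}, {3}, {4}, {7}, {5}, {1}, {0, 6}});
        int index = tpv.indexOf("the");
        int[] positions = tpv.getTermPositions(index);
        System.out.println(index + " " + Arrays.toString(positions)); // prints "7 [0, 6]"
    }
}
```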

I got that far, but what if I now want to get the words at positions 5 and 7 in the source, so I can output "over the lazy" and show 'the' in context?

ebabchick 2010-08-22 07:18:28
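One way to get the neighbouring words from the term vector alone is to invert it: walk every term, put it into an array slot for each of its positions, then read a window around the hit. TermVectorContext below is a hypothetical plain-Java sketch; with a real Lucene 3.x TermPositionVector the two input arrays would presumably come from getTerms() and getTermPositions(i). It returns analyzed terms (stemmed, e.g. "jump"), not the original words, and a position with no recorded term, such as a stop word dropped by the analyzer, stays empty and is skipped.

```java
import java.util.StringJoiner;

// Hypothetical helper: rebuilds the token sequence from term-vector data.
class TermVectorContext {

    // slots[p] = the term recorded at position p, or null if none was recorded.
    // If several terms share a position (e.g. synonyms), the last one written wins.
    static String[] termsByPosition(String[] terms, int[][] positions) {
        int max = -1;
        for (int[] ps : positions)
            for (int p : ps) max = Math.max(max, p);
        String[] slots = new String[max + 1];
        for (int i = 0; i < terms.length; i++)
            for (int p : positions[i]) slots[p] = terms[i];
        return slots;
    }

    // Joins the terms at positions center - radius .. center + radius,
    // clipped to the array bounds and skipping empty slots.
    static String window(String[] slots, int center, int radius) {
        StringJoiner out = new StringJoiner(" ");
        int from = Math.max(0, center - radius);
        int to = Math.min(slots.length - 1, center + radius);
        for (int p = from; p <= to; p++)
            if (slots[p] != null) out.add(slots[p]);
        return out.toString();
    }
}
```

With the example data, window(termsByPosition(terms, positions), 6, 1) yields "over the lazy".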

Answer 2

A:

Well, this will accomplish what I wanted:

http://lucene.apache.org/java/3_0_2/lucene-contrib/index.html#highlighter

ebabchick 2010-08-22 07:19:18
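The highlighter can show the original wording because, in essence, it cuts fragments out of the stored text using each token's character offsets rather than rebuilding them from analyzed terms. OffsetFragment below is a hypothetical, simplified illustration of that idea, not the contrib Highlighter API: it assumes one token per position and finds offsets with a letters-only regex, where Lucene would take them from the analyzer's tokens or from a term vector stored with offsets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of offset-based fragment extraction; NOT the Highlighter API.
class OffsetFragment {

    // Stand-in tokenizer: one {start, end} character-offset pair per run of letters,
    // so list index i plays the role of token position i.
    static List<int[]> tokenOffsets(String text) {
        List<int[]> offsets = new ArrayList<>();
        Matcher m = Pattern.compile("\\p{L}+").matcher(text);
        while (m.find()) offsets.add(new int[] {m.start(), m.end()});
        return offsets;
    }

    // Slices the ORIGINAL text from the start of token (center - radius) to the
    // end of token (center + radius), wrapping the center token in <B>...</B>.
    static String fragment(String text, List<int[]> offsets, int center, int radius) {
        int from = offsets.get(Math.max(0, center - radius))[0];
        int to = offsets.get(Math.min(offsets.size() - 1, center + radius))[1];
        int[] hit = offsets.get(center);
        return text.substring(from, hit[0])
            + "<B>" + text.substring(hit[0], hit[1]) + "</B>"
            + text.substring(hit[1], to);
    }
}
```

For "the quick brown fox jumps over the lazy dog", fragment(text, tokenOffsets(text), 4, 1) returns "fox <B>jumps</B> over", keeping the unstemmed "jumps" that the term vector alone would report as "jump".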
