ansaurus

Question

Finding the start and end of a match with Lucene

Answer 1

A:

If I were you I'd just take code from FastVectorHighlighter. Relevant code is in FieldTermStack:

        List<string> termSet = fieldQuery.getTermSet(fieldName);
        VectorHighlightMapper tfv = new VectorHighlightMapper(termSet);
        reader.GetTermFreqVector(docId, fieldName, tfv);  // <-- look at this line

        string[] terms = tfv.GetTerms();
        foreach (string term in terms)
        {
            // skip terms that are not part of the query
            if (!termSet.Contains(term)) continue;
            int index = tfv.IndexOf(term);
            // character offsets (start/end) of every occurrence of the term
            TermVectorOffsetInfo[] tvois = tfv.GetOffsets(index);
            if (tvois == null) return; // just return to make null snippets
            // token positions of every occurrence of the term
            int[] poss = tfv.GetTermPositions(index);
            if (poss == null) return; // just return to make null snippets
            for (int i = 0; i < tvois.Length; i++)
                termList.AddLast(new TermInfo(term, tvois[i].GetStartOffset(), tvois[i].GetEndOffset(), poss[i]));
        }

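The key call there is reader.GetTermFreqVector(). Like I said, FastVectorHighlighter already does some legwork that I would just copy, but if you want to do it yourself, GetOffsets gives you the start and end character offsets of each occurrence and GetTermPositions gives you the token positions, which together should be everything you need.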

Xodarap 2010-07-26 16:45:19

I should have specified that I'm using Lucene Java 3.0.2. Still, I will look at the code for FastVectorHighlighter to see if I can get what I need from there.

Mike T 2010-07-27 06:25:08

@Mike: Sorry, I figured C# syntax was close enough to Java. In any case, TermPositionVector should do what you want. Since you want to highlight phrases it will be a bit tougher (you'll need to find the terms which are right next to each other) but it shouldn't be too bad.

Xodarap 2010-07-27 14:23:21
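
For the "right next to each other" part of the last comment, here is a rough sketch in plain Java of how the adjacency check might look. It does not call Lucene at all: PhraseOffsets, Occurrence, findPhrase and sampleIndex are made-up names for illustration, and the hand-built map stands in for the per-term positions and offsets you would read from a TermPositionVector (getTermPositions and getOffsets). It assumes an exact phrase with no slop and no position gaps, so treat it as a starting point rather than how FastVectorHighlighter actually does it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: finds phrase matches from per-term
// positions and offsets such as those stored in a term vector.
final class PhraseOffsets {

    // One occurrence of a term: its token position and character offsets.
    static final class Occurrence {
        final int position;
        final int startOffset;
        final int endOffset;

        Occurrence(int position, int startOffset, int endOffset) {
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    // Returns {startOffset, endOffset} for each place where the phrase terms
    // occur at consecutive token positions (exact phrase, no slop).
    static List<int[]> findPhrase(String[] phrase, Map<String, List<Occurrence>> index) {
        List<int[]> matches = new ArrayList<>();
        if (phrase.length == 0) return matches;
        List<Occurrence> firsts = index.get(phrase[0]);
        if (firsts == null) return matches;
        for (Occurrence first : firsts) {
            Occurrence last = first;
            // Each following term must sit exactly one position further along.
            for (int i = 1; i < phrase.length && last != null; i++) {
                last = occurrenceAt(index.get(phrase[i]), first.position + i);
            }
            if (last != null) {
                matches.add(new int[] { first.startOffset, last.endOffset });
            }
        }
        return matches;
    }

    // Linear scan for the occurrence at a given token position, or null.
    private static Occurrence occurrenceAt(List<Occurrence> occurrences, int position) {
        if (occurrences == null) return null;
        for (Occurrence o : occurrences) {
            if (o.position == position) return o;
        }
        return null;
    }

    // Hand-made stand-in for term vector data of one short field value.
    static Map<String, List<Occurrence>> sampleIndex() {
        String[] tokens = "the quick brown fox and the quick dog".split(" ");
        Map<String, List<Occurrence>> index = new HashMap<>();
        int offset = 0;
        for (int pos = 0; pos < tokens.length; pos++) {
            int end = offset + tokens[pos].length();
            index.computeIfAbsent(tokens[pos], k -> new ArrayList<>())
                 .add(new Occurrence(pos, offset, end));
            offset = end + 1; // skip the single space between tokens
        }
        return index;
    }

    public static void main(String[] args) {
        for (int[] m : findPhrase(new String[] { "quick", "brown" }, sampleIndex())) {
            System.out.println(m[0] + "-" + m[1]); // prints 4-15
        }
    }
}
```

The match start is the start offset of the first term and the match end is the end offset of the last term, so a multi-word phrase comes back as a single span. With real term vectors the field has to be indexed with both positions and offsets, otherwise one of the two arrays will be null, which is the case the snippet in the answer bails out on.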
