ansaurus

Question

Lucene HTMLFormatter skipping last character

Answer 1

+1 A:

Lucene's highlighter, out of the box, is geared to handle plain text. It produces incorrect results if you try to highlight HTML or any other marked-up text.

I recently ran into the same problem and found a solution in Solr's HTMLStripReader, which skips the content inside tags. The solution is outlined on my blog at the following URL:

http://sigabrt.blogspot.com/2010/04/highlighting-query-in-entire-html.html

I could have posted the code here, but my solution applies to Lucene Java. For .NET, you will have to find an equivalent of HTMLStripReader.
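
To illustrate the general idea of skipping tag content while highlighting, here is a minimal, self-contained Java sketch. It is not the code from the blog post, it is not Solr's HTMLStripReader, and it does not use the Lucene Highlighter API. The class name `TagSkippingHighlighter`, the method `highlight`, and the `<b>` wrapper are all assumptions made for illustration. It assumes well-formed tags and ignores comments, scripts, entities, and `>` characters inside attribute values.

```java
public class TagSkippingHighlighter {

    // Hypothetical helper: wraps case-insensitive matches of `term` in <b>...</b>,
    // but only in text that lies outside of HTML tags.
    public static String highlight(String html, String term) {
        if (term.isEmpty()) {
            return html; // nothing to highlight
        }
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < html.length()) {
            if (html.charAt(i) == '<') {
                // Copy the whole tag unchanged so attribute values are never marked.
                int close = html.indexOf('>', i);
                int end = (close < 0) ? html.length() : close + 1;
                out.append(html, i, end);
                i = end;
            } else {
                // Highlight inside the text run that extends to the next tag.
                int next = html.indexOf('<', i);
                int end = (next < 0) ? html.length() : next;
                out.append(markRun(html.substring(i, end), term));
                i = end;
            }
        }
        return out.toString();
    }

    // Marks every occurrence of `term` in a plain-text run.
    private static String markRun(String text, String term) {
        StringBuilder out = new StringBuilder();
        int n = term.length();
        int i = 0;
        while (i < text.length()) {
            if (text.regionMatches(true, i, term, 0, n)) {
                out.append("<b>").append(text, i, i + n).append("</b>");
                i += n;
            } else {
                out.append(text.charAt(i));
                i++;
            }
        }
        return out.toString();
    }
}
```

For example, `TagSkippingHighlighter.highlight("<p class=\"bold\">bold text</p>", "bold")` leaves the `class` attribute untouched and marks only the visible word. A real solution would delegate tag handling to a proper HTML-stripping reader rather than scanning for `<` and `>` by hand.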

Shashikant Kore 2010-05-06 05:33:18

Thanks for the link. +1 for that. My problem was something else, though. Adding an answer for that.

Midhat 2010-05-06 05:39:38

Answer 2

A:

Solved. Apparently my Highlighter.Net version was archaic. Upgrading to 2.3.2.1 solved the problem.

Midhat 2010-05-06 05:43:20
