views:

52

answers:

1

I'm trying to add the Lucene.NET Highlighter to my search, however its doing some really strange highlighting, what am I doing wrong?

Heres the highlighting code:

// stuff here to get scoreDocs

var content = doc.GetField("content").StringValue();
// content = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been"


var highlighter = new Highlighter(new StrongFormatter(), new HtmlEncoder(), new QueryScorer(query.Rewrite(indexSearcher.GetIndexReader())));
highlighter.SetTextFragmenter(new SimpleFragmenter(100));
var tokenStream = analyzer.TokenStream("content", new StringReader(content));

var bestFragment = highlighter.GetBestFragment(tokenStream, content);

Searching for "lorem" gives me this bestFragment value:

<strong>Lorem</strong> <strong>Ipsum</strong> is <strong>simply</strong> <strong>dummy</strong> <strong>text</strong> of the <strong>printing</strong> and <strong>typesetting</strong> <strong>industry</strong>. <strong>Lorem</strong> <strong>Ipsum</strong> <strong>has</strong> <strong>been</strong>

As you can see, its highlighted much more than just "Lorem". Why?

How do I make this behave sensibly?

I'm using a StandardAnalyzer and my query looks like "content:lorem"

*Edit: * Im using Lucene.NET 2.9.2

Thanks

A: 

I also having a similar problem, but using BrazilianAnalyzer. Please help.

Guilherme J Santos