Lucene.NET highlighter plugin highlighting strangely

Q:

I'm trying to add the Lucene.NET Highlighter to my search, but it's doing some really strange highlighting. What am I doing wrong?

Here's the highlighting code:

// stuff here to get scoreDocs

// Stored text of the matching document's "content" field
var content = doc.GetField("content").StringValue();
// content = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been"

// Build the highlighter from the query rewritten against the index reader
var highlighter = new Highlighter(new StrongFormatter(), new HtmlEncoder(), new QueryScorer(query.Rewrite(indexSearcher.GetIndexReader())));
highlighter.SetTextFragmenter(new SimpleFragmenter(100));

// Re-analyze the stored text to produce the token stream the highlighter consumes
var tokenStream = analyzer.TokenStream("content", new StringReader(content));

var bestFragment = highlighter.GetBestFragment(tokenStream, content);

Searching for "lorem" gives me this bestFragment value:

<strong>Lorem</strong> <strong>Ipsum</strong> is <strong>simply</strong> <strong>dummy</strong> <strong>text</strong> of the <strong>printing</strong> and <strong>typesetting</strong> <strong>industry</strong>. <strong>Lorem</strong> <strong>Ipsum</strong> <strong>has</strong> <strong>been</strong>

As you can see, it has highlighted much more than just "Lorem". Why?

How do I make this behave sensibly?

I'm using a StandardAnalyzer, and my query looks like "content:lorem".

Edit: I'm using Lucene.NET 2.9.2.

Thanks

A:

I am also having a similar problem, but with BrazilianAnalyzer. Please help.

Guilherme J Santos 2010-10-27 17:46:21
