Hi guys!

I could not find any info on the web or on Stack Overflow about how to get the first matching character subsequence from a Lucene Document.

At the moment I'm using this logic to retrieve results from Lucene:

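        // Load the stored document for this hit
        Document doc = searcher.doc(hit.doc);
        String text = doc.get("text");
        // Truncate the stored text to an 80-character preview
        if (text != null && text.length() > 80) {
            text = text.substring(0, 80);
        }
        results.add(new SearchResult(doc.get("url"), doc.get("title"), text));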

As you can see, this just takes the first 80 characters of the searched text and wraps them, together with some other data, in a SearchResult object.

Is it somehow possible to retrieve the first, or even the highest-scoring, subsequence of the text that actually contains any of the search terms?

+2  A: 

You need the Lucene Highlighter. Here and here you can find some more info on it.
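To show what "a fragment containing the search terms" means, here is a minimal plain-Java sketch. It is not the Lucene Highlighter API and does no analysis or scoring; it only finds the earliest case-insensitive occurrence of any term and cuts a window there. The class name `SnippetSketch`, the method `firstMatchingSnippet`, and the sample text are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: a hand-rolled stand-in for a highlighter fragment.
class SnippetSketch {

    // Index of the first case-insensitive occurrence of term in text, or -1 if absent.
    static int indexOfIgnoreCase(String text, String term) {
        for (int i = 0; i + term.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, term, 0, term.length())) {
                return i;
            }
        }
        return -1;
    }

    // Returns at most maxLen characters starting at the earliest occurrence of any term.
    // Falls back to the leading maxLen characters when no term occurs in the text.
    static String firstMatchingSnippet(String text, List<String> terms, int maxLen) {
        if (text == null) {
            return "";
        }
        int first = -1;
        for (String term : terms) {
            if (term.isEmpty()) {
                continue; // an empty term would match everywhere
            }
            int idx = indexOfIgnoreCase(text, term);
            if (idx >= 0 && (first < 0 || idx < first)) {
                first = idx;
            }
        }
        int start = Math.max(first, 0);
        int end = Math.min(text.length(), start + maxLen);
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String text = "Apache Lucene is a search library. The highlighter marks matching terms.";
        System.out.println(firstMatchingSnippet(text, Arrays.asList("highlighter"), 20));
    }
}
```

In the original loop this would replace the `substring(0, 80)` call, assuming the query terms are available as plain strings. A real highlighter additionally runs the text through the analyzer and picks the best-scoring fragment, which this sketch does not attempt.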

Andrei
Also note that there are several Highlighter implementations for both Lucene 2.x and Lucene 3.0. Pick the one that best fits your task.
Andrei
+1  A: 

It is called a hit highlighter. This is probably a duplicate of another highlighter question.

Eugene Kuleshov