Hi,

We are using Lucene within a web application to search a large number of PDF documents.

The workflow is like this:

  1. A user enters a search term

  2. A list of search results is presented to the user.

  3. Each search result represents one PDF document and shows the user on which page the search term was found. Each of these pages is represented as a hyperlink.

  4. If the user clicks on such a hyperlink, they jump directly to that page.

  5. But now the user has the problem that the search term isn't highlighted on the page, so they have to scan the page themselves to find it.

What we want is a way to highlight the search term on the specific page in the PDF.

The open parameters for Acrobat Reader allow for either searching a PDF document (with hit highlighting) OR jumping to a specific page. But the combination of both parameters - which we would need - doesn't work.

Does anyone have an idea how jumping to a page and highlighting a search term in a PDF document could work? I had a look at the Acrobat SDK but don't see how we can use it (it's terribly documented).

Cheers, Helmut

A: 

Sorry, this might not be an answer, but a workaround could be to convert the PDF to HTML and use the Lucene highlighter (similar to what Google does).

Mikos
A: 

You'd have to write a snippet of JavaScript to get the behavior you are looking for.

Dwight Kelly
+1  A: 

Acrobat uses a plugin to highlight terms, and requires an FDF stream to indicate the words to highlight. See here for pointers:

support.dtsearch.com/dts0152.htm

Update:

Assuming you know the page number and the word position on the page to highlight, here is one way to do it:

On web page:

<iframe id="acroframe" src="pdfpage/example.pdf#xml=http://example.com/hilite.aspx?hilite=8e3302ee-ff88-41ee-bdfb-9e8df87cc3ad&amp;toolbar=1&amp;navpanes=0&amp;statusbar=0&amp;view=FitH">
</iframe>

The PDF will appear in the frame. It will show the toolbar, hide the navigation pane and the status bar, and fit the page width to the frame. Then it will query the web site to get the XFDF data for highlighting: http://example.com/hilite.aspx?hilite=8e3302ee-ff88-41ee-bdfb-9e8df87cc3ad

Here I used a GUID key under which I had previously saved the highlight XFDF value in the session. The hilite.aspx page will return something like the following to highlight words in the document:
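If you build the iframe on the server side, the URL can be assembled roughly like this. This is only a sketch in Python: `build_viewer_url` is a hypothetical helper name of mine, and the parameter values are simply the ones from the example above. I am also assuming, as in the example, that the highlight URL is passed through unencoded.

```python
import html


def build_viewer_url(pdf_url, hilite_url, open_params):
    """Append xml=<highlight URL> and further open parameters to the PDF URL."""
    fragment = "xml=" + hilite_url
    for name, value in open_params.items():
        fragment += f"&{name}={value}"
    return f"{pdf_url}#{fragment}"


src = build_viewer_url(
    "pdfpage/example.pdf",
    "http://example.com/hilite.aspx?hilite=8e3302ee-ff88-41ee-bdfb-9e8df87cc3ad",
    {"toolbar": "1", "navpanes": "0", "statusbar": "0", "view": "FitH"},
)

# Escape "&" as "&amp;" before placing the URL inside an HTML attribute.
iframe = f'<iframe id="acroframe" src="{html.escape(src)}"></iframe>'
print(iframe)
```

The plain URL in `src` uses `&` as separator; only the copy embedded in the HTML attribute needs `&amp;`.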

<XML>
<Body units=characters color=#ff00ff mode=active version=2>
<Highlight>
<loc pg=15 pos=3583 len=5>
</Highlight>
</Body>
</XML>

This will highlight 5 characters on page 15, starting at position 3583. (Note: XFDF is not real XML despite the similarity.)

Note that Acrobat Reader must have the "Enable search highlights from external highlight server" option checked in its preferences.
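For illustration, the response shown above could be generated from a list of hits along these lines. Again just a sketch in Python: `build_highlight_file` is a hypothetical name, and I am assuming your index already gives you page, position and length in whatever numbering Acrobat expects, since the values are passed through unchanged.

```python
def build_highlight_file(hits, color="#ff00ff"):
    """Build the highlight response for an iterable of (page, pos, length) hits.

    Attribute values are left unquoted on purpose, to match the format
    of the example response.
    """
    lines = [
        "<XML>",
        f"<Body units=characters color={color} mode=active version=2>",
        "<Highlight>",
    ]
    for page, pos, length in hits:
        lines.append(f"<loc pg={page} pos={pos} len={length}>")
    lines += ["</Highlight>", "</Body>", "</XML>"]
    return "\n".join(lines)


# One hit: 5 characters on page 15 starting at position 3583.
print(build_highlight_file([(15, 3583, 5)]))
```

With several hits the function simply emits one `<loc ...>` line per hit inside the same `<Highlight>` element.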

mosheb